Joyent's open-source Manta storage service relies heavily on a replicated, sharded metadata tier that stores the locations of all objects in the system. When designing Manta, we needed this tier to be highly reliable, horizontally scalable, and fast under a very heavy, continuous, mixed read/write workload. To achieve the availability expected of today's major cloud services, it also had to survive single failures with minimal impact to service. We'll talk about the options we considered, why we chose PostgreSQL, and Manatee, the open-source system we built to manage a cluster of PostgreSQL databases so that it can survive database failures. We'll discuss the many challenges of building this system, the CAP tradeoffs we made, a number of interesting bumps in the road we ran into, and our experience deploying it in production. Finally, we'll talk about which parts of PostgreSQL made this system easy to build, and what changes to PostgreSQL could make Manatee better.
Experience level: Intermediate