Designing The Right Schema To Power Heap by Dan Robinson

Wednesday November 16, 11:20-12:15

Heap's analytics infrastructure is built around PostgreSQL. The most important choice to make when building a system this way is the schema you'll use to represent your data. This foundation will heavily influence your write throughput, what sorts of read queries will be fast, what indexing strategies will be available to you, and what data inconsistencies will be possible. With the wrong choice, you won't be able to leverage PostgreSQL's most powerful features.

This talk will walk through the different schemas we've used to power Heap over the last three years, their relative strengths and weaknesses, and the mistakes we've made.

About the speaker

Dan is CTO at Heap, where he uses PostgreSQL, Kafka, Flink, Redis, and CitusDB to build distributed analytics infrastructure. He works in Scala and Node.js day-to-day, though he's been known to get a little too much satisfaction out of solving problems with PL/pgSQL. Dan earned B.S. degrees in Computer Science and Mathematics from Stanford, where he spent most of his time studying machine learning. He likes hiking and building physical things.

Thursday, October 13, 2016 - 12:15