Let constraints drive consistency

TL;DR #

About me #

I've been working with a large production database containing hundreds of millions of rows for more than 3 years now, and the biggest pain point has always been an inconsistency in the data shape:

So how do we prevent this? What should you do if you already have this? How can you gradually enforce data consistency in a large production database? That's what we'll cover in this post.

When adding a new column (when it's not too late): #

When creating a new table: #

When updating an existing table: #

So here's the hardest part: you already have an inconsistent table. What should you do? The good news is you've accepted the inconsistency and want to fight it — that's the first step! The rest is easier:

  1. Identify all inconsistent records
  2. Find out why they're inconsistent and FIX all places (either code or processes) that create them
  3. Fix the inconsistency by backfilling the data
  4. Add a constraint that would prevent this inconsistency from happening again. (See the previous section for ideas)

Since you've made it this far, sharing this article on your favorite social media network would be highly appreciated 💖! For feedback, please ping me on Twitter.

Published