CouchDB

The CAP theorem is like physics to airplanes: every database must design around it

Back in 2000, Eric Brewer introduced the CAP theorem, an explanation of inherent tradeoffs in distributed database design. In short: you can’t have it all. (Okay, so there’s some debate about that, but alternative theories generally introduce other caveats.)

Giving schema back its good name

For modern applications, the word “schema” has become synonymous with the tables, columns, constraints, indexes, and foreign keys in a relational database management system. A typical relational schema affects physical concerns (like record layout on disk) and logical concerns (like the cascading deletion of records in related tables).

Schemas have gotten a bad name because current RDBMS tools give them these rotten attributes:

Improvements to the Materialized View API

An eye-catching graphic, largely irrelevant to this blog post.

The Materialized View API (related posts) provides resources for pre-aggregation and indexing of data for use in complex queries. It does this by managing denormalized tables based on data living elsewhere in the database (and possibly outside it). As such, materialized views (MVs) must be populated and updated using large amounts of data. As users change data on the site, MVs must be updated intelligently to avoid complete (read: very slow) rebuilds. Part of performing these intelligent updates is calculating how each user change affects the MVs in use. Until now, these updates had limitations in scalability and capability.
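To make the difference between a full rebuild and an intelligent update concrete, here is a minimal sketch in Python. It is not the Materialized View API itself; the class `CommentCountView`, its method names, and the `node_id` field are all hypothetical, chosen only to illustrate applying a small delta per change instead of rescanning the source data.

```python
from collections import defaultdict


class CommentCountView:
    """Hypothetical materialized view: number of comments per node."""

    def __init__(self):
        self.counts = defaultdict(int)

    def rebuild(self, comments):
        # Full rebuild: scans every source row (the slow path to avoid).
        self.counts.clear()
        for comment in comments:
            self.counts[comment["node_id"]] += 1

    def on_insert(self, comment):
        # Incremental update: only the affected key is touched.
        self.counts[comment["node_id"]] += 1

    def on_delete(self, comment):
        node_id = comment["node_id"]
        self.counts[node_id] -= 1
        if self.counts[node_id] == 0:
            del self.counts[node_id]

    def on_update(self, old, new):
        # A comment that moves between nodes affects two keys; any other
        # edit leaves this particular view unchanged.
        if old["node_id"] != new["node_id"]:
            self.on_delete(old)
            self.on_insert(new)
```

Each handler does work proportional to the size of the change, not the size of the source table, which is the property the paragraph above is after. Working out which keys a given change touches (as `on_update` does here) is the "calculating how each change affects the MV" step.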

David's Epic Presentation Megapost

Schema changes should be lazy

We’re in the middle of upgrading Drupal.org, and many of the longest-running upgrades involve schema changes. Unfortunately, MySQL with InnoDB has a very unfriendly method of doing schema changes: it rebuilds the table and blocks all writes until the new table is ready. A more sensible approach would be schema versioning that allows different parts of a table to have different schema versions. This would minimize blocking, allowing schema changes to happen without downtime.
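As a rough illustration of what lazy schema versioning could look like, here is a minimal Python sketch. It does not describe how MySQL, InnoDB, or CouchDB work internally; the `_v` version field, the two example schema changes, and the `read_row` helper are all assumptions made up for this example. The idea is that each row records the schema version it was written with and is upgraded only when it is next read.

```python
CURRENT_VERSION = 3


def _v1_to_v2(row):
    # Assumed change: split "name" into "first_name" and "last_name".
    first, _, last = row.pop("name").partition(" ")
    row["first_name"], row["last_name"] = first, last
    return row


def _v2_to_v3(row):
    # Assumed change: add an "active" column with a default value.
    row.setdefault("active", True)
    return row


# Maps a row's current version to the function that upgrades it by one step.
UPGRADES = {1: _v1_to_v2, 2: _v2_to_v3}


def read_row(table, key):
    """Return the row at the current schema version, upgrading it on demand."""
    row = table[key]
    while row["_v"] < CURRENT_VERSION:
        row = UPGRADES[row["_v"]](row)
        row["_v"] += 1
    table[key] = row  # persist the upgraded row so the work is done only once
    return row
```

Under this scheme, a schema change amounts to registering a new upgrade function and bumping `CURRENT_VERSION`; no row is rewritten until something reads it, so there is no table-wide rebuild to block writes.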