The CAP theorem is like physics to airplanes: every database must design around it

Back in 2000, Eric Brewer introduced the CAP theorem, an explanation of inherent tradeoffs in distributed database design. In short: you can’t have it all. (Okay, so there’s some debate about that, but alternative theories generally introduce other caveats.)

On Twitter, I recently critiqued a presentation by Bryan Fink on the Riak system for claiming that Riak is “influenced” by CAP. This sparked a short conversation with Justin Sheehy, also from the project. 140 characters isn’t enough to explain my objection in depth, so I’m taking it here.

While I give Riak credit for having a great architecture and pushing innovation in the NoSQL (non-relational database) space, it can no more claim to be “influenced” by CAP than an airplane design can claim influence from physics. Like physics to an airplane, CAP lays out the rules for distributed databases. With that reality in mind, a distributed database designed without regard for CAP is like an airplane designed without regard for physics. So, claiming unique influence from CAP is tantamount to claiming competing systems have a dangerous disconnect with reality. Or, to carry on the analogy, it’s like Boeing making a claim that their plane designs are uniquely influenced by physics.

But we all know Airbus designs their planes with physics in mind, too, even if they pick different tradeoffs compared to Boeing. And traditional databases were influenced by CAP and its ancestors, like BASE and Bayou from Xerox PARC. CAP says “pick two.” And they did: generally C and P. This traditional (and inflexible) design of picking only one point on the CAP triangle for a database system doesn’t indicate lack of influence.

What Riak actually does is quite novel: it allows operation at more than one point on the triangle of CAP tradeoffs. This is valuable because applications value different parts of CAP for different types of data or operations on data.

For example, a banking application may value availability for viewing bank balances. Lots of transactions happen asynchronously in the real world, so a slightly outdated balance is probably better than refusing any access if there’s a net split (a network partition) between data centers.

In contrast, a transfer between two accounts held by the same person at the same bank (say, checking to savings) generally happens synchronously. For that operation, a bank would rather enforce consistency than availability: if there’s a net split, it would rather disable transfers than have one go awry or, worse, invite fraud.

A system like Riak allows making these compromises within a single system. Something like MySQL NDB, which always enforces consistency, would either unnecessarily take down balance viewing during a net split or require use of a second storage system to provide the desired account-viewing functionality.
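
To make the banking example concrete, here is a minimal sketch of per-operation tradeoffs in a toy replicated store. It is not Riak's client API: ToyStore, Replica, Unavailable, and the r and w parameters are hypothetical names invented for illustration, and the "replicas" are just in-memory dictionaries. The idea is only that each request states how many replicas must respond, so a balance view can ask for one while a transfer demands all of them.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised when too few replicas are reachable to satisfy a request."""


@dataclass
class Replica:
    reachable: bool = True
    data: dict = field(default_factory=dict)  # key -> (version, value)


class ToyStore:
    """Hypothetical store where each request picks its own replica count."""

    def __init__(self, n=3):
        self.replicas = [Replica() for _ in range(n)]

    def _reachable(self, needed):
        up = [rep for rep in self.replicas if rep.reachable]
        if len(up) < needed:
            raise Unavailable(f"need {needed} replicas, only {len(up)} reachable")
        return up

    def put(self, key, value, w):
        # Refuse the write unless at least w replicas can accept it.
        up = self._reachable(w)
        version = 1 + max(rep.data.get(key, (0, None))[0] for rep in up)
        for rep in up:
            rep.data[key] = (version, value)

    def get(self, key, r):
        # Consult the first r reachable replicas; return the newest value seen.
        up = self._reachable(r)[:r]
        newest = max((rep.data.get(key, (0, None)) for rep in up),
                     key=lambda pair: pair[0])
        return newest[1]


store = ToyStore(n=3)
store.put("balance", 100, w=3)

# Simulate a net split: this client can only reach the first replica.
store.replicas[1].reachable = False
store.replicas[2].reachable = False

# Viewing the balance favors availability: one replica is enough.
balance = store.get("balance", r=1)

# A transfer favors consistency: it needs every replica, so it is refused.
try:
    store.put("balance", 50, w=3)
    transfer_refused = False
except Unavailable:
    transfer_refused = True
```

During the simulated split, the read with r=1 still succeeds while the write with w=3 is refused, so the two operations sit at different points on the CAP triangle inside one system. A store that always demanded all replicas would have refused the balance view as well.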

Comments

David,

I entirely agree with you, and that’s the point I was making at nosqleast when I said “that’s what makes it a theorem.” True things are simply true, and you don’t choose whether or not to have them affect you.

A video of that talk is at https://nosqleast.com/2009/#speaker/sheehy and you can jump 16 minutes in if you don’t want to watch the whole thing.

The CAP theorem doesn’t exist in a vacuum. It came about due to real, inspiring work on serious problems. Eric proposed it due to the changing needs of applications on the internet that needed to make their own business decisions about availability in the face of inevitable host and network failures. The one-size-fits-all model that much of the RDBMS world followed in that regard was just beginning to crumble, and he saw this earlier than most.

I would argue that Riak correctly claims a deeper level of influence here, because it is not just obeying that theorem (as everything does) but is a system based on the idea of explicitly helping the application layer (not just the database layer) be able to make the right tradeoffs.

To carry your analogy forward: everyone is obliged to obey the laws of physics, but some people are also influenced in an additional way by the words and ideas — not just the results — of Newton, Einstein, Feynman, and others.

Others make CAP choices because they must. “Influence” here can mean building a system while being explicitly conscious of the reasons why you must make those choices. Having that influence is not unique to Riak, but it does remain a meaningful distinction.

Best,

-Justin