Four Kitchens
Insights

Enforcing branch commit atomicity (or, why the git staging area is bad)

3 Min. ReadDevelopment

With CVS, one of the only repository-wide atomic operations is tagging a local checkout. And not all that long ago, Subversion introduced mainstream users of free, open-source version control systems to full-scale atomicity. Or, at least the ability to be atomic.

Subversion’s approach to atomicity is rooted in its centralization and hybrid branch/directory model. Because Subversion makes it hard to merge from other repositories, there’s a strong incentive to combine many projects and branches into one repository. Subversion therefore offers directory-level checkouts as well as convenient, repository-wide checkouts for developers working on multiple projects. To fit this project and branching model, Subversion performs most operations at the level of the current directory and below. (The only alternative would be performing the operations checkout-wide, which would cause behavior confusingly dependent on the choice of checkout root.)

Subversion’s model operates atomically if users run the commands from the root of a project or branch, but the this-directory-and-below model encourages bad development behavior. For example, when working on a Drupal project, it’s easy to commit just the changes for one module or theme, which creates a revision in the repository that may never have existed as a working copy and may not work. Administrators can mitigate the problem with repository-side continuous integration (CI), but even CI still doesn’t guarantee true project coherence and atomicity.

Bazaar, on the other hand, performs its operations (at least by default) on the entire branch, encouraging real atomic commits. This default, branch-level behavior tends to annoy developers used to separating a project’s changes into different directories of a Subversion checkout and checking in by directory. This Subversion-based workflow works reasonably well in practice, and for systems like Bazaar and git to enforce high levels of atomicity and remain usable, they must provide convenient tools to separate the changes intended for each commit.

Bazaar and git have different approaches providing such tools. Bazaar has shelve and unshelve. git has the staging area.

The most obvious way they differ is in workflow. Bazaar’s commands are optional and remove changes from the upcoming commit. git’s commands are mandatory and add changes to the upcoming commit. Here, git’s choice seems very sensible. Encouraging manual approval of each change in each commit reduces mistakes. On the other hand, Bazaar provides a convenient uncommit command that allows reversal of erroneous commits (which are often only obvious when seeing the file list as the commit is happening). All considered, I slightly prefer git’s workflow here.

Where git fails is in the theoretical foundations of the staging area. The staging area encourages the same bad behavior as in Subversion, just with more surgical control of what gets committed. Committing in git with only some changes added to the staging area still results in an “atomic” revision that may never have existed as a working copy and may not work.

Along these lines, one of the most atomicity-busting aspects of git’s staging area is that it doesn’t just mark code that needs to go into the next commit; it actually saves the hunk into the staging index. So, a developer could add code to the staging area, modify her working copy, and end up with a commit containing code that’s neither in her working copy nor in her stash. The code only ends up in the commit just made, silently filed away for someone else to get in their next merge:

This command [add] can be performed multiple times before a commit. It only adds the content of the specified file(s) at the time the add command is run; if you want subsequent changes included in the next commit, then you must run git add again to add the new content to the index.

In contrast, shelving a change in Bazaar reverts the change in the working copy. (It does save the change for later restoration with unshelve.) Because shelved changes are not in the working copy, Bazaar encourages the ultimate in atomicity: what a developer commits represents an atomic snapshot of the entire branch as represented by her working copy. And if tests pass when she commits, the same tests will pass if another developer pulls the same revision of the same branch.