Drupal

Enforcing branch commit atomicity (or, why the git staging area is bad)

With CVS, one of the only repository-wide atomic operations is tagging a local checkout. And not all that long ago, Subversion introduced mainstream users of free, open-source version control systems to full-scale atomicity. Or, at least the ability to be atomic.

Subversion’s approach to atomicity is rooted in its centralization and hybrid branch/directory model. Because Subversion makes it hard to merge from other repositories, there’s a strong incentive to combine many projects and branches into one repository. Subversion therefore offers directory-level checkouts as well as convenient, repository-wide checkouts for developers working on multiple projects. To fit this project and branching model, Subversion performs most operations at the level of the current directory and below. (The only alternative would be performing the operations checkout-wide, which would cause behavior confusingly dependent on the choice of checkout root.)

Subversion’s model operates atomically if users run the commands from the root of a project or branch, but the this-directory-and-below model encourages bad development behavior. For example, when working on a Drupal project, it’s easy to commit just the changes for one module or theme, which creates a revision in the repository that may never have existed as a working copy and may not work. Administrators can mitigate the problem with repository-side continuous integration (CI), but even CI doesn’t guarantee true project coherence and atomicity.

Bazaar, on the other hand, performs its operations (at least by default) on the entire branch, encouraging real atomic commits. This default, branch-level behavior tends to annoy developers used to separating a project’s changes into different directories of a Subversion checkout and checking in by directory. This Subversion-based workflow works reasonably well in practice, and for systems like Bazaar and git to enforce high levels of atomicity and remain usable, they must provide convenient tools to separate the changes intended for each commit.

Bazaar and git have different approaches to providing such tools. Bazaar has shelve and unshelve. git has the staging area.

The most obvious way they differ is in workflow. Bazaar’s commands are optional and remove changes from the upcoming commit. git’s commands are mandatory and add changes to the upcoming commit. Here, git’s choice seems very sensible. Encouraging manual approval of each change in each commit reduces mistakes. On the other hand, Bazaar provides a convenient uncommit command that allows reversal of erroneous commits (which are often only obvious when seeing the file list as the commit is happening). All considered, I slightly prefer git’s workflow here.

Where git fails is in the theoretical foundations of the staging area. The staging area encourages the same bad behavior as in Subversion, just with more surgical control of what gets committed. Committing in git with only some changes added to the staging area still results in an “atomic” revision that may never have existed as a working copy and may not work.

Along these lines, one of the most atomicity-busting aspects of git’s staging area is that it doesn’t just mark code that needs to go into the next commit; it actually saves the hunk into the staging index. So, a developer could add code to the staging area, modify her working copy, and end up with a commit containing code that’s neither in her working copy nor in her stash. The code only ends up in the commit just made, silently filed away for someone else to get in their next merge. As the git add documentation puts it:

This command [add] can be performed multiple times before a commit. It only adds the content of the specified file(s) at the time the add command is run; if you want subsequent changes included in the next commit, then you must run git add again to add the new content to the index.
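To make that concrete, here is a toy Python model of the documented behavior. It is a sketch of the semantics only, not git’s internals, and `ToyGitRepo` and its methods are hypothetical names: `add` copies a file’s content as it is at that moment, and `commit` records the index rather than the working copy.

```python
# Toy model of git's index: a sketch of the documented semantics, not git's internals.

class ToyGitRepo:
    def __init__(self, files):
        self.working = dict(files)  # files on disk
        self.index = dict(files)    # the staging area

    def edit(self, path, content):
        self.working[path] = content

    def add(self, path):
        # Copy the file's content as it is *right now* into the index.
        self.index[path] = self.working[path]

    def commit(self):
        # Record the index, not the working copy (like `git commit` without -a).
        return dict(self.index)


repo = ToyGitRepo({"mymodule.module": "v1"})
repo.edit("mymodule.module", "v2")
repo.add("mymodule.module")         # "v2" is staged
repo.edit("mymodule.module", "v3")  # further edits, never re-added
committed = repo.commit()
print(committed["mymodule.module"], repo.working["mymodule.module"])  # v2 v3
```

The committed "v2" is no longer on disk anywhere, so the developer never tested the exact tree she committed.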

In contrast, shelving a change in Bazaar reverts the change in the working copy. (It does save the change for later restoration with unshelve.) Because shelved changes are not in the working copy, Bazaar encourages the ultimate in atomicity: what a developer commits represents an atomic snapshot of the entire branch as represented by her working copy. And if tests pass when she commits, the same tests will pass if another developer pulls the same revision of the same branch.
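A matching toy model of shelving shows why. Again this is a hypothetical Python sketch, not bzrlib’s implementation; it assumes every shelved path already exists in the last commit and ignores unshelve conflicts. Because shelving reverts the file on disk, whatever `commit` records is exactly the working copy the developer can test.

```python
# Toy model of `bzr shelve` / `bzr unshelve`: a sketch, not bzrlib's implementation.
# Assumes each shelved path already exists in the last commit.

class ToyBzrTree:
    def __init__(self, files):
        self.basis = dict(files)    # last committed revision
        self.working = dict(files)  # the developer's working copy
        self.shelf = []             # shelved (path, content) pairs, newest last

    def edit(self, path, content):
        self.working[path] = content

    def shelve(self, path):
        # Save the change for later, then revert the file in the working copy.
        self.shelf.append((path, self.working[path]))
        self.working[path] = self.basis[path]

    def unshelve(self):
        # Restore the most recently shelved change (no conflict handling here).
        path, content = self.shelf.pop()
        self.working[path] = content

    def commit(self):
        # The entire working copy becomes the new revision.
        self.basis = dict(self.working)
        return dict(self.basis)


tree = ToyBzrTree({"a.module": "a1", "b.module": "b1"})
tree.edit("a.module", "a2")
tree.edit("b.module", "b2")
tree.shelve("b.module")           # set the b.module change aside
committed = tree.commit()
print(committed == tree.working)  # True: the commit is exactly what's on disk
tree.unshelve()                   # b.module is back to "b2" for the next commit
```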

DrupalCon DC swag is here!

Here’s a taste of what you’ll be getting at DrupalCon DC.

[Image: DrupalCon DC swag]

The stickers and button were designed by Four Kitchens. Development Seed very graciously printed and paid for the buttons. Thanks, Development Seed!

The sticker in the lower left is based on a DrupalCon T-shirt design we submitted. (If you like it, vote for it! And our other designs, too…)

The button is a more polished version of a DrupalCon “campaign button” design we released last month under the GPLv3.

[Image: Todd’s bag o’ buttons]

Here’s my bag wearing a series of hip buttons and a very classy Oktoberfest scarf. It’s probably the best scarf you will ever see — and you’ll see it at DrupalCon DC!

Creating common branch ancestry is a hard problem

One of the key features of distributed version control systems (DVCS) is support for divergent development (branching) and then merging. Most DVCS tools, including git and Bazaar, include rather elegant support for such workflows by embedding metadata about common ancestry into branches. In this post, I’ll be focusing on Bazaar.

“Common ancestry” means an identical revision shared by two branches. The most recent common ancestor typically indicates the point of branching, unless there has been a more recent merge. A successful merge between two branches establishes a new, more recent common ancestor. Identifying a common ancestor is a required step for performing automatic three-way merges to integrate changes from a foreign, divergent branch.
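Why the base matters is easiest to see in a tiny example. The Python sketch below is a simplification that merges whole files rather than lines (Bazaar and git work hunk by hunk), and `merge3_file` is a hypothetical name, but the per-region decision rule is the same idea: if only one side changed relative to the base, take that side.

```python
def merge3_file(base, ours, theirs):
    """Three-way merge of one file's whole content.

    Returns (merged_content, is_conflict). Real tools merge line by line;
    this whole-file version only illustrates why the base revision matters.
    """
    if ours == theirs:   # both sides agree (or neither changed)
        return ours, False
    if ours == base:     # only the other branch changed: take their version
        return theirs, False
    if theirs == base:   # only our branch changed: keep our version
        return ours, False
    return None, True    # both changed, differently: conflict


print(merge3_file("old", "old", "new"))   # ('new', False)
print(merge3_file("old", "mine", "new"))  # (None, True)
```

Without the base, comparing "old" against "new" cannot tell which branch made the change, so every difference would have to be treated as a conflict.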

Typically, the branching metadata allows Bazaar to automatically determine the most recent common ancestor to use as the base revision for three-way merging. The problem comes when you need to merge two branches that do not share any common ancestors. (For the purpose of this post, I am not counting revision zero, the universal common ancestor, as a common ancestor. It is generally useless for merging.)
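Finding the most recent common ancestor is a search over the revision graph. Here is a minimal Python sketch, assuming the history is a dict mapping each revision ID to its parent IDs and using the hypothetical ID "null:" as a stand-in for revision zero, which (as above) doesn’t count. Bazaar’s real graph code is considerably more elaborate and settles on a single base; this only illustrates the idea.

```python
def ancestors(graph, rev):
    """Return rev plus every revision reachable through parent links."""
    seen, stack = set(), [rev]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, ()))
    return seen


def merge_bases(graph, a, b):
    """Most recent common ancestors of revisions a and b."""
    common = ancestors(graph, a) & ancestors(graph, b)
    common.discard("null:")  # revision zero is shared by everything; ignore it
    # Keep only common ancestors that no other common ancestor descends from.
    return {c for c in common
            if not any(d != c and c in ancestors(graph, d) for d in common)}


history = {
    "r1": ["null:"], "r2": ["r1"], "r3": ["r2"], "r4": ["r3"],  # trunk
    "f1": ["r2"],                    # feature branched at r2
    "f2": ["f1", "r3"],              # feature merged trunk at r3
    "x1": ["null:"], "y1": ["null:"],  # two unrelated imports
}
print(merge_bases(history, "r3", "f1"))  # {'r2'}: the branch point
print(merge_bases(history, "r4", "f2"))  # {'r3'}: the merge created a newer base
print(merge_bases(history, "x1", "y1"))  # set(): no usable common ancestor
```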

Creating common ancestry
If you try to merge two branches without common ancestry and without specifying any revisions, Bazaar will complain and do nothing. You can instead specify a revision range (like -r5..-1, which merges the changes made after revision 5 through the latest revision on the foreign branch) and then apply the merge. Bazaar will then merge the changes in the specified revision range into your local branch and, once you commit, record the last merged revision as the latest common ancestor, making future merges a breeze. Unfortunately, finishing this initial merge is where the dragons lie. But before I can get into the difficulty of creating common ancestry by merging two unrelated branches, I have to briefly discuss how Bazaar handles files.

How Bazaar tracks files
Bazaar maps file paths to globally unique file IDs, and two branches without prior common ancestry will have different file IDs for the same paths, even if the files are really the same. Every time a file is added to a Bazaar branch, it gets a unique ID. As files are moved and renamed, they keep their unique IDs. So, if two people download Drupal, extract it, and independently “bzr init”, “bzr add”, and “bzr commit”, their respective README.txt files (for example) will have different file IDs.
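A toy Python model of that path-to-ID mapping shows both properties: independent adds mint different IDs for the same path, and renames keep the ID. `ToyBzrBranch` is hypothetical, not bzrlib’s API, and the uuid4 IDs are an assumption; Bazaar’s real ID format differs.

```python
import uuid


class ToyBzrBranch:
    """Toy model of a path -> file-ID inventory (not bzrlib's real API)."""

    def __init__(self):
        self.inventory = {}  # path -> file ID

    def add(self, path):
        # Every add mints a fresh, globally unique ID (uuid4 here for illustration).
        self.inventory[path] = uuid.uuid4().hex

    def move(self, old, new):
        # Moves and renames keep the existing ID.
        self.inventory[new] = self.inventory.pop(old)


alice, bob = ToyBzrBranch(), ToyBzrBranch()
for branch in (alice, bob):  # two independent "bzr init; bzr add" imports
    branch.add("README.txt")
print(alice.inventory["README.txt"] == bob.inventory["README.txt"])  # False

readme_id = alice.inventory["README.txt"]
alice.move("README.txt", "docs/README.txt")
print(alice.inventory["docs/README.txt"] == readme_id)  # True
```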

Creating common ancestry (continued)
These globally unique file IDs cause trouble when merging from unrelated branches. When Bazaar merges two branches (related or not), two files with the same path but different file IDs create a conflict even if they contain identical content. Merging two unrelated branches with lots of shared files creates a mess of conflict files and conflict directories, and there is currently no convenient way to resolve these conflicts.
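The rule can be sketched in a few lines of Python, assuming each tree is a dict from path to (file ID, content). The function name is hypothetical and this models the outcome, not Bazaar’s merge code; note that it compares IDs only and never looks at content, which is exactly why identical files still collide.

```python
def path_id_conflicts(local, foreign):
    """Paths present in both trees whose file IDs differ.

    Each tree maps path -> (file_id, content). Content is deliberately
    ignored: identical files with different IDs still conflict.
    """
    return sorted(
        path for path in local.keys() & foreign.keys()
        if local[path][0] != foreign[path][0]
    )


mine = {"index.php": ("mine-1", "<?php ..."), "README.txt": ("mine-2", "Drupal")}
theirs = {"index.php": ("fk-1", "<?php ..."), "README.txt": ("fk-2", "Drupal")}
print(path_id_conflicts(mine, theirs))  # ['README.txt', 'index.php']
```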

A concrete example with Drupal
A developer downloads Drupal and puts the project under version control in a fresh Bazaar branch. She discovers that Four Kitchens maintains a Bazaar branch of stable Drupal releases and decides she would like to use the Four Kitchens branch to automate installation of minor Drupal updates. She attempts to merge in the revision range from the Four Kitchens branch that would perform the upgrade. Because the file IDs in the Four Kitchens branch differ from hers, every Drupal core file and directory has a conflict despite having identical content for the base merge revision. After a bit of tedious conflict cleanup, she commits the merge and goes back to enjoying Bazaar’s generally elegant architecture.

I thought this blog post was going to give me a solution!
Nope, there’s not one out there yet. I’m currently looking at writing a custom merge handler (subclassed from the standard merge3 in Bazaar) that would intelligently handle merges where file paths do represent the same files, regardless of file IDs. Unfortunately, the file ID/path conflict is low-level in Bazaar and occurs before reaching the most modular part of merge conflict resolution.
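For illustration only, here is one possible shape of the path-based idea in Python: before merging, give each local file the foreign branch’s ID wherever the paths match, so identical paths stop conflicting. This is a hypothetical sketch on plain dicts, not the merge handler described above and not a Bazaar API. It assumes the branches share no prior ancestry, and it refuses any remapping that would give two paths the same ID.

```python
def adopt_foreign_ids(local, foreign):
    """Rewrite local file IDs to match foreign ones wherever paths agree.

    Both arguments map path -> file ID. Paths that exist only locally keep
    their own IDs.
    """
    remapped = {path: foreign.get(path, file_id) for path, file_id in local.items()}
    if len(set(remapped.values())) != len(remapped):
        raise ValueError("remapping would give two paths the same file ID")
    return remapped


mine = {"index.php": "mine-1", "README.txt": "mine-2",
        "sites/all/modules/custom/custom.module": "mine-3"}
vendor = {"index.php": "fk-1", "README.txt": "fk-2"}
print(adopt_foreign_ids(mine, vendor))
```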

Thanks to Robert Collins on the Bazaar project for walking me through the Bazaar internals necessary for me to explain this issue and, hopefully, solve it.

Quick Drupal version control with Bazaar

We all have the Drupal projects we work on that ought to be under some sort of version control system but aren’t. There are many reasons why this might be the case: the small size of the project, the lack of central version-control infrastructure, or the annoyance of “.svn” or “CVS” directories littering your working copy.

With Bazaar, none of these excuses should prevent a project from having solid version control.

Here’s why Bazaar avoids those problems:

  • It requires minimal effort to bring code under control.
  • Bazaar does not require (but still supports) centralized infrastructure.
  • A single .bzr directory at the root of the project contains everything Bazaar needs, minimizing disruption.

Bringing a Drupal project into Bazaar

Note: If you want to automate your Drupal updates, you may want to branch from the Four Kitchens repository instead of initializing your own branch. Branching from the Four Kitchens repository is more time-consuming, but it’s worthwhile for large projects.

  1. Install Bazaar.
  2. Go to the root directory of your project.
  3. bzr init
  4. bzr ignore ./files # Drupal 5 only
  5. bzr ignore ./sites/default/files/ # Drupal 6 only
  6. bzr ignore ./sites/default/settings.php
  7. bzr add
  8. bzr commit -m "Initial import"
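If you bring many small projects under version control, these steps are easy to script. A minimal Python sketch, assuming `bzr` is on your PATH; `import_drupal_project` is a hypothetical helper, and with `dry_run=True` (the default) it only returns the commands without running them.

```python
import subprocess


def import_drupal_project(root, drupal_major, dry_run=True):
    """Build, and optionally run, the bzr commands for the import steps above."""
    if drupal_major not in (5, 6):
        raise ValueError("the steps above cover Drupal 5 and 6 only")
    # Default files directory: ./files in Drupal 5, ./sites/default/files/ in Drupal 6.
    files_dir = "./files" if drupal_major == 5 else "./sites/default/files/"
    commands = [
        ["bzr", "init"],
        ["bzr", "ignore", files_dir],
        ["bzr", "ignore", "./sites/default/settings.php"],
        ["bzr", "add"],
        ["bzr", "commit", "-m", "Initial import"],
    ]
    if not dry_run:
        for cmd in commands:
            # check=True stops at the first command that fails.
            subprocess.run(cmd, cwd=root, check=True)
    return commands


for cmd in import_drupal_project(".", 6):
    print(" ".join(cmd))
```

Passing each command as a list sidesteps shell quoting, so the commit message reaches bzr as a single argument.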

That’s it! Learn to use Bazaar and enjoy powerful version-control capabilities for even your smallest projects.

Distributed version control provides a streamlined alternative to vendor branches

Anyone who’s worked with a sufficiently large project eventually ends up establishing vendor branches to track and merge upstream releases. Maintaining these branches is time-consuming, redundant work because everyone who needs a vendor branch of a project needs approximately the same thing.

With centralized version control tools, you have two choices:

  • Apply patches integrating the changes from upstream releases. This approach avoids the need to set up vendor branches, but it will not allow three-way merges, increasing the developers’ time spent resolving conflicts. Patches also cannot represent many types of changes, requiring more manual work to upgrade.
  • Download, extract, and commit each upstream release to your project’s own vendor branch. This is also time-consuming and requires regular work checking for upstream releases. It does, however, allow three-way merges and applying all types of changes.

Distributed version control systems, like Bazaar, allow you to branch your project from a remote “vendor” branch, providing all the advantages of a local vendor branch with none of the headaches. Moreover, tools like Bazaar track the last merges from remote vendor branches and automatically apply changes made since your last merge.

For Drupal, Four Kitchens maintains vendor branches of stable Drupal 5 and 6 releases:

  • bzr://vcs.fourkitchens.com/drupal/5
  • bzr://vcs.fourkitchens.com/drupal/6

We also maintain vendor branches for Pressflow:

  • bzr://vcs.fourkitchens.com/pressflow/5
  • bzr://vcs.fourkitchens.com/pressflow/6

Branching from these allows you to upgrade your project with a simple “bzr merge [vendor-branch-url]”.

Using Bazaar to collaborate with other patch developers

In my earlier post on using Bazaar for Drupal core development, I explained how to use the Four Kitchens Bazaar repository to streamline development of core patches. For patches you’re developing on your own, those instructions work great. For patches involving a team of developers, you’d want to have a shared mainline branch, which is beyond my last post’s scope and this one’s.

But what about when two or three people are collaborating, you’re using Bazaar, and someone else has posted an updated patch that you’d like to work from?

First, you’ll want to install BzrTools. If you installed pre-packaged Bazaar for Mac OS X, you already have these plugins installed.

These instructions also assume you’re working on a patch for Drupal 7. If not, replace the number “7” below with the appropriate version.

Warning: Continuing will erase your local changes and replace them with the changes in the patch you apply.

  1. Copy the patch URL to your clipboard. Don’t bother downloading; Bazaar’s patch utility will download the patch itself.
  2. Change directories to be somewhere in your development branch for the issue you’re working on.
  3. Run as one command: bzr pull --overwrite --revision=date:[date-of-patch] bzr://vcs.fourkitchens.com/drupal/7
    The --revision part is optional, but specifying a date will help Bazaar apply the patch against the revision the patch was created against (or something very close), which is more likely to work than applying the patch to CVS HEAD directly. Use a date specifier like “yesterday”, “2008-12-30”, or “2008-12-30,13:41:14”. See Bazaar’s complete documentation on revision specifiers.
  4. bzr patch [url-of-patch]
  5. bzr commit -m "Applied patch [url-of-patch]."
  6. bzr merge bzr://vcs.fourkitchens.com/drupal/7
  7. Resolve conflicts, if any.
  8. bzr commit -m "Merged in changes to CVS HEAD since patch."
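If you refresh patches often, a small helper can generate steps 3 through 8 so you don’t mistype the branch URL or revision spec. This is a Python sketch with a hypothetical function name; it assumes BzrTools’ `bzr patch` is installed and only builds the commands. Because step 7 (resolving conflicts) is manual, run them one at a time rather than in a blind loop.

```python
TRUNK = "bzr://vcs.fourkitchens.com/drupal/7"


def refresh_patch_commands(patch_url, patch_date=None, trunk=TRUNK):
    """Return the bzr commands for steps 3-8 above, in order."""
    pull = ["bzr", "pull", "--overwrite"]
    if patch_date:  # optional, e.g. "yesterday" or "2008-12-30"
        pull.append("--revision=date:" + patch_date)
    pull.append(trunk)
    return [
        pull,                                                    # step 3
        ["bzr", "patch", patch_url],                             # step 4 (BzrTools)
        ["bzr", "commit", "-m", f"Applied patch {patch_url}."],  # step 5
        ["bzr", "merge", trunk],                                 # step 6
        # Step 7: resolve any conflicts by hand before running the last command.
        ["bzr", "commit", "-m", "Merged in changes to CVS HEAD since patch."],  # step 8
    ]


# The patch URL here is a hypothetical placeholder.
for cmd in refresh_patch_commands("http://example.com/hypothetical-fix.patch", "2008-12-30"):
    print(" ".join(cmd))
```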

You’ll have a refreshed working copy with the patch applied against the latest Drupal 7. You can now go back to using the instructions on my earlier post to continue working or roll a fresh patch.

Dynamically attribute content in Drupal using the Author Taxonomy module

[Image: Author Taxonomy settings screen]

Attributing a story, image, or blog post to more than one person can pose a problem on many web platforms. In the print publishing world, it’s simply a matter of adding another name to the byline or tacking “Additional reporting by Sue” to the end of a piece. (Nowhere does the poor designer or typesetter get credit for laying out the page!)

On the web, priorities are flipped. Instead of attributing a piece of online content to its author, most web software attributes content to the user who posted it on the site. Countless sites display ugly bylines like “nyeditor5” or “harry_henderson” instead of a properly formatted name.

And what to do about stories with multiple authors? They’re usually given credit in the first or last line of the content itself — a surefire way to reduce a semantic web evangelist to XML-encased tears.

Enter the Author Taxonomy module. We initially developed this module for That Other Paper, an Austin-area online magazine we used to publish. To maintain a cohesive voice and layout style, we required all content to be vetted by one of our few web-savvy editors prior to publication. As a result, virtually none of the content on the site was placed there by its original author.

We experimented at first with creating a properly formatted username for each contributor: “Sue Smith,” “Hilarious Jackson,” and so on. This turned out to be a hassle, as users are unaccustomed to Usernames with Caps and Spaces. Even then, we couldn’t properly attribute content to more than one author. It was time for a custom module.

Author Taxonomy combines the power of taxonomies with the real-world demands of professional publications. Features include:

  • New authors can be created on the fly with “Tags” (“Free tagging” in Drupal 5.x) enabled.
  • Multiple authors are displayed using serialized text: “Sue and Jan” or “Sue, Jan, and Bobby.”
  • Names link to their taxonomy term pages, providing readers with a quick shortcut to all posts attributed to a particular author.
  • Site admins have the option of automatically overwriting the “real” node author — that is, the user who posted the node — with an author term if a matching username is found.
  • The date display is fully customizable and can be disabled completely.
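The serialized byline in the second bullet follows a simple rule. Author Taxonomy itself is a PHP Drupal module; the following is only a Python illustration of that rule, with a hypothetical function name, not the module’s code.

```python
def format_byline(names):
    """Join author names: "Sue", "Sue and Jan", "Sue, Jan, and Bobby"."""
    if not names:
        return ""
    if len(names) == 1:
        return names[0]
    if len(names) == 2:
        return f"{names[0]} and {names[1]}"
    # Three or more: commas between names, plus a serial comma before "and".
    return ", ".join(names[:-1]) + ", and " + names[-1]


print(format_byline(["Sue", "Jan", "Bobby"]))  # Sue, Jan, and Bobby
```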

If anybody has any suggestions for improving the functionality of Author Taxonomy, please add them to the module’s issue queue. We’d love to hear ’em.

Screenshots

[Image: Node editing interface]

[Image: How Author Taxonomy renders node output]

A year in open source

Mid-October marked my one-year anniversary with Four Kitchens and, consequently, the same anniversary of being an open-source software contributor. Because my career up to that point had been limited to proprietary software development, I had intended to write a one-year retrospective on the experience. But, as often happens, one gets busy, and now October, November, and half of December have passed. So without further ado, my self-indulgent retrospective on moving to open source development (complete with pontifications).

Working for “The Man”

I have no idea how many lines of code I’ve written in my career. A lot? Maybe. I can’t actually prove it since all the evidence is locked away. And, to make it worse, there’s not even a product on a store shelf I can point to and say, “Hey! I wrote some of the code in there!”

That was always the hardest part of the job: working my ass off and having nothing to show for it but a paycheck. For some people, the pay is all the satisfaction required. For others, the activity itself provides the satisfaction. For me the satisfaction comes from the end result: being able to feel good about that result and being able to show it to other people. I rarely found that satisfaction while I was doing closed-source work.

Closed-source companies usually have no qualms using open-source software. Contributing back is usually a different story. Often it’s limited to reporting bugs. Even at commercial software companies that participate in open source, employees often need permission to contribute (Big Blue, I’m calling you out). This is true even at some universities since, in the United States, universities can own software patents. Yes, I could have still done it undercover, but I like to keep the agreements I make.

Rose tint my world

So what attracted me to doing open-source work in the first place? The biggest appeal for me was being able to show my work. That sounds really selfish on the surface, but the reality is that unless you are getting what you need out of your work, you’re not actually doing anybody any good. The next appeal was from the non-profits and grass-roots organizations I often saw making use of open-source projects. I saw it as an opportunity to contribute in some way to the things I cared about and could believe in. At least, that was the romantic view.

But there was also a cynical view. Several years of reading comment threads on Slashdot had convinced me that the open-source community was mean-spirited and hostile. Browsing support forums and seeing such helpful responses as “Why don’t you just use Linux?” reinforced that idea.

When I had the opportunity to take a job doing open-source work, I let the romantic view win. I committed my first Drupal module with a lot of anxiety and wondered if my ego would survive that first “damn, you suck at this” email. I’m sure that email has been delayed somewhere in that series of tubes.

The Drupal tribe

I don’t use the word tribe in jest, but rather to emphasize the feel of the open-source community that I’m currently involved with. Tribes are made up of individuals with distinct identities and distinct goals. The tribe supports those individual goals in order to reach its own, and the individuals contribute their time and skills in order to reach their personal goals. All of the tribe folk are in the tribe by their choice and contributing in the way they want; it is not the selfless commune from Ayn Rand’s Anthem. The cycle of life and death in traditional human tribes is replaced by the arrival of new members who want to contribute and the departure of existing members who choose to move on to other things.

Being able to function as a member of the tribe has significant importance. It’s at least as important as technical skill. The tribe survives by nurturing those members who add to it and ostracizing those that don’t. I observed an instance of this when someone with strong industry credentials tried, rather arrogantly, to assert himself on those credentials alone. Several members of the tribe tried, without success, to persuade him to change his approach. Eventually, he stopped trying to participate after the group started ignoring him. I’ve heard stories of other times this has happened.

Tribal participation isn’t the same as being a “team player” at typical software companies. A tribe holds itself together by the will of its members; the typical corporate team is held together by the will of its management. This difference is subtle, but I believe that understanding it is essential in order to function in open-source projects. For someone coming from an industry background, it can feel like there must be someone in charge. Sometimes that leads the newcomer to try to take charge, causing friction.

There is a strong mentor contingent in the tribe. There is the occasional snarky answer to a question, but the general vibe of the tribe is that everyone starts out knowing nothing. No one is viewed as stupid for having a basic question.

Drupal is the only open-source community I’ve been actively involved with so far, so I don’t know how universal these observations are. I would guess that there are more similarities than differences with other open source communities. My expectations at this point, however, are rather high.

Perils and thrills

Open source development is not without danger. It has certain addictive properties. It tickles my ego whenever I get an issue on one of my Drupal themes. Solving the issue provides another tickle, and I find myself contemplating what to do to get more of those thrills. There are warning signs for addiction. For me, when I had a dream in which I was writing Drupal code while webchick YELLED AT ME on IRC, I decided it was time to take a break.

Coda

In writing this, I realize I am preaching to the converted. One of the easiest things to lose sight of in any human activity is the beginner’s mind. What motivated us to get involved? What did it feel like when we got started? What were the joys? And what were the frustrations? My personal experience getting involved with open-source was overwhelmingly positive, but there was trepidation in the beginning, and I don’t think I’m unique in that respect.

Open-source projects thrive through an infusion of fresh ideas and fresh minds. I believe it’s important as we seek out those new minds to remember what it was like for us when we were new and wanting to contribute. It’s not something that can be addressed in an FAQ. Newcomers will see it in our interactions with them and in our interactions with each other. It is subtle, but it’s never unseen.

Using Bazaar to work on Drupal core patches

As anyone who’s developed core patches knows, it’s not the writing of the initial patch that takes the work, it’s the combination of revising the patch and keeping up to date with HEAD. With Drupal.org’s CVS, this is difficult because you cannot commit your core changes to checkpoint your work. CVS’s merge algorithms are also relatively poor for maintaining large divergence from CVS HEAD.

One option, which several community members used to create the huge DB-TNG patch, was to create a Subversion repository. This is far too much overhead for smaller core patches, and it still requires person-by-person approval to commit to the Subversion repository.

Four Kitchens is now hosting a much more flexible system for users of Bazaar. Every hour, we synchronize core changes from CVS HEAD into our Drupal 7 branch, which is available for anonymous branching by anyone.

Simply download and install Bazaar, which has packages or installers for Linux, Windows, Mac OS X, and BSD. We’re using a repository format that should work with Bazaar 1.0 and newer, at a minimum.

Then, when you’re ready to develop, create a local branch and make its new directory your working directory.

Then, when you’re ready to develop, run this:
bzr branch bzr://vcs.fourkitchens.com/drupal/7 [optional-working-copy-directory]

You’ll have a fresh working copy with a number of benefits over a CVS checkout:

  • You can commit to your local branch to checkpoint your work: bzr commit
  • You can integrate changes from HEAD using Bazaar’s superior merge algorithms:
    bzr merge
  • You can branch from your own branch: bzr branch [existing] [new]
  • If you’re working with others and they have Bazaar branches, too, you can merge from their branches to collaborate.

And when you’re ready to post a patch, run
bzr diff --old bzr://vcs.fourkitchens.com/drupal/7-all-history > mypatch.patch
to create a patch reflecting all of your changes relative to CVS HEAD.

And when you’re ready to post a patch, run bzr diff --old bzr://vcs.fourkitchens.com/drupal/7 to create a patch reflecting all of your changes relative to CVS HEAD.

It will be a while before Drupal.org moves to anything other than CVS. Until then, we can foster decentralized development quite effectively using something like the Four Kitchens repository.

Note (2009-12-07): Yes, I realize Launchpad has a similar service that stores more detailed revision data, but I often find Launchpad slow or unresponsive. We also maintain Bazaar branches of stable Drupal 5 and 6; just replace the “7” above with “5” or “6”.

Note (2009-02-09): I have updated this post to use the new Bazaar branch for Drupal HEAD. We also maintain Bazaar branches of stable Drupal 5 and 6; just replace the “7-all-history” above with “5” or “6”. All obsolete information is now crossed out and replaced. I’ve maintained the obsolete instructions in case anyone is actively using them; they will continue to function.

Drupalers for Drupal: DrupalCon campaign buttons

Just for kicks, here are some DrupalCon "campaign buttons" we cooked up while brainstorming T-shirt ideas. We're releasing them under the GPLv3, so feel free to use 'em as you see fit!

SVG and PNG formats are linked below. If you'd rather not work with a vector program, you can also download all three buttons as a layered, smart object PSD file.

Button #1: Drupalers for Drupal

DrupalCon campaign button 1

SVG format | 150px PNG | 300px PNG

Button #2: Stars 'n' stripes

DrupalCon campaign button 2

SVG format | 150px PNG | 300px PNG

Button #3: America's ticket

DrupalCon campaign button 3

SVG format | 150px PNG | 300px PNG
