Creating common branch ancestry is a hard problem

One of the key features of distributed version control systems (DVCS) is support for divergent development (branching) and then merging. Most DVCS tools, including git and Bazaar, include rather elegant support for such workflows by embedding metadata about common ancestry into branches. In this post, I’ll be focusing on Bazaar.

“Common ancestry” means an identical revision shared by two branches. The most recent common ancestor typically indicates the point of branching, unless there has been a more recent merge. A successful merge between two branches establishes a new, more recent common ancestor. Identifying a common ancestor is a required step for performing automatic three-way merges to integrate changes from a foreign, divergent branch.

Typically, the branching metadata allows Bazaar to automatically determine the most recent common ancestor to use as the base revision for three-way merging. The problem comes when you need to merge two branches that do not share any common ancestors. (For the purpose of this post, I am not counting revision zero, the universal common ancestor, as a common ancestor. It is generally useless for merging.)
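
To make the idea of a "most recent common ancestor" concrete, here is a minimal Python sketch over a toy revision graph. It is not Bazaar's implementation; the graph layout (a dict mapping each revision to its parents) and the names ancestors and merge_bases are assumptions made for illustration only.

```python
from collections import deque


def ancestors(graph, rev):
    """Return rev plus every revision reachable through parent links."""
    seen = set()
    queue = deque([rev])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(graph.get(current, ()))
    return seen


def merge_bases(graph, tip_a, tip_b):
    """Return the common ancestors that are not ancestors of another one."""
    common = ancestors(graph, tip_a) & ancestors(graph, tip_b)
    bases = set(common)
    for rev in common:
        # Drop everything strictly older than this common ancestor.
        bases -= ancestors(graph, rev) - {rev}
    return bases


# Toy history: branch "b" forks from a2, then "a" merges b2 back in as a4.
related = {
    "a1": [],
    "a2": ["a1"],
    "a3": ["a2"],
    "b1": ["a2"],
    "b2": ["b1"],
    "a4": ["a3", "b2"],
}
# Two histories that were started independently.
unrelated = {"x1": [], "x2": ["x1"], "y1": [], "y2": ["y1"]}

print(merge_bases(related, "a3", "b2"))    # {'a2'}: the branch point
print(merge_bases(related, "a4", "b2"))    # {'b2'}: the merge moved the base forward
print(merge_bases(unrelated, "x2", "y2"))  # set(): no base for a three-way merge
```

The last call is the situation this post is about: with no shared revision, there is nothing to use as the base of a three-way merge.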

Creating common ancestry
If you try to merge two branches without common ancestry and without specifying any revisions, Bazaar will complain and do nothing. You can instead specify a revision range (like -r5..-1, which would be everything from the fifth revision to the latest on the foreign branch) and then apply the merge. Bazaar will then merge the changes in the specified revision range into your local branch and establish the last merged revision as the latest common ancestor, making future merges a breeze. Unfortunately, finishing this initial merge is where the dragons lie. But before I can get into the difficulty of creating common ancestry by merging two unrelated branches, I have to briefly discuss how Bazaar handles files.

How Bazaar tracks files
Bazaar maps file paths to globally unique file IDs, and two branches without prior common ancestry will have different file IDs for the same paths, even if the files are really the same. Every time a file is added to a Bazaar branch, it gets a unique ID. As files are moved and renamed, they keep their unique IDs. So, if two people download Drupal, extract it, and independently “bzr init”, “bzr add”, and “bzr commit”, their respective README.txt files (for example) will have different file IDs.
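
The behavior described above can be modeled with a small Python sketch. This is a toy, not Bazaar's code: the ToyBranch class and the ID format are hypothetical, and the only point is that IDs are minted at "add" time and then follow the file through renames.

```python
import uuid


class ToyBranch:
    """Toy stand-in for a branch: maps each path to a unique file ID."""

    def __init__(self):
        self.path_to_id = {}

    def add(self, path):
        # The ID is generated when the file is added; it is not derived
        # from the file's content. (The format here is made up.)
        file_id = "{}-{}".format(path.lower().replace("/", "_"), uuid.uuid4().hex)
        self.path_to_id[path] = file_id
        return file_id

    def rename(self, old_path, new_path):
        # A rename changes the path but keeps the file ID.
        self.path_to_id[new_path] = self.path_to_id.pop(old_path)


# Two people extract the same tarball and add the same file independently.
alice = ToyBranch()
bob = ToyBranch()
alice_id = alice.add("README.txt")
bob_id = bob.add("README.txt")
print(alice_id == bob_id)  # False: same path, different file IDs

# Renaming keeps the identity of the file.
alice.rename("README.txt", "README.md")
print(alice.path_to_id["README.md"] == alice_id)  # True
```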

Creating common ancestry (continued)
These globally unique file IDs cause trouble when merging from unrelated branches. When Bazaar merges two branches (related or not), two files with the same path but different file IDs create a conflict even if they contain identical content. Merging two unrelated branches with lots of shared files creates a mess of conflict files and conflict directories, and there is currently no convenient way to resolve these conflicts.
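
As a rough illustration of why this gets messy, the sketch below lists every path that exists in both branches under a different file ID. The inventories are plain dicts with made-up IDs, and path_id_conflicts is a hypothetical helper, not a Bazaar function. Note that file content never enters the comparison.

```python
def path_id_conflicts(local, other):
    """Return paths present in both inventories with different file IDs."""
    shared_paths = local.keys() & other.keys()
    return sorted(p for p in shared_paths if local[p] != other[p])


# Made-up inventories for two unrelated branches of the same project.
local_inventory = {
    "README.txt": "readme-local-1",
    "index.php": "index-local-2",
    "sites/default/settings.php": "settings-local-3",
}
other_inventory = {
    "README.txt": "readme-other-1",
    "index.php": "index-other-2",
    "CHANGELOG.txt": "changelog-other-3",
}

for path in path_id_conflicts(local_inventory, other_inventory):
    print("conflict:", path)
```

Every shared path shows up in the result, which matches the experience of merging unrelated branches: the more files the two trees have in common, the more conflicts there are.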

A concrete example with Drupal
A developer downloads Drupal and puts the project under version control in a fresh Bazaar branch. She discovers that Four Kitchens maintains a Bazaar branch of stable Drupal releases and decides she would like to use the Four Kitchens branch to automate installation of minor Drupal updates. She attempts to merge in the revision range from the Four Kitchens branch that would perform the upgrade. Because the file IDs in the Four Kitchens branch differ from hers, every Drupal core file and directory has a conflict despite having identical content for the base merge revision. After a bit of tedious conflict cleanup, she commits the merge and goes back to enjoying Bazaar’s generally elegant architecture.

I thought this blog post was going to give me a solution!
Nope, there’s not one out there yet. I’m currently looking at writing a custom merge handler (subclassed from the standard merge3 in Bazaar) that would intelligently handle merges where file paths do represent the same files, regardless of file IDs. Unfortunately, the file ID/path conflict is low-level in Bazaar and occurs before reaching the most modular part of merge conflict resolution.
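
To show the general idea of matching by path rather than by file ID, here is a minimal Python sketch. It does not use Bazaar's API and is not the merge handler mentioned above: the dict-based inventories and the helpers build_id_map and remap_inventory are hypothetical, and the sketch simply assumes that an identical path means the same file, which will be wrong whenever one side has renamed something.

```python
def build_id_map(local, other):
    """Map foreign file IDs to local ones wherever the paths coincide."""
    return {
        other[path]: local[path]
        for path in local.keys() & other.keys()
        if other[path] != local[path]
    }


def remap_inventory(other, id_map):
    """Rewrite a foreign inventory so shared paths use the local file IDs."""
    return {path: id_map.get(file_id, file_id) for path, file_id in other.items()}


# Made-up inventories for two unrelated branches.
local_inv = {"README.txt": "readme-local-1", "index.php": "index-local-2"}
foreign_inv = {
    "README.txt": "readme-foreign-1",
    "index.php": "index-foreign-2",
    "CHANGELOG.txt": "changelog-foreign-3",
}

id_map = build_id_map(local_inv, foreign_inv)
remapped = remap_inventory(foreign_inv, id_map)

print(remapped["README.txt"] == local_inv["README.txt"])  # True
print(remapped["CHANGELOG.txt"])  # changelog-foreign-3
```

Files that exist only on the foreign side keep their original IDs, so they would arrive as ordinary additions rather than conflicts.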

Thanks to Robert Collins on the Bazaar project for walking me through the Bazaar internals necessary for me to explain this issue and, hopefully, solve it.

Comments

Though we use Subversion in-house, I do a lot with git personally. While I’m not terribly familiar with the nitty-gritty of how git works, isn’t it the case that it’s relatively indifferent to files as conceptual units (their names, their UUIDs, etc.) and more interested in the content of the files? Once I realized that it handled renames implicitly, my world basically changed. I’d be willing to be proven wrong, but isn’t there some merit in dealing with the content of the files directly, rather than with the files as blobs, particularly in problems like this?

I’m not trying to start a holy war here, as there are problems with git (the history rewriting thing is particularly troublesome) and it’s still a bear to learn. But it’s an interesting question.

While git works on the content level and not the file level, I assume it still has to have some identifier for content that weaves its way through history, and those identifiers would also clash when merging two branches with nearly identical content but no common branch history.

I’d like to verify this assumption, but I can’t find good documentation for using git to merge two unrelated branches.

I can confirm that git chokes when you’re trying to merge two unrelated branches with nearly identical content. I had it fail on me just yesterday. And I was, in fact, hoping this blog post would give me a solution. :)

Aha. Doing some further research, I discovered you can fake commit history in git by setting up a “graft.” You create a text file at .git/info/grafts, where each line contains a commit id followed by its parents. Your local repository will henceforth act as if that commit actually descended from both parents. I unfortunately have no hints for how to handle it in bzr…
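
For illustration, the graft file format described above (one line per commit: the commit ID followed by its parent IDs, separated by spaces) can be sketched in Python. The helper graft_line and the commit IDs below are made up for the example; this only builds the text of a line and does not touch a repository.

```python
import re

# Git object IDs in a grafts file are written as 40 hex characters.
SHA1_PATTERN = re.compile(r"[0-9a-f]{40}")


def graft_line(commit, parents):
    """Format one grafts entry: the commit ID followed by its parent IDs."""
    ids = [commit, *parents]
    for object_id in ids:
        if not SHA1_PATTERN.fullmatch(object_id):
            raise ValueError("not a full 40-character hex ID: " + object_id)
    return " ".join(ids)


# Placeholder IDs, not real commits.
commit_id = "a" * 40
real_parent = "b" * 40
grafted_parent = "c" * 40

line = graft_line(commit_id, [real_parent, grafted_parent])
print(line)
```

The resulting line would be appended to .git/info/grafts, as described in the comment above.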

It’s not clear that this would solve the problem of content conflicts, and the linked article was not really about merging unrelated branches. Have you tried the graft method yourself?

Thanks for writing this out clearly. I am new to Bazaar but already hooked on its features compared to svn/cvs. Let’s hope the issue of creating common branch history for the same files will be addressed by the Bazaar development team.
