Alternatives to rebasing in Bazaar

A discussion recently arose on the Bazaar mailing list asking, “Why isn’t rebase support in core?” Rebase support is currently packaged as a plugin. This plugin is widely distributed, even in the standard Mac OS X installation bundle.

There are boring reasons that rebase support isn’t in core, like the lack of strong test coverage. More interesting are questions about the necessity of rebasing in typical workflows.

What is rebasing, and why should I care?

In large projects, there’s a mainline branch representing the current, global, coordinated development. In Drupal’s case, this is CVS HEAD. This mainline might not always be in perfect condition, but there’s a general sense that the mainline is not a sandbox for untested changes. Many changes are small enough that the developers simply work on and test a patch, but this workflow is inadequate for larger development projects like Fields in Core. Such large features require their own branch for development, a feature branch.

A feature branch allows development of a feature in isolation from the mainline but with the eventual intent of merging the changes back into the mainline. Because feature branches are created to foster long-term, divergent development from the mainline, it’s common for both feature development and mainline development to happen in parallel. This parallel development creates a problem: How do developers on the feature branch prepare for the eventual re-integration of their feature code into the mainline?

There are a few options:

  • Don’t sync changes. This option makes merging the feature back into the mainline painful. This option also defeats the purpose of developing and testing the feature in isolation because merging two tested (but divergent) branches often results in one broken (but converged) branch.
  • Merge the feature into the mainline before making any changes to the mainline and then re-branch for more feature work after making mainline changes. Merging an untested or incomplete feature into the mainline makes this option unattractive and impractical. This option is so silly, I only included it for completeness.
  • Periodically update the feature branch from the mainline. This is ideal because the feature branch continually answers the question “What if we merged this feature into the mainline?” and is ready for quick merging into the mainline without any disruption to mainline work.

The third option is the only practical one. But how should it work? What should the feature branch history look like after syncing from the mainline?

Back to rebasing…

Rebasing integrates the updates to the mainline as ancestors to the changes on the feature branch. The commit history is reorganized (read: rebased) as if the feature branch were freshly created from the mainline and all work were done on top of that. There are many theoretical objections to rebasing, and I won’t rehash them here. There’s general consensus that rebasing is sort of icky.

I find that many rebase users use the tool because they’re not aware of better workflows. I’ll address each (supposed) reason to use rebase in its own section.

“I want to keep my feature branch updated from the mainline.”

The better choice is to run bzr merge [mainline] on the feature branch. This command will update the common ancestry between the feature and mainline branches so that the feature branch includes the latest changes from the mainline and is ready for smooth merging back into the mainline.
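As a minimal sketch of that sync, here is the merge plus the commit that records it, meant to be run from inside the feature branch. The function name and the ../mainline default are placeholders of mine, not anything Bazaar defines.

```shell
# Sketch: pull the latest mainline revisions into the current feature branch.
# sync_from_mainline and the default path are illustrative names only.
sync_from_mainline() {
    mainline=${1:-../mainline}
    # bzr merge leaves its result as pending changes in the working tree,
    # so commit only if the merge finished without conflicts.
    bzr merge "$mainline" &&
        bzr commit -m "Merge from mainline"
}
```

Calling sync_from_mainline ../mainline is the whole workflow when the merge is clean. If bzr merge reports conflicts, the function stops before committing so they can be resolved and committed by hand.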

“I want to view only the revisions that make up the feature I’ve been working on.”

With a rebase, it’s reasonably clear which revisions constitute the feature work: they’re the top ones. But rebasing is not the best choice for reviewing this list. Run bzr missing --mine-only [mainline] from the feature branch, and Bazaar will output all the feature branch’s unique revisions without mangling the actual history (the way rebasing does).
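A small wrapper makes this easy to repeat. It is only a sketch: the function name and default path are made up, and the exit-status handling reflects my understanding that bzr missing returns 1 when it finds revisions to list.

```shell
# Sketch: list the revisions that exist only on the current feature branch.
# feature_revisions and the default path are illustrative names only.
feature_revisions() {
    mainline=${1:-../mainline}
    bzr missing --mine-only "$mainline"
    # Status 0 = nothing unique, 1 = unique revisions were listed.
    # Anything higher is treated as a real error.
    [ $? -le 1 ]
}
```

Run feature_revisions ../mainline from the feature branch. Nothing in either branch is modified.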

“I want a human-readable summary of how merging the feature into the mainline will affect the code.”

For background, a rebase user would run a diff from the oldest feature-specific commit to the latest commit, but there’s a better way. Instead, run bzr diff --old=[mainline], and Bazaar will provide the net diff for merging the feature into the mainline. Now, don’t use this diff for anything but human review; you should still use bzr merge from the mainline to integrate the feature branch’s changes and preserve all history.

Creating a merge directive with bzr send provides an identical human-readable diff to the method above, but a merge directive also includes all the binary data Bazaar needs to perform a history-preserving merge.
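Both review artifacts can be produced in one go. The sketch below assumes it runs from inside the feature branch; the function name and output file names are invented for illustration.

```shell
# Sketch: produce review material for the current feature branch.
# prepare_review and the output file names are illustrative only.
prepare_review() {
    mainline=${1:-../mainline}
    outdir=${2:-.}
    # Net diff, for human eyes only. bzr diff exits with 1 when it finds
    # differences, so only a status above 1 is treated as failure.
    bzr diff --old="$mainline" > "$outdir/feature-net.diff"
    [ $? -le 1 ] || return 1
    # Merge directive: the readable diff plus the data needed for a real merge.
    bzr send "$mainline" -o "$outdir/feature.directive"
}
```

For example, prepare_review ../mainline /tmp writes both files to /tmp. Read the .diff file, and hand the directive to whoever will do the merge.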

“I want to maintain a patch set on top of the mainline.”

Rebasing commits is an ugly way to do this because you don’t retain your own history of work on each patch or the history of how rebasing has changed each patch. Bazaar has a plugin called “Looms” that provides direct support for a much better patch set workflow. I’m a touch skeptical of Looms’ stability, so I just do what Looms does under the hood: maintain multiple branches, each derived (branched) from the one below. Each branch represents a patch. This method retains full, original history, including any changes I’ve made to the patches. When the mainline updates, I simply merge the mainline changes up through my patches, one branch at a time, from the bottom of the stack to the top.
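Here is a rough sketch of that “merge up through the patches” step. It assumes each patch lives in its own branch directory, created earlier with something like bzr branch mainline patch-1 and bzr branch patch-1 patch-2. The function name and directory names are hypothetical.

```shell
# Sketch: merge mainline changes up through a stack of patch branches.
# Usage: merge_up MAINLINE PATCH_BRANCH...   (lowest patch first)
# merge_up and the branch names are illustrative only.
merge_up() {
    lower=$(cd "$1" && pwd) || return 1      # absolute path of the branch below
    shift
    for branch in "$@"; do
        branch=$(cd "$branch" && pwd) || return 1
        (
            cd "$branch" &&
            bzr merge "$lower" &&
            bzr commit -m "Merge from $(basename "$lower")"
        ) || return 1                        # stop at the first conflict or error
        lower=$branch                        # the next patch merges from this one
    done
}
```

With the branches side by side, merge_up mainline patch-1 patch-2 merges the mainline into patch-1 and then patch-1 into patch-2. The function stops at the first branch with conflicts so they can be resolved there. It also stops at a branch that has nothing new to merge, because bzr commit refuses to record an empty change.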

“I want to clean up my commit history prior to submitting my changes to the mainline.”

Rebasing may group the feature commits, but it doesn’t make them coherent or pretty. It’s more effective to do the following:

  1. bzr merge [mainline]
  2. Use bzr diff --old=[mainline] on the feature branch to create a net diff.
  3. Get a fresh branch from the mainline.
  4. Apply the net diff as a patch.
  5. Shelve all changes.
  6. Work through unshelving the changes and committing them to create a coherent, pretty history.
  7. Create a merge directive using bzr send.
  8. Submit the merge directive.
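Steps 1 through 4 are mechanical, so they can be sketched as a small function. It assumes it runs from inside the feature branch, and that bzr diff writes file paths without a prefix (the default as far as I know, hence patch -p0). The function name and its arguments are made up for illustration.

```shell
# Sketch of steps 1-4; run from inside the feature branch.
# Usage: start_cleanup MAINLINE NEW_BRANCH_DIR
# start_cleanup and both arguments are illustrative names only.
start_cleanup() {
    mainline=$(cd "$1" && pwd) || return 1   # absolute path to the mainline
    cleanup=$2                               # where the fresh branch will go
    net=$(mktemp) || return 1                # temporary file for the net diff

    # Step 1: bring the feature branch up to date with the mainline.
    bzr merge "$mainline" && bzr commit -m "Merge from mainline" || return 1
    # Step 2: net diff (exit status 1 only means "differences found").
    bzr diff --old="$mainline" > "$net"
    [ $? -le 1 ] || return 1
    # Step 3: fresh branch from the mainline.
    bzr branch "$mainline" "$cleanup" || return 1
    # Step 4: apply the net diff as uncommitted changes in the fresh branch.
    ( cd "$cleanup" && patch -p0 < "$net" )
}
```

After start_cleanup ../mainline ../feature-clean, the fresh branch holds the whole feature as uncommitted changes, ready for the shelving and committing in steps 5 and 6. Two caveats: files the feature added come out of patch unversioned and need bzr add, and a plain diff cannot carry renames or binary changes. If the feature branch is already up to date with the mainline, step 1 has nothing to commit and the function stops there; in that case start from step 2 by hand.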

“[Your reason here]”

I’d like to hear from users of any distributed version-control system why they use “rebase” in their workflows, even if their reason is one I’ve discussed above.


Comments

Hi David,

Nice post.

I have two more useful commands for “I want a human-readable summary of how merging the feature into the mainline will affect the code.”

Firstly, “bzr diff -r ancestor:../mainline” when in a feature branch will
show you the diff of what you did in this branch, and only that. This differs
from using “--old” as “--old” will include changes in the mainline as well.
Both are useful in different situations, so it’s useful to know both.

(What ancestor:../mainline is doing is finding the point where you
branched from the mainline, and comparing your current working tree to
that revision, instead of the tip of mainline, so it’s as if nothing
happened on mainline while you were working on the feature branch.)

The second command is for seeing exactly what would happen if you merged
the feature to mainline, conflicts and all. You can of course do the “bzr
merge” and then examine the changes and “bzr revert” when you are done,
but there is another way. “bzr merge --preview ../feature-branch” from the
mainline will show you a diff of what the result of the merge would be,
including any conflict markers. Again, a useful command to know in certain situations.

(While it doesn’t add any functionality that can’t be achieved with
“bzr merge”, “bzr diff”, “bzr revert”, it can be nice to have it not
affect the mainline at all, and it’s quicker to type.

In addition, if you are a python coder then accessing this functionality
through the API allows you to do some very nice things.)

Thanks,

James

I’m only running bzr diff --old right after merging from the mainline, so it probably makes no difference, but thanks for the clarification.

As a patch reviewer, I want patch series to be submitted

  1. …in logical, bite-size pieces. Each step in the patch series should be self-contained, functional, and as simple as possible to review.
  2. …relative to the current code version. I have that state of the code in my head, and I don’t want to have to review sub-patches that apply against some old version of the code, and additional merge sub-patches that show how earlier groups of patches have to be transformed to accommodate changes that have meanwhile occurred in the mainline.

As a patch submitter, I think rebase is a handy tool to create such patch series.

For background, a rebase user would run a diff from the oldest
feature-specific commit to the latest commit, but there’s a better way.
Instead, run bzr diff --old=[mainline], and Bazaar will provide the net
diff for merging the feature into the mainline.

This is no good because it gives one giant patch to review, rather than the individual steps with individual commit messages explaining them.

Rebasing may group the feature commits, but it doesn’t make them coherent
or pretty. It’s more effective to do the following:

  1. bzr merge [mainline]
  2. Use bzr diff --old=[mainline] on the feature branch to create a net diff.
  3. Get a fresh branch from the mainline.
  4. Apply the net diff as a patch.

The above is exactly what rebasing does! (Except that rebasing doesn’t smash all of the subpatches together and force you to pick them apart again manually.)

  5. Shelve all changes.
  6. Work through unshelving the changes and committing them to create a coherent, pretty history. […]

Ugh. Wouldn’t you like the first draft of the individual patches to be done by the tool, rather than by hand?

Of course, individual patches in the rebased patch series should be viewed and tested before they are submitted upstream. And of course rebasing should only be used for smallish changes (certainly nothing that needs more than one developer). But within those limitations, I have found rebasing to be a very valuable tool.

When it comes to working with a patch series, I’ve already advocated a Loom-like model that answers all of your objections. Loom produces a set of patches, grouped by functionality, relative to the current code version.

A good thing about the rebase approach is that the user works with the regular commit, diff, log, etc. tools. The user just commits normally and can later decide to edit the commits. With bzr looms or Mercurial queues, the user has to learn and use an additional command set and switch her mind to “now I work with patches” mode.

The same workflow is certainly possible with bzr, hg, and git, but with a “git rebase --interactive” kind of command it’s easier: no extra commands and no context-switching between “history” and “patches”. Here’s some discussion on the Mercurial list: http://thread.gmane.org/gmane.comp.version-control.mercurial.general/135...