A Git Workflow Using Rebase

by Chris Belyea

Git can be tricky to use, especially in a team setting where stale branches and merge conflicts tend to cause problems when you least have time for them. In this guide, I’ll explain a Git workflow using rebase that scales well and that my team has successfully used on multiple projects for a variety of clients, from XX to XX.

Git provides a tremendous amount of freedom and little guidance; this workflow, on the other hand, is very opinionated. This workflow certainly isn't the only way to use Git, but its prescriptive nature allows you to spend more time on actual work and less time fiddling with Git. You shouldn't need to think about version control. Once you're confident with this workflow, Git will fade into the background, allowing you to focus on what’s important: the code.

Why rebase?

Rebase is one of two Git commands that integrate changes from one branch into another. (The other command is merge.) Rebase can be a very destructive operation. It literally rewrites Git commit history, which is a big no-no in most cases. It does, however, offer some advantages, including a way to cleanly incorporate new code into a feature branch and a way to keep meaningless commit history out of a repository's master branch (see "Why squash?" below). The end result is a linear commit history on the master branch, which makes it easy to see how the code evolved.

Why squash?

There are two rebase modes: standard (non-interactive) and interactive. An interactive rebase operation allows you to squash your commits, combining many commits into fewer, or even a single commit. There are several reasons you might want to do this. If you have many commits in your branch and merge your branch into master, all of these commits will end up in master (possibly with a merge commit). In some cases that's desirable, but what if your branch has a lot of commits that contain minor fixes? You could end up with a commit history that looks like this:

And that's not the most egregious example of meaningless commit histories I’ve seen!

It's a good practice to commit often so that you can roll back changes in small increments if needed. But those small, incremental changes are only meaningful in the context of the branch you're developing on. In other words, each branch should only contain commits relevant to the implementation of a single feature (or bug fix). The consumers of your code (i.e., other developers on the project) are interested in your working code, not the fact that it took you 35 commits to get there. Merging a history like the one above into master adds a lot of noise. It's like handing in a term paper to your teacher with all of your notes and rough drafts stapled to it.

By squashing your commits, you get rid of the extraneous commits. If you think of the commit log above as your “before,” this is your “after”:

(INSERT GH CODE)

Now, the entire implementation of your feature is contained in one commit. From the master branch's perspective, each commit is a complete implementation of one feature. Put another way, the master branch shows a linear history of implemented features. If you introduced a feature but now need to roll it back, you only have to revert one commit.

And if you write good commit messages, like this:

(INSERT GH CODE)

then when you open a Pull Request your Git platform will automatically fill in the summary and description fields from your commit message and close the tagged issue when the pull request is accepted.
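As a rough illustration of such a message, here is a minimal sketch that creates one commit in a throwaway repository. The file name, the wording of the message, and issue number 1 are all hypothetical. The "Closes #1" line uses the issue-closing keyword style that GitHub recognizes; other platforms may expect slightly different keywords.

```shell
set -e
demo=$(mktemp -d)   # throwaway sandbox directory
cd "$demo"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q pebble
cd pebble
echo "retry = 3" > settings.conf   # hypothetical change
git add settings.conf

# Short subject line, blank line, explanatory body, then the issue keyword.
git commit -q -F - <<'EOF'
Add retry setting for failed syncs

Sync requests that fail are now retried up to three times
before an error is reported to the user.

Closes #1
EOF

# Show the full message as the platform would see it.
git log -1 --format=%B
```

The subject line is what typically becomes the pull request title, and the body becomes its description.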

If this doesn't make sense yet, keep reading. After I walk you through the workflow, the value should become more obvious.

Ground rules for this workflow:

  • You’re using the Git CLI. The CLI is consistent across platforms. Most Git GUIs, on the other hand, have too many vague abstractions that would make following this guide difficult. If you want, have a GUI tool like Sourcetree, GitUp, or gitk running side-by-side to help you visualize what's happening.
  • You’re using GitHub, GitLab, Bitbucket, or another Git platform that supports the concept of GitHub-style Pull Requests.
  • You have a Git repository that contains the code that you and your team are working on. For the purposes of this guide, we'll call that repository pebble.
  • Every contributor forks the repository and works in their own fork. On public projects where you don’t have write access, this is the only way to do it. But forking also works well for internal projects because it gives each contributor their own private workspace. Forks are free, so there isn't a compelling reason not to use them. Forking prevents two developers from collaborating on the same branch without granting additional permissions, but you shouldn't do that anyway because you increase your chances of encountering merge conflicts. If you think you need more than one person working in a single feature branch, then your feature is too big and should be broken into smaller units of work.
  • The most important rule: Don't rebase a branch that multiple people have access to! Only rebase branches in your fork. As I’ve already mentioned, rebasing is a destructive operation. If you're doing it in your repository, which only you can access, then there's no issue. If you rebase a branch that other people have access to, you’re going to run into trouble. So only rebase your own branches, and only push those rebased branches to your own fork.

Roles

For the purposes of this workflow, there are two Git-related roles. In practice, one person may fill both roles, especially in a solo/small project. For larger teams or public projects, the role delineation is a necessity.

  • Maintainer(s). These people have write permissions to the repository. They review Pull Requests and accept or reject them as appropriate. They also create Git tags for releases.
  • Contributor(s). These people have read (and therefore, fork) permissions to the repository. They can view and create issues and submit pull requests for review. Contributors are also responsible for resolving any merge conflicts. A contributor can only push to his or her fork.

Setup

To get set up you need to fork the project repository (we'll call this upstream), clone your fork (we'll call this origin), and then add a remote in your local cloned repository pointing back to upstream.

1. Fork the upstream repository. Follow the instructions for your Git platform to do this. Your fork should end up in your private user namespace.

2. Clone your fork to your computer. By default, when you clone a repository Git will automatically set up a remote called origin that points back to the clone source.

(INSERT GH CODE)

If you fork a private repository, GitHub will also keep your fork private, even if you're not on a paid plan that allows for private repos.

3. Add a second remote called upstream that points back to the upstream project. The upstream URL is the same one you'd use to clone the repository directly. This will allow you to pull in upstream changes.

(INSERT GH CODE)

4. To confirm your setup, run git remote --verbose, which should show both remotes.

(INSERT GH CODE)
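The setup steps above can be sketched end to end as follows. This is a sandboxed illustration, not the exact commands for your project: so that it runs without a network connection or a hosted account, two local bare repositories (upstream.git and fork.git, both hypothetical names) stand in for the upstream project and your fork. On a real platform you would fork through the web interface and substitute the clone URLs it gives you.

```shell
set -e
demo=$(mktemp -d)   # throwaway sandbox directory
cd "$demo"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-ins (assumptions): a seeded "upstream" and a "fork" copied from it.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "pebble" > seed/README
git -C seed add README
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed upstream.git        # plays the upstream project
git clone -q --bare upstream.git fork.git    # plays your fork (step 1)

# Step 2: clone your fork; Git names this remote "origin" automatically.
git clone -q fork.git pebble
cd pebble

# Step 3: add a second remote that points at the upstream project.
git remote add upstream "$demo/upstream.git"

# Step 4: confirm that both remotes are configured.
git remote --verbose
```

The final command should list fetch and push URLs for both origin and upstream.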

The Workflow

The code changes you commit should generally tie back to a story or issue. These stories and issues may exist in an external system such as JIRA or VersionOne, but for this guide we'll assume that you're using the Issues feature of your Git platform to track work.

I’m using the term issues to refer to all code-related work, including bug fixes and new feature development.

At a high level, the workflow can be described in a few steps:

  • Fetch upstream changes.
  • Merge upstream/master branch into local master branch.
  • Create a branch.
  • Write code and commit to your branch as you go.
  • Fetch from upstream again (in case upstream master has had new commits since you started your branch).
  • Rebase and squash your branch against upstream/master, resolving any merge conflicts.
  • Push your branch.
  • Open a pull request.

Here’s more detail:

Step 1: Fetch upstream changes.

You should always be working with the latest version of the codebase. Since the official code repository is upstream, fetch those changes. Git will store the contents of upstream's master branch locally in upstream/master.

(INSERT GH CODE)

Step 2: Merge upstream/master branch into local master branch.

It's simplest to create a branch off of your local master branch. Before you do so, however, you should merge upstream/master into master so that you have the latest code.

(INSERT GH CODE)

This will perform a fast-forward merge leaving master and upstream/master pointing at the same commit.

One implication of this is that the master branch on your fork (origin/master, from your perspective) has no purpose in this workflow. Upstream has the canonical master branch and you're periodically updating your local master branch from it.
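Steps 1 and 2 can be sketched together. As before, this is a sandboxed illustration in which local bare repositories (hypothetical names) stand in for upstream and your fork, and a second commit is pushed upstream to simulate a teammate's work. The --ff-only flag is an optional safeguard added here, not something the workflow requires: it makes the merge refuse to do anything other than a fast-forward.

```shell
set -e
demo=$(mktemp -d)   # throwaway sandbox directory
cd "$demo"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-ins (assumptions): a seeded upstream and a fork copied from it.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "pebble" > seed/README
git -C seed add README
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed upstream.git
git clone -q --bare upstream.git fork.git
git clone -q fork.git pebble
git -C pebble remote add upstream "$demo/upstream.git"

# Someone else lands a commit upstream after you forked.
echo "v2" >> seed/README
git -C seed commit -q -am "Upstream change"
git -C seed push -q "$demo/upstream.git" master

cd pebble
# Step 1: download upstream's commits into upstream/master.
git fetch -q upstream
# Step 2: fast-forward local master so it matches upstream/master.
git checkout -q master
git merge -q --ff-only upstream/master
```

After this, master and upstream/master point at the same commit, while origin/master is left behind, which is harmless in this workflow.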

Step 3: Create a branch.

Now that master is up-to-date, create a branch to track the work for your issue.

(INSERT GH CODE)

Step 4: Write code and commit to your branch as you go.

This is where you do your actual development, committing whenever it makes sense. Once you've finished coding, proceed to step 5.
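Steps 3 and 4 might look like the following minimal sketch. It uses a single throwaway local repository, and the file name and commit messages are made up for illustration; the branch name issue-1 matches the one used later in this guide.

```shell
set -e
demo=$(mktemp -d)   # throwaway sandbox directory
cd "$demo"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q pebble
cd pebble
git symbolic-ref HEAD refs/heads/master
echo "pebble" > README
git add README
git commit -q -m "Initial commit"

# Step 3: create a branch for the issue, starting from up-to-date master.
git checkout -q -b issue-1 master

# Step 4: commit in small increments as you work.
echo "first draft" > feature.txt     # hypothetical file
git add feature.txt
git commit -q -m "Add feature skeleton"
echo "second draft" >> feature.txt
git commit -q -am "Fix typo"

# List the commits that exist on issue-1 but not on master.
git log --oneline master..issue-1
```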

Step 5: Fetch from upstream again.

Your coding is complete, but before you open a pull request and get it merged into upstream's master branch, you need to grab any new commits that have appeared upstream. (Remember, you may not be the only person working on this project!)

To get the new upstream commits, use git fetch:

(INSERT GH CODE)

Step 6: Rebase and squash.

Rebasing will change the original commit on which a branch is based. Rebasing will result in new commits (with the same commit messages) with new SHA-1 hashes. Squashing will condense commits into a new commit (or commits) with a new SHA-1 hash. Typically you'll want to rebase against the branch that you intend to merge into. When you eventually create your pull request, it will be from your fork's branch to upstream's master branch. Therefore, you'll want to rebase against upstream/master.

To squash, you need to run the rebase in interactive mode:

(INSERT GH CODE)

This will open your default editor and present you with a list of all of the commits that will be rebased. It will look something like this:

(INSERT GH CODE)

Each commit in your branch is listed at the top of the file (from oldest to most recent). The comments at the bottom of the file provide instructions on how to provide the rebase command with directions for each commit. You should keep the top/oldest commit and squash all of the other commits into it. To do this, simply change pick to squash on the second and third commits. It will look like this when you're done:

(INSERT GH CODE)

When you save the file and exit your editor, rebase will continue, following the instructions you just provided.

As the instructions explain, squash will preserve the commit message and present it again at the very end of the rebase operation, where you'll craft the commit message for your new, squashed commit. This is useful when you're squashing several significant commits together and need the old commit messages to craft a coherent new one. If you are squashing trivial commits (especially those of the "forgot semicolon" variety) and don't need those commit messages for the last step, you can use fixup instead.
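To see the whole squash in one place, here is a sandboxed sketch. Several things in it are assumptions made for illustration: the repository and file names are hypothetical, the fork is omitted so the clone's only remote is named upstream, and the editor is scripted so the example can run unattended. In real use you would run git rebase --interactive upstream/master and edit the list by hand as described above.

```shell
set -e
demo=$(mktemp -d)   # throwaway sandbox directory
cd "$demo"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for upstream (assumption); the clone names its remote "upstream".
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "pebble" > seed/README
git -C seed add README
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed upstream.git
git clone -q -o upstream upstream.git pebble
cd pebble

# Three small commits on the feature branch.
git checkout -q -b issue-1
for msg in "Add feature" "Fix typo" "Forgot semicolon"; do
  echo "$msg" >> feature.txt
  git add feature.txt
  git commit -q -m "$msg"
done

# Meanwhile, upstream gains a commit of its own.
echo "news" > "$demo/seed/NEWS"
git -C "$demo/seed" add NEWS
git -C "$demo/seed" commit -q -m "Upstream change"
git -C "$demo/seed" push -q "$demo/upstream.git" master

git fetch -q upstream

# Scripted stand-in for hand editing: keep the first "pick", turn the rest
# into "squash", and accept the combined commit message unchanged.
GIT_SEQUENCE_EDITOR="sed -i -e '2,\$ s/^pick/squash/'" GIT_EDITOR=true \
  git rebase --interactive upstream/master

git log --oneline upstream/master..issue-1
```

The final log should show a single commit sitting on top of upstream/master. Replacing squash with fixup in the sed expression would discard the later commit messages instead of combining them.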

As rebase processes your commits, it may run into a merge conflict (for example, if you and upstream changed the same part of a file). If this happens, rebase will pause and wait for you to manually resolve the conflict. To do this, simply run git status to find out which file(s) have conflicts and then go into each one and resolve them. Once you're done, run git add to stage each file and then git rebase --continue. (You do not need to git commit merge conflicts.)
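The conflict-resolution loop can be tried safely in a sandbox. In this sketch, which uses hypothetical repository names and file contents, upstream and the feature branch both change the same line of README, so the rebase is guaranteed to stop. The conflict is "resolved" by simply writing the desired content; in practice you would edit the conflict markers in each file.

```shell
set -e
demo=$(mktemp -d)   # throwaway sandbox directory
cd "$demo"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for upstream (assumption); the clone names its remote "upstream".
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "color = red" > seed/README
git -C seed add README
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed upstream.git
git clone -q -o upstream upstream.git pebble
cd pebble

# Your branch changes the line one way...
git checkout -q -b issue-1
echo "color = blue" > README
git commit -q -am "Change color to blue"

# ...and upstream changes the same line another way.
echo "color = green" > "$demo/seed/README"
git -C "$demo/seed" commit -q -am "Change color to green"
git -C "$demo/seed" push -q "$demo/upstream.git" master
git fetch -q upstream

# The rebase pauses on the conflict and exits non-zero, so tolerate that.
git rebase upstream/master || true
git status --short             # lists README as unmerged

# Resolve the conflict (here: keep our value), stage it, and continue.
echo "color = blue" > README
git add README
GIT_EDITOR=true git rebase --continue
```

Note that there is no git commit step: git rebase --continue records the resolved commit itself.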

Once your commits have been squashed, Git will prompt you to write a commit message for your new squashed commit. You need to write a good commit message by following these seven rules. Since this commit represents the entirety of your work on this feature, it's important to document what your commit changes. Additionally, your Git platform (e.g., GitHub) will use the first commit message in your feature branch to populate the pull request form.

Step 7: Push your branch.

In order to create a Pull Request you need to push your branch to origin (your fork of the upstream project). This is simple to do:

(INSERT GH CODE)

If you've already pushed your branch and need to update it, the above command will fail. Since a rebase rewrites commit history, your local branch and the copy on your remote have diverged, and you must use the --force option to instruct Git to overwrite the branch on your remote:

(INSERT GH CODE)
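The difference between the two pushes can be observed in a sandbox. In this sketch a local bare repository (hypothetical name fork.git) stands in for your fork, and git commit --amend is used as a quick stand-in for a rebase, since both replace an existing commit with a new one that has a different hash.

```shell
set -e
demo=$(mktemp -d)   # throwaway sandbox directory
cd "$demo"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for your fork (assumption).
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "pebble" > seed/README
git -C seed add README
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed fork.git
git clone -q fork.git pebble
cd pebble

git checkout -q -b issue-1
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# First push: the branch does not exist on origin yet, so this succeeds.
git push -q origin issue-1

# Rewrite history locally (amend stands in for a rebase here).
git commit -q --amend -m "Add feature (squashed)"

# A plain push is now rejected because the histories have diverged.
git push -q origin issue-1 2>/dev/null || echo "plain push rejected"

# Forcing the push replaces the branch on origin with your rewritten one.
git push -q --force origin issue-1
```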

Now that you've seen firsthand how rebase rewrites history, it should be obvious why you should never rebase any branch that is publicly accessible. If two people are working on a branch, and one rebases that branch and pushes it to GitHub, the next time the second person tries to git pull, it will fail or produce a confusing merge because the branch's history on GitHub will no longer match their local history. The second person would need to reset their branch to match GitHub (losing any local changes) to get things back in sync. When an entire team has to do that, the resulting disruption and potential data loss become a big problem. As long as you only rebase branches in your private fork, you'll be fine.

Step 8: Open a pull request.

From here, you're ready to open a pull request from your fork's (origin) feature branch (issue-1) to the upstream repository's master branch. If you make any changes to your branch, just follow steps five through seven. When you push to a branch, your pull request automatically includes all changes.

Once the pull request is accepted, you can delete both your local feature branch and the branch on your fork (origin).
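The cleanup can be sketched as follows, again with a local bare repository (hypothetical name fork.git) standing in for your fork. The sketch uses git branch -D rather than -d on the assumption that a squashed or rebased branch may not look merged to your local Git, in which case -d would refuse to delete it.

```shell
set -e
demo=$(mktemp -d)   # throwaway sandbox directory
cd "$demo"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for your fork (assumption), with a pushed feature branch.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "pebble" > seed/README
git -C seed add README
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed fork.git
git clone -q fork.git pebble
cd pebble
git checkout -q -b issue-1
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"
git push -q origin issue-1

# After the pull request is accepted: leave the branch, then delete it
# locally and on your fork.
git checkout -q master
git branch -q -D issue-1
git push -q origin --delete issue-1
```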

If your Git platform supports it, consider turning off merge commits for Pull Requests and having the platform do a fast-forward merge instead. 

Conclusion

That's it! This workflow may seem complicated at first, but once you use it a few times, the mechanics will start to feel natural. The commit history for the project's master branch will be a linear progression of feature additions. This rebase and squash approach is highly compatible with the popular GitHub Flow workflow. It can also be used with GitFlow for feature branches that you merge into develop. The forking aspect of the workflow is not strictly required; however, if you use this workflow in a shared repository, all contributors must agree not to use each other's branches. (In practice this is hard to enforce, which is why I prefer forking.)

This workflow offers a viable alternative to the more traditional Git approach where branches are updated by merging and every minor correction is noted in the commit log. With this workflow, each commit represents a feature (or fix), and git log will show a clear, usable history that captures the implementation of features while omitting all of the rough drafts it took to craft each one.

Chris Belyea is a managing consultant at SingleStone, where he works on the Cloud/DevOps team, helping clients . . . 
