20160501

How to untangle a commit

To untangle a commit, you incrementally 1) commit, 2) stash save, 3) test, 4) commit fix (optional), 5) revert fix (optional) and 6) stash apply.

Every now and then you have a commit that works 100% and meets your quality standards, but is simply too big to be consumed in one code review. What you need to do is break it up into multiple commits.

You can do just that with git, safely and with reasonable effort, using some git yoga. The process is pretty much like swapping a carpet with a sofa on it.

Let's start a session with three files a, b and c.

$ cd $(mktemp -d)
$ git init
Initialized empty Git repository in /tmp/tmp.a1UbQEZsn2/.git/
$ git checkout -b initial
Switched to a new branch 'initial'
$ git commit --allow-empty -m"Initial" 
$ git checkout -b big_one
$ echo "a" > a
$ echo "b" > b
$ echo "c" > c
$ git add a b c
$ git commit -m"the big one"
[big_one 35beda3] the big one
 3 files changed, 3 insertions(+)
 create mode 100644 a
 create mode 100644 b
 create mode 100644 c
We added a, b and c. This is the state we consider too large for one code review. Let's go back to the initial state with files a, b and c unstaged:

$ git checkout initial
$ git cherry-pick --no-commit big_one
$ git reset HEAD
$ git status
On branch initial
Untracked files:
  (use "git add <file>..." to include in what will be committed)

 a
 b
 c
We want to add a, b and c in three separate commits we can send out as three independent code reviews.

The overall process is simple for the happy case. Add a file and commit it. We use stash to clean the workspace. Then we check if the latest commit was successful. The example uses bash conditions as a check. This feedback would normally come from your test suite.

$ git checkout -b a
Switched to a new branch 'a'
$ git add a
$ git commit -m"a"
[a b73878a] a
 1 file changed, 1 insertion(+)
 create mode 100644 a
$ git stash save -u stash_b_c
Saved working directory and index state On a: stash_b_c
HEAD is now at b73878a a
$ [ -f a ] && [ ! -f b ] && [ ! -f c ] && echo Well done # we run a test
Well done


$ git stash apply
On branch a
Untracked files:
  (use "git add <file>..." to include in what will be committed)

 b
 c
$ git checkout -b b
Switched to a new branch 'b'
$ git add b
$ git commit -m"b"
[b 64fac68] b
 1 file changed, 1 insertion(+)
 create mode 100644 b
$ git stash save -u stash_c
Saved working directory and index state On b: stash_c
HEAD is now at 64fac68 b
$ [ -f a ] && [ -f b ] && [ ! -f c ] && echo Well done # we run a test
Well done

$ git stash apply
On branch b
Untracked files:
  (use "git add <file>..." to include in what will be committed)
 c
$ git checkout -b c
Switched to a new branch 'c'
$ git add c
$ git commit -m"c"
[c 2f66c1b] c
 1 file changed, 1 insertion(+)
 create mode 100644 c
$ [ -f a ] && [ -f b ] && [ -f c ] && echo Well done # we run a test
Well done
When we diff against big_one, the content is the same:
$ git diff --exit-code big_one && echo Passed # --exit-code fails if anything differs
Passed
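
The three rounds above follow the same pattern, so they can be folded into a small helper. This is only a sketch: the function name untangle_step and the throwaway git identity are assumptions made for illustration, and the test is passed in as a command string. The helper stashes only when something is left over, because git stash save on a clean tree saves nothing and a following git stash apply would then re-apply an older stash.

```shell
#!/bin/sh
# Sketch only: untangle_step and the throwaway identity are hypothetical.

# untangle_step FILE CHECK
# Commits FILE on a new branch named FILE, stashes the rest, runs CHECK on
# the clean tree and restores the stashed files if CHECK passes.
untangle_step() {
    file=$1
    check=$2
    stashed=no
    git checkout -q -b "$file" || return 1
    git add "$file" || return 1
    git commit -q -m "$file" || return 1                # 1) commit
    if [ -n "$(git status --porcelain)" ]; then
        git stash save -q -u "after_$file" || return 1  # 2) stash save
        stashed=yes
    fi
    sh -c "$check" || return 1                          # 3) test
    if [ "$stashed" = yes ]; then
        git stash apply >/dev/null || return 1          # 6) stash apply
    fi
}

# Same starting point as in the session above.
cd "$(mktemp -d)" || exit 1
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"
git checkout -q -b initial
git commit -q --allow-empty -m "Initial"
git checkout -q -b big_one
for f in a b c; do echo "$f" > "$f"; done
git add a b c
git commit -q -m "the big one"
git checkout -q initial
git cherry-pick --no-commit big_one
git reset -q HEAD

result=Failed
untangle_step a '[ -f a ] && [ ! -f b ] && [ ! -f c ]' &&
untangle_step b '[ -f a ] && [ -f b ] && [ ! -f c ]' &&
untangle_step c '[ -f a ] && [ -f b ] && [ -f c ]' &&
git diff --exit-code big_one &&
result=Passed
echo "$result"
```

If a check fails, the helper stops before applying the stash, which is the situation the second session below deals with.
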
The stash is the trail of changes we made when we temporarily "removed" files from the working directory.

$ git stash list
stash@{0}: On b: stash_c
stash@{1}: On a: stash_b_c
So far we have only used stash to clean the working directory after a commit, so we can test it in isolation, and to restore those files afterwards. Because we use git stash apply rather than git stash pop, the entries stay on the stash and we keep the trail of changes.
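
To see why apply keeps the trail, here is a minimal sketch in a scratch repository. The identity and the stash name stash_b are placeholders chosen for this example. It counts the stash entries after git stash apply and again after git stash pop.

```shell
#!/bin/sh
# Sketch only: scratch repository with a throwaway identity.
cd "$(mktemp -d)" || exit 1
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"
git checkout -q -b initial
git commit -q --allow-empty -m "Initial"

echo b > b
git stash save -q -u stash_b          # b leaves the working directory

git stash apply >/dev/null            # b is back, the entry stays
after_apply=$(git stash list | wc -l | tr -d ' ')

rm b                                  # make room so the stash applies again
git stash pop >/dev/null              # b is back, the entry is dropped
after_pop=$(git stash list | wc -l | tr -d ' ')

echo "entries after apply: $after_apply, after pop: $after_pop"
```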

If you try this with real code it won't be so simple. Things will go wrong. We might forget to add files to a commit, or add too many, and our tests start failing. We have to be able to backtrack and restore a previous state. We never want to lose work.

Using stash we can do all this safely and methodically. Let's go back to the initial state, with the big_one branch containing all changes and the files a, b and c untracked.

$ cd $(mktemp -d)
$ git init
$ git checkout -b initial
$ git commit --allow-empty -m"Initial" 
$ git checkout -b big_one
$ echo "a" > a
$ echo "b" > b
$ echo "c" > c
$ git add a b c
$ git commit -m"the big one"
$ git checkout initial
$ git cherry-pick --no-commit big_one
$ git reset HEAD

$ git status 
On branch initial
Untracked files:
  (use "git add <file>..." to include in what will be committed)

 a
 b
 c
In this session we make the mistake of adding a and b in the first commit, when we should only add a.

$ git checkout -b a
Switched to a new branch 'a'
$ git add a 
$ git add b                                  # b - problem to fix 
$ git commit -m"Adding a (actually also b)"
[a c604de5] Adding a (actually also b)
 2 files changed, 2 insertions(+)
 create mode 100644 a
 create mode 100644 b
$ git stash save -u stash_c
Saved working directory and index state On a: stash_c
HEAD is now at c604de5 Adding a (actually also b)
$ [ -f a ] && [ ! -f b ] && [ ! -f c ] && echo Well done || echo Error # we run a test
Error
The test fails now. We have to remove b; then the test passes.

$ git rm b                                               # remove b
$ git commit -m"Remove b"
[a 043f3a7] Remove b
 1 file changed, 1 deletion(-)
 delete mode 100644 b
$ [ -f a ] && [ ! -f b ] && [ ! -f c ] && echo Well done || echo Error # we run a test
Well done
Branch a is in the right state now, with only the file a, but the stash contains only c. We lost b.
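
You can check what a stash made with -u actually holds. The sketch below repeats the mistake in a scratch repository (the identity is a placeholder) and lists the untracked files in the stash, which git stores in the third parent of the stash commit.

```shell
#!/bin/sh
# Sketch only: scratch repository with a throwaway identity.
cd "$(mktemp -d)" || exit 1
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"
git checkout -q -b initial
git commit -q --allow-empty -m "Initial"
for f in a b c; do echo "$f" > "$f"; done

git checkout -q -b a
git add a b                                   # the mistake: b slips in
git commit -q -m "Adding a (actually also b)"
git stash save -q -u stash_c

# Untracked files saved with -u live in the third parent of the stash commit.
stashed_files=$(git ls-tree -r --name-only 'stash@{0}^3')
echo "stash contains: $stashed_files"
```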

How can we recover when the desired state is a subset of the changes on the branch plus the last stashed changes? We need some git yoga to get where we want.

We added a and b in the first commit and removed b in the second commit. Together both commits leave only a and pass the test. This is the valid state we want for branch a.

The stash contains only c, so we have to restore b. We do this by reverting the commit that removed b, which brings back the state the working directory had when the change was stashed. Then we apply the latest change from the stash, which is c. The result is that we have a, b and c ((a + b) – b + b + c = a + b + c) in the working directory. The file a is committed. The files b and c are uncommitted changes.

 
$ git revert --no-commit HEAD                            # restore b
$ git stash apply                                        # b & c
On branch a

Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

 new file:   b

Untracked files:
  (use "git add <file>..." to include in what will be committed)

 c
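
The arithmetic can be checked mechanically. Because c is still untracked at this point, a plain git diff big_one is not a reliable check, so the sketch below compares each file of big_one with the working directory instead. The helper name matches_big_one and the identity are assumptions made for this example.

```shell
#!/bin/sh
# Sketch only: matches_big_one and the throwaway identity are hypothetical.

# Succeeds if every file in big_one exists in the working directory with
# identical content, whether it is committed, staged or untracked.
matches_big_one() {
    git ls-tree -r --name-only big_one | (
        while read -r path; do
            git show "big_one:$path" | cmp -s - "$path" 2>/dev/null || exit 1
        done
    )
}

cd "$(mktemp -d)" || exit 1
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"
git checkout -q -b initial
git commit -q --allow-empty -m "Initial"
git checkout -q -b big_one
for f in a b c; do echo "$f" > "$f"; done
git add a b c
git commit -q -m "the big one"
git checkout -q initial
git cherry-pick --no-commit big_one
git reset -q HEAD

git checkout -q -b a
git add a b                                   # the mistake: b slips in
git commit -q -m "Adding a (actually also b)"
git stash save -q -u stash_c
git rm -q b
git commit -q -m "Remove b"                   # the fix
git revert --no-commit HEAD                   # restore b
git stash apply >/dev/null                    # restore c

committed=$(git ls-tree -r --name-only HEAD)
if matches_big_one; then recovered=yes; else recovered=no; fi
echo "committed: $committed, recovered: $recovered"
```
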
When we look at the history of branch a, it's not exactly the result we wanted: the branch now contains two commits. We squash them into one.

$ git stash
Saved working directory and index state WIP on a: 043f3a7 Remove b
HEAD is now at 043f3a7 Remove b
$ git reset --soft HEAD~
$ git commit --amend -m"Add a for real"
[a bd012df] Add a for real
 1 file changed, 1 insertion(+)
 create mode 100644 a
$ git stash apply
On branch a
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

 new file:   b

Untracked files:
  (use "git add <file>..." to include in what will be committed)

 c
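
When the index is clean, the same squash can be sketched without an editor: git reset --soft initial moves the branch back to initial but keeps the final tree staged, and a single commit then replaces everything since initial. The scratch repository and identity below are placeholders. With staged leftovers such as b above, you would stash first, as in the session.

```shell
#!/bin/sh
# Sketch only: scratch repository with a throwaway identity.
cd "$(mktemp -d)" || exit 1
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"
git checkout -q -b initial
git commit -q --allow-empty -m "Initial"

git checkout -q -b a
echo a > a
echo b > b
git add a b
git commit -q -m "Adding a (actually also b)"
git rm -q b
git commit -q -m "Remove b"

# Collapse all commits since initial into one; the index must be clean.
git reset --soft initial
git commit -q -m "Add a for real"

commits=$(git rev-list --count initial..HEAD)
files=$(git ls-tree -r --name-only HEAD)
echo "commits since initial: $commits, files: $files"
```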

The squashing is optional and can be done later with an interactive rebase (git rebase -i initial) on branch a. From here we can commit b and c separately, just as in the happy case before. This gives us the same result:

$ git stash apply
$ git checkout -b b
$ git add b
$ git commit -m"b"
$ git stash save -u stash_c
$ [ -f a ] && [ -f b ] && [ ! -f c ] && echo Well done # we run a test
Well done

$ git stash apply
$ git checkout -b c
$ git add c
$ git commit -m"c"
$ [ -f a ] && [ -f b ] && [ -f c ] && echo Well done # we run a test
Well done

$ git diff --exit-code big_one && echo Passed # --exit-code fails if anything differs
Passed
With stash we can keep save points of our changes, and clean and restore the working directory. We also use it together with git revert to undo changes.

This process also applies to saving the results of explorative coding. If I hack together an end-to-end solution to see that it actually works, I can keep those results. I then rebuild it cleanly from the ground up and at the same time make sure that the overall end-to-end solution still works. This gives you quick feedback from the exploration, end-to-end testing, and the quality that comes from building it incrementally. It's the best of both worlds.

In a real-world scenario the untangling isn't that rigid, with a, b and c strictly added in single commits such that the sum of all incremental commits is exactly the same as the big initial commit. Most of the time you will see small problems along the way that you want to address in the process. Strictly speaking this is less safe, because diffing no longer gives a simple binary result, but it allows a more fluid workflow. If you're uncomfortable with that, you can also split the result into n commits first, keep notes of the necessary changes, and add the fixes in separate commits later (on the branches a, b and c).
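
When the diff is no longer expected to be empty, it helps to at least see where the result deviates. A minimal sketch, with a hypothetical branch untangled that carries a small fix in a: git diff --name-only big_one lists the files that differ from the original big commit, so the deviations can be reviewed deliberately.

```shell
#!/bin/sh
# Sketch only: the branch name untangled and the identity are hypothetical.
cd "$(mktemp -d)" || exit 1
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"
git checkout -q -b initial
git commit -q --allow-empty -m "Initial"

git checkout -q -b big_one
echo a > a
echo b > b
git add a b
git commit -q -m "the big one"

git checkout -q -b untangled initial
echo "a with a small fix" > a
echo b > b
git add a b
git commit -q -m "a and b, with a fix in a"

# Files whose content differs from the original big commit.
deviations=$(git diff --name-only big_one)
echo "deviates from big_one: $deviations"
```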

In the given session the result is the branches initial, a, b and c, which build on each other in this order. We can still improve on that. If you observe that a, b and c are actually orthogonal, then you can rebase b and c onto the initial branch. This makes the review work parallelizable. Not all features are orthogonal. If you have two different ways to cut a commit, choose the one that makes the changes as independent as possible. This also makes rebasing and merging easier.
$ git log --pretty=oneline --patch a --
bd012df01c7889182c6295725faf942f90fa251e Add a for real
diff --git a/a b/a
new file mode 100644
index 0000000..7898192
--- /dev/null
+++ b/a
@@ -0,0 +1 @@
+a
7a358aa4a6c6d4cb52cc403c3cc7d3b59c208c70 Initial

$ git checkout initial
$ git checkout -b b_orthogonal
$ git cherry-pick b
$ git log --pretty=oneline --patch b_orthogonal
006022af90003f57bde0ad2700201aee73163b95 b
diff --git a/b b/b
new file mode 100644
index 0000000..6178079
--- /dev/null
+++ b/b
@@ -0,0 +1 @@
+b
7a358aa4a6c6d4cb52cc403c3cc7d3b59c208c70 Initial

$ git checkout initial
$ git checkout -b c_orthogonal
$ git cherry-pick c

$ git log --pretty=oneline --patch c_orthogonal
43a58337baf1e337f0ff1fae72c2c5b955874027 c
diff --git a/c b/c
new file mode 100644
index 0000000..f2ad6c7
--- /dev/null
+++ b/c
@@ -0,0 +1 @@
+c
7a358aa4a6c6d4cb52cc403c3cc7d3b59c208c70 Initial
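
The session above uses cherry-pick. The rebase mentioned in the text can be sketched with git rebase --onto, which replays only the commits after a given upstream onto a new base. The repository, the identity and the reuse of the names b_orthogonal and c_orthogonal are assumptions made for this example.

```shell
#!/bin/sh
# Sketch only: scratch repository with a throwaway identity.
cd "$(mktemp -d)" || exit 1
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"
git checkout -q -b initial
git commit -q --allow-empty -m "Initial"

# Build the stacked branches initial <- a <- b <- c.
for f in a b c; do
    git checkout -q -b "$f"
    echo "$f" > "$f"
    git add "$f"
    git commit -q -m "$f"
done

# Replay only the commits after a (that is, b) onto initial.
git branch b_orthogonal b
git rebase -q --onto initial a b_orthogonal

# Replay only the commits after b (that is, c) onto initial.
git branch c_orthogonal c
git rebase -q --onto initial b c_orthogonal

b_only=$(git ls-tree -r --name-only b_orthogonal)
c_only=$(git ls-tree -r --name-only c_orthogonal)
echo "b_orthogonal: $b_only, c_orthogonal: $c_only"
```

The original branches b and c are left untouched, so the stacked history stays available if the changes turn out not to be independent after all.
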
The overall process isn't trivial, but once you know the necessary bits of git (stash, revert, checkout, commit, branch) it should be fairly easy to remember that the only steps necessary are to 1) commit, 2) stash save, 3) test, 4) commit fix (optional), 5) revert fix (optional) and 6) stash apply. If you can keep that in your head you know one solution to this problem.