20160501

How to untangle a commit

To untangle a commit, you repeat these steps incrementally: 1) commit, 2) stash save, 3) test, 4) commit a fix (optional), 5) revert the fix (optional) and 6) stash apply.
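The happy path of this recipe (steps 1, 2, 3 and 6) can be sketched as a small bash function. This is a minimal sketch, not the author's tooling. `untangle_step` is a hypothetical helper, and the test is passed in as a string of bash conditions like the ones used in the sessions below. It uses `git stash push -u -m`, the newer spelling of `git stash save -u`. It only stashes when something is left over, so the last step does not accidentally re-apply an older stash entry. The demo repository and the placeholder identity ("Example", example@example.com) are assumptions so the script runs on its own.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not a git command): one happy-path iteration.
# Usage: untangle_step <branch> <test-command> <file>...
untangle_step() {
  local branch=$1 test_cmd=$2 stashed=false
  shift 2
  git checkout -q -b "$branch"                  # each step builds on the previous branch
  git add -- "$@"
  git commit -q -m "$branch"                    # 1) commit
  if [ -n "$(git status --porcelain)" ]; then
    git stash push -q -u -m "after_$branch"     # 2) stash save, including untracked files
    stashed=true
  fi
  bash -c "$test_cmd" || { echo "test failed on $branch" >&2; return 1; }   # 3) test
  if "$stashed"; then
    git stash apply -q                          # 6) stash apply; the entry stays as a safe point
  fi
}

# Demo repository as in the session below: a, b and c end up untracked on "initial".
cd "$(mktemp -d)"
git init -q
git config user.name "Example"                  # assumed identity for a clean environment
git config user.email "example@example.com"
git checkout -q -b initial
git commit -q --allow-empty -m "Initial"
git checkout -q -b big_one
for f in a b c; do echo "$f" > "$f"; done
git add a b c
git commit -q -m "the big one"
git checkout -q initial
git cherry-pick --no-commit big_one
git reset -q HEAD

untangle_step a '[ -f a ] && [ ! -f b ] && [ ! -f c ]' a
untangle_step b '[ -f a ] && [ -f b ] && [ ! -f c ]' b
untangle_step c '[ -f a ] && [ -f b ] && [ -f c ]' c
git diff --exit-code big_one && echo Passed
```

If a test fails, the function stops with the remaining files still on the stash. That is exactly the point where the optional steps 4 and 5 come in.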

Every now and then you have a commit that works 100% and follows your quality standards, but it is simply too big to be consumed in one code review. What you need to do is break it up into multiple commits.

You can do that with git safely and with reasonable effort using some git yoga. The process is pretty much like swapping a carpet with a sofa on it.

Let's start a session with three files a, b and c.

$ cd $(mktemp -d)
$ git init
Initialized empty Git repository in /tmp/tmp.a1UbQEZsn2/.git/
$ git checkout -b initial
Switched to a new branch 'initial'
$ git commit --allow-empty -m"Initial" 
$ git checkout -b big_one
$ echo "a" > a
$ echo "b" > b
$ echo "c" > c
$ git add a b c
$ git commit -m"the big one"
[big_one 35beda3] the big one
 3 files changed, 3 insertions(+)
 create mode 100644 a
 create mode 100644 b
 create mode 100644 c
We added a, b and c. This is the state we consider too large for one code review. Let's go back to the initial state, with a, b and c as untracked files:

$ git checkout initial
$ git cherry-pick --no-commit big_one
$ git reset HEAD
$ git status
On branch initial
Untracked files:
  (use "git add <file>..." to include in what will be committed)

 a
 b
 c
We want to add a, b and c in three separate commits we can send out as three independent code reviews.

The overall process is simple for the happy case. We add a file and commit it. We use stash to clean the workspace. Then we check whether the latest commit works on its own. The example uses bash conditional expressions as the check; this feedback would normally come from your test suite.

$ git checkout -b a
Switched to a new branch 'a'
$ git add a
$ git commit -m"a"
[a b73878a] a
 1 file changed, 1 insertion(+)
 create mode 100644 a
$ git stash save -u stash_b_c
Saved working directory and index state On a: stash_b_c
HEAD is now at b73878a a
$ [ -f a ] && [ ! -f b ] && [ ! -f c ] && echo Well done # we run a test
Well done


$ git stash apply
On branch a
Untracked files:
  (use "git add <file>..." to include in what will be committed)

 b
 c
$ git checkout -b b
Switched to a new branch 'b'
$ git add b
$ git commit -m"b"
[b 64fac68] b
 1 file changed, 1 insertion(+)
 create mode 100644 b
$ git stash save -u stash_c
Saved working directory and index state On b: stash_c
HEAD is now at 64fac68 b
$ [ -f a ] && [ -f b ] && [ ! -f c ] && echo Well done # we run a test
Well done

$ git stash apply
On branch b
Untracked files:
  (use "git add <file>..." to include in what will be committed)

 c
$ git checkout -b c
Switched to a new branch 'c'
$ git add c
$ git commit -m"c"
[c 2f66c1b] c
 1 file changed, 1 insertion(+)
 create mode 100644 c
$ [ -f a ] && [ -f b ] && [ -f c ] && echo Well done # we run a test
Well done
When we diff against big_one, the content is identical:
$ git diff --exit-code big_one && echo Passed
Passed
The stash is the trail of changes we made whenever we temporarily "removed" files from the working directory.

$ git stash list
stash@{0}: On b: stash_c
stash@{1}: On a: stash_b_c
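So far we have only used stash to clean the working directory, so that we can test each commit in isolation, and to restore the remaining files afterwards. Because we use git stash apply rather than git stash pop, the entries stay on the stash and keep the trail of changes.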
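The stash entries really do survive an apply. Below is a minimal sketch in a throwaway repository with a placeholder identity (an assumption for a clean environment). With -u, git records the untracked files in a third parent of the stash commit, so `stash@{0}^3` lists exactly what was set aside.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"
git init -q
git config user.name "Example"            # assumed identity for a clean environment
git config user.email "example@example.com"
git commit -q --allow-empty -m "Initial"

echo b > b
echo c > c
git stash push -q -u -m stash_b_c         # b and c leave the working directory
[ ! -f b ] && [ ! -f c ] && echo "clean workspace"
git stash apply -q                        # b and c come back ...
git stash list                            # ... and the entry is still on the stash
git ls-tree --name-only 'stash@{0}^3'     # untracked files are kept in the third parent
```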

If you're trying this with real code, it won't be this simple. Things will go wrong: we forget to add files to a commit, or our tests start failing. We have to be able to backtrack and restore a previous state. We never want to lose work.

Using stash we can do all this safely and methodically. Let's go back to the initial state with the big_one branch containing all changes and the files a, b, and c as untracked changes.

$ cd $(mktemp -d)
$ git init
$ git checkout -b initial
$ git commit --allow-empty -m"Initial" 
$ git checkout -b big_one
$ echo "a" > a
$ echo "b" > b
$ echo "c" > c
$ git add a b c
$ git commit -m"the big one"
$ git checkout initial
$ git cherry-pick --no-commit big_one
$ git reset HEAD

$ git status 
On branch initial
Untracked files:
  (use "git add <file>..." to include in what will be committed)

 a
 b
 c
In this session we make the mistake of adding a and b in the first commit, when we should only have added a.

$ git checkout -b a
Switched to a new branch 'a'
$ git add a 
$ git add b                                  # b - problem to fix 
$ git commit -m"Adding a (actually also b)"
[a c604de5] Adding a (actually also b)
 2 files changed, 2 insertions(+)
 create mode 100644 a
 create mode 100644 b
$ git stash save -u stash_c
Saved working directory and index state On a: stash_c
HEAD is now at c604de5 Adding a (actually also b)
$ [ -f a ] && [ ! -f b ] && [ ! -f c ] && echo Well done || echo Error # we run a test
Error
The test fails now. We have to remove b; then the test passes.

$ git rm b                                               # remove b
$ git commit -m"Remove b"
[a 043f3a7] Remove b
 1 file changed, 1 deletion(-)
 delete mode 100644 b
$ [ -f a ] && [ ! -f b ] && [ ! -f c ] && echo Well done || echo Error # we run a test
Well done
Branch a now has the right state, only the file a, but the stash only contains c. We lost b.

How can we recover from this state when the state we want combines part of the changes on the branch with the last stashed changes? We need some git yoga to get where we want.

We added a and b in the first commit and removed b in the second commit. Together both commits leave only a and pass the test. This is the valid state we want for branch a.

The stash contains only c, so we have to restore b. We do this by reverting the commit that removed b without committing the revert, which brings back the working directory as it was when we stashed. Then we apply the latest entry on the stash, which is c. The result is that we have a, b and c in the working directory: (a + b) - b + b + c = a + b + c. The file a is committed; the files b and c are uncommitted changes.

 
$ git revert --no-commit HEAD                            # restore b
$ git stash apply                                        # b & c
On branch a

Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

 new file:   b

Untracked files:
  (use "git add <file>..." to include in what will be committed)

 c
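The whole recovery, from the wrong first commit to this state, can be replayed as one self-contained script. It is a sketch under the same assumptions as the session (a, b and c untracked on top of an empty initial commit), plus a placeholder identity so it runs anywhere. At the end, git status --short should list b as a staged new file and c as untracked.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"
git init -q
git config user.name "Example"            # assumed identity for a clean environment
git config user.email "example@example.com"
git checkout -q -b initial
git commit -q --allow-empty -m "Initial"
for f in a b c; do echo "$f" > "$f"; done  # a, b and c untracked, as after the reset

git checkout -q -b a
git add a b                                # the mistake: b does not belong here
git commit -q -m "Adding a (actually also b)"   # + a + b
git stash push -q -u -m stash_c            # the stash only holds + c
git rm -q b                                # the fix
git commit -q -m "Remove b"                # - b
[ -f a ] && [ ! -f b ] && [ ! -f c ] && echo "Well done"

git revert --no-commit HEAD                # + b, staged but not committed
git stash apply -q                         # + c, untracked
git status --short
```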
When we look at the history of branch a, it's not exactly the result we wanted: branch a now contains two commits. We squash them into one commit.

$ git stash
Saved working directory and index state WIP on a: 043f3a7 Remove b
HEAD is now at 043f3a7 Remove b
$ git reset --soft HEAD~
$ git commit --amend -m"Add a for real"
[a bd012df] Add a for real
 1 file changed, 1 insertion(+)
 create mode 100644 a
$ git stash apply
On branch a
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

 new file:   b

Untracked files:
  (use "git add <file>..." to include in what will be committed)

 c
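The squash can also be sketched with a single git reset --soft. This is an alternative to the session above, not the author's method. Instead of resetting one commit and amending, it resets branch a straight to initial and commits once. Both commits are dropped, and their net result, the file a, stays in the index. As in the session, the staged b is stashed first so it does not end up in the squashed commit; c stays untracked throughout. The placeholder identity is an assumption.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"
git init -q
git config user.name "Example"            # assumed identity for a clean environment
git config user.email "example@example.com"
git checkout -q -b initial
git commit -q --allow-empty -m "Initial"
git checkout -q -b a
for f in a b c; do echo "$f" > "$f"; done

# Recreate the state from the session: two commits on a, b staged, c untracked.
git add a b
git commit -q -m "Adding a (actually also b)"
git rm -q b
git commit -q -m "Remove b"
git revert --no-commit HEAD

git stash push -q -m staged_b   # without -u: takes the staged b, leaves the untracked c alone
git reset --soft initial        # drop both commits; their net result (a) stays in the index
git commit -q -m "Add a for real"
git stash apply -q              # b comes back as a staged new file
git log --oneline initial..HEAD
```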

The squashing is optional and can also be done later with an interactive rebase, git rebase -i initial, on branch a. From here we can commit b and c separately, just as in the happy case before. This gives us the same result:

$ git checkout -b b
$ git add b
$ git commit -m"b"
$ git stash save -u stash_c
$ [ -f a ] && [ -f b ] && [ ! -f c ] && echo Well done # we run a test
Well done

$ git stash apply
$ git checkout -b c
$ git add c
$ git commit -m"c"
$ [ -f a ] && [ -f b ] && [ -f c ] && echo Well done # we run a test
Well done

$ git diff --exit-code big_one && echo Passed
Passed
With stash we can keep safe points of our changes and clean and restore the working directory. Together with git revert, we can also use it to undo changes.

This process also works for keeping the results of explorative coding. If I hack together an end-to-end solution to see that it actually works, I can keep those results. I then rebuild it cleanly from the ground up and at the same time make sure that the overall end-to-end solution still works. This gives you the quick feedback of exploration and end-to-end testing, together with the quality of building it incrementally to your own standards. It's the best of both worlds.

In a real-world scenario the untangling isn't that rigid, with strictly adding a, b and c in single commits so that the sum of all incremental commits is exactly the same as the big initial commit. Most of the time you will spot small problems along the way that you want to address in the process. Strictly speaking this is less safe, because diffing no longer gives a simple binary result, but it allows a more fluid workflow. If you're uncomfortable with that, you can also split the work into n commits first, keep notes of the necessary changes, and add the fixes in separate commits later (on the branches a, b and c).

In the given session the result is the branches initial, a, b and c, which build on each other in this order. We can still improve on that. If you observe that a, b and c are actually orthogonal, then you can move b and c onto the initial branch (here done with git cherry-pick). This makes the review work parallelizable. Not all features are orthogonal. If you have two different ways to cut a commit, you should choose the one that makes the changes as independent as possible. This also makes rebasing and merging easier.
$ git log --pretty=oneline --patch a --
bd012df01c7889182c6295725faf942f90fa251e Add a for real
diff --git a/a b/a
new file mode 100644
index 0000000..7898192
--- /dev/null
+++ b/a
@@ -0,0 +1 @@
+a
7a358aa4a6c6d4cb52cc403c3cc7d3b59c208c70 Initial

$ git checkout initial
$ git checkout -b b_orthogonal
$ git cherry-pick b
$ git log --pretty=oneline --patch b_orthogonal
006022af90003f57bde0ad2700201aee73163b95 b
diff --git a/b b/b
new file mode 100644
index 0000000..6178079
--- /dev/null
+++ b/b
@@ -0,0 +1 @@
+b
7a358aa4a6c6d4cb52cc403c3cc7d3b59c208c70 Initial

$ git checkout initial
$ git checkout -b c_orthogonal
$ git cherry-pick c

$ git log --pretty=oneline --patch c_orthogonal
43a58337baf1e337f0ff1fae72c2c5b955874027 c
diff --git a/c b/c
new file mode 100644
index 0000000..f2ad6c7
--- /dev/null
+++ b/c
@@ -0,0 +1 @@
+c
7a358aa4a6c6d4cb52cc403c3cc7d3b59c208c70 Initial
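Instead of cherry-picking onto new branches, you can also move the existing branches with git rebase --onto. This sketch assumes the stacked history initial, a, b, c from the session, where each branch adds exactly one commit, plus a placeholder identity. `b~1` and `c~1` name the parent commits, so only the tip commit of each branch is replayed onto initial. Note that c is rebased against its own parent, c~1, and not against b, because by then b has already moved.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"
git init -q
git config user.name "Example"            # assumed identity for a clean environment
git config user.email "example@example.com"
git checkout -q -b initial
git commit -q --allow-empty -m "Initial"
for f in a b c; do                         # stacked branches: initial, a, b, c
  git checkout -q -b "$f"
  echo "$f" > "$f"
  git add "$f"
  git commit -q -m "$f"
done

git checkout -q initial                    # clean, empty working tree before rebasing
git rebase -q --onto initial b~1 b         # replay only b's own commit onto initial
git rebase -q --onto initial c~1 c         # c~1 is still the old b commit, so only c moves
git checkout -q initial
git log --oneline --graph --all
```

Unlike the cherry-pick variant, this moves the branches in place instead of keeping the stacked ones around.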
The overall process isn't trivial, but once you know the necessary bits of git (stash, revert, checkout, commit, branch) it should be fairly easy to remember that the only steps necessary are to 1) commit, 2) stash save, 3) test, 4) commit fix (optional), 5) revert fix (optional) and 6) stash apply. If you can keep that in your head you know one solution to this problem.

20160430

Revert 2016 edition

Five years ago I wrote about how reverting can speed you up. The main argument is that trying to get from a new broken state to a new good state can be incredibly hard. The problem is that you took too many steps and you don't know which of them actually broke your system.

My experience is that it is hard for people to let go and make a fresh start. I see it much more positively: you still know everything you learnt in the process. Making smaller steps will give you predictable progress. Carrying on might lead to nothing. It is high risk.

I revert more on the enjoyable work days. I tried something without knowing yet how it would work out, and I learnt something: it doesn't work this way, and the problem is a bit more challenging than I thought initially. Failures are the most interesting bits of information. I remember my failures way more vividly than the stuff that actually worked. Digesting failures fully gives you lots of information and an opportunity to grow.

I don't like to arrive on a development battlefield with corpses all over the place. There are twenty different changes made since the last good state, all potentially breaking, and now I should find the one-line change that broke it all. I want to see how we got from the pristine state of goodness to this. That is the safe approach. Every developer should be able to narrow a breakage down to the minimal breaking change. It's fine not to know the solution to this minimal problem. This is where the investigation starts.

The easiest way to go from a broken state to a good state is to undo all your changes. Go to the known good state and make smaller steps from there.

With git this all became cheaper and safer. The main tools are git add and git stash.

If you're making progress and everything works as it should, you can run git add -A between good states to stage them. Staged changes are a lightweight way to save your work. They are the right tool when a change is not yet big enough to be worth a commit.
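With a staged checkpoint, git diff shows what changed since the checkpoint, and git diff --cached shows what the checkpoint adds on top of the last commit. Below is a short sketch in a scratch repository; the placeholder identity is an assumption.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"
git init -q
git config user.name "Example"            # assumed identity for a clean environment
git config user.email "example@example.com"
echo a > a
git add -A
git commit -q -m working

echo b > a
git add -A                  # checkpoint: the known good state "b" is staged
echo c > a                  # an experiment on top of the checkpoint

git diff                    # working tree vs. checkpoint: only the experiment (b to c)
git diff --cached           # checkpoint vs. last commit: the staged progress (a to b)
git checkout -- a           # throw the experiment away, back to the checkpoint
cat a
```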

In the examples I'll use bash conditional expressions as a stand-in for a test suite. They check whether a file exists or what it contains.

We start with an initial commit that we know works:
$ cd $(mktemp -d)
$ git init
Initialized empty Git repository
$ echo a > a
$ [ -f a ] && echo works || echo broken # run test
works
$ git add -A
$ git commit -m'working'
[master (root-commit) 263f908] working
 1 file changed, 1 insertion(+)
 create mode 100644 a
Now we can change the source, run our test and use add to stage the change.

$ echo -n b > a
$ [ "$(cat a)" == "b" ] && echo works || echo broken #run test
works
$ git add -A
Then we make another change, find out that it breaks the test, and go back by checking out the version of the file we staged.

$ echo -n c > a
$ [ "$(cat a)" == "b" ] && echo works || echo broken #run test
broken
$ git checkout a
$ [ "$(cat a)" == "b" ] && echo works || echo broken #run test
works
We can repeat git add until we are ready to commit. This process adds minimal overhead.
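The stage-or-restore cycle can be wrapped in a small function. `try_change` is a hypothetical helper, not part of git. It runs the test string, stages everything on success, and otherwise restores the tracked files from the index with git checkout -- . Newly created untracked files are not removed by that checkout, so this sketch only covers edits to tracked files. The placeholder identity is an assumption.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: keep a change if the test passes, otherwise
# restore the tracked files from the last staged state.
try_change() {
  if bash -c "$1"; then
    git add -A
    echo works
  else
    git checkout -- .
    echo "broken, back to the staged state"
  fi
}

cd "$(mktemp -d)"
git init -q
git config user.name "Example"            # assumed identity for a clean environment
git config user.email "example@example.com"
echo a > a
git add -A
git commit -q -m working

echo b > a; try_change '[ "$(cat a)" = b ]'   # works: b gets staged
echo c > a; try_change '[ "$(cat a)" = b ]'   # broken: a is restored to b
cat a
```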

Using git stash we make safe points along the way. They help us recover if something goes wrong in the process, for example if we went back to the wrong intermediate state.

The same session with some safe points added.
$ cd $(mktemp -d)
$ git init
$ echo a > a
$ [ -f a ] && echo works || echo broken # run test
$ git add -A
$ git commit -m'working'

$ echo -n b > a
$ [ "$(cat a)" == "b" ] && echo works || echo broken #run test
works
$ git add -A

$ echo -n c > a
$ [ "$(cat a)" == "b" ] && echo works || echo broken #run test
broken
$ git stash
Saved working directory and index state WIP on master: df9056f working
HEAD is now at df9056f working

$ # Oh no, the test was actually wrong!
$ [ "$(cat a)" == "c" ] && echo works || echo broken #run test
broken

$ git stash apply
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

 modified:   a

no changes added to commit (use "git add" and/or "git commit -a")

$ [ "$(cat a)" == "c" ] && echo works || echo broken #run test
works
$ git add -A
$ git commit -m"working again"
[master 3faee04] working again
 1 file changed, 1 insertion(+), 1 deletion(-)
I'd advise against using git stash pop. Apply leaves the safe points in place in case you applied the stash incorrectly. Starting from the last commit, you can always recover from any of your safe points, and you will never lose work. You can discard them once you have made a commit.
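A sketch of the full safe-point cycle, including the final cleanup: the stash entry survives git stash apply and is only dropped after the commit. The scratch repository and placeholder identity are assumptions.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"
git init -q
git config user.name "Example"            # assumed identity for a clean environment
git config user.email "example@example.com"
echo a > a
git add -A
git commit -q -m working

echo b > a
git add -A                         # staged checkpoint
echo c > a                         # looks broken at first
git stash push -q -m "safe point"  # records both the staged b and the unstaged c
cat a                              # back to the committed "a"

git stash apply -q                 # the test was wrong after all: bring c back
git stash list                     # apply keeps the safe point around
git add -A
git commit -q -m "working again"
git stash drop -q                  # only discard the safe point once the work is committed
```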

A process with more overhead but similar results is to use git add, git commit, git revert and git rebase -i. You commit regularly between states and revert bad states. Finally you squash with git rebase -i to get a clean history. Depending on your preference, either this process or stash is the better choice. Try both and you'll see which one is appropriate for your situation.
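The commit-and-revert variant could look like the sketch below. git rebase -i needs an editor, so this sketch swaps in git reset --soft to the last known good commit followed by a single commit. That squashes the same range non-interactively. The variable `base` is just an illustrative name for that commit, and the identity is a placeholder.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"
git init -q
git config user.name "Example"            # assumed identity for a clean environment
git config user.email "example@example.com"
echo a > a
git add -A
git commit -q -m working
base=$(git rev-parse HEAD)                 # last known good commit

echo b > a
git commit -q -am "step: b"                # good state, committed
echo c > a
git commit -q -am "step: c"                # bad state, committed as well
git revert --no-edit HEAD >/dev/null       # undo the bad state with a new commit

# git rebase -i "$base" would squash these interactively; reset --soft does the
# same squash non-interactively: move HEAD back, keep the combined result staged.
git reset --soft "$base"
git commit -q -m "working again"
git log --oneline
```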

Coming back to the old post about reverting, git stash is the cheapest way to clean a workspace. You can fire away git stash left and right. If you were wrong you can always go back and scavenge the bits of the changes that were important after all. Given these tools the overall process got massively better.
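Scavenging works per file. A stash entry is a commit whose tree holds the stashed working directory, so you can check out a single path from it. The sketch below assumes a scratch repository where only the change to a turned out to matter; the identity is a placeholder.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"
git init -q
git config user.name "Example"            # assumed identity for a clean environment
git config user.email "example@example.com"
echo base > a
echo base > b
git add -A
git commit -q -m base

echo "useful idea" > a
echo "dead end" > b
git stash push -q -m experiment   # fire away: clean workspace, nothing is lost

# Later it turns out the change to a was important after all.
git checkout 'stash@{0}' -- a     # take a from the stash entry (staged); b stays at "base"
git status --short
```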