Git: retroactively keeping tests green

I love using git bisect and git bisect run to find out when things broke, but that only works if the commit history is generally clean - i.e. there aren't commits with broken tests.

The basic idea, given a failing test and a test command (for the sake of illustration, let's just call it pytest), is this:

git bisect start
git bisect bad  # Mark the current commit as bad
git bisect good $COMMIT_HASH  # Mark an earlier, known good commit

git bisect run pytest

That will automatically run until it finds the first commit where pytest fails. It's a beautiful tool.

It's often desirable to run your test command with more specific args to narrow down to the problem you care about. Or, if it's a newly discovered problem, write a new failing test in a file that isn't added to the repo - that way you can test it against any commit.
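One way to narrow things down is to point git bisect run at a small wrapper script instead of bare pytest. The sketch below is an assumption of mine, not something from the original workflow: the script name bisect-test.sh and the function to_bisect_status are made up. It relies on two documented behaviors: git bisect run treats exit status 0 as good, 125 as "skip this commit", and any other status from 1 to 127 as bad; pytest exits with 4 on a usage error (such as a missing test file) and 5 when no tests were collected.

```shell
#!/bin/sh
# bisect-test.sh (hypothetical name): wrapper for use with `git bisect run`.
# Translates a pytest exit status into what git bisect expects:
#   0 = good, 125 = skip this commit, any other 1-127 = bad.
to_bisect_status() {
    case "$1" in
        0)   echo 0 ;;     # tests passed: commit is good
        4|5) echo 125 ;;   # usage error / no tests collected: can't judge, skip
        *)   echo 1 ;;     # anything else: treat the commit as bad
    esac
}

# Only run pytest when arguments were passed (a choice made for this sketch).
if [ "$#" -gt 0 ]; then
    status=0
    pytest "$@" || status=$?
    exit "$(to_bisect_status "$status")"
fi
```

Usage might look like git bisect run sh /tmp/bisect-test.sh /tmp/test_new_bug.py, where both paths are placeholders. Keeping the script and the new test file out of the repo means they are there no matter which commit is checked out.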

Recently on a personal project, I realized I had gotten out of the habit of running the test suite at all¹ before committing. Predictably, several tests were failing. I wanted to fix this, but as long as I was of a mind to be cleaning up and fixing, I didn't like the idea of committing test fixes far removed from the original commit that broke them.

Why? Because if I left those commits bad, then I wouldn't be able to usefully run git bisect run on that range of commits in the future, since they would all be bad.

Today's happy realization was that by combining several familiar git features, this was easy enough to address that it didn't feel like more trouble than it was worth. It was surprisingly smooth.

This assumes you're already familiar with git rebase.

TL;DR the recipe

1. Use git bisect run to find the commit that broke the tests, as above. Let's call this $FIRST_BAD_COMMIT.
2. Check out HEAD of your branch again and fix the code or test code as needed. (If this involves adding a new test case, do that now.)
3. Use git commit --fixup $FIRST_BAD_COMMIT to create a fixup commit. This will be added to the end of your commit history, as usual.
4. Run git rebase -i --autosquash $FIRST_BAD_COMMIT~1. This moves the fixup commit to just after the bad commit, squashes the two together, and rewrites all the following commits on top of the result.

Result: We have rewritten history so that the test never failed.
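To make the recipe concrete, here is a sketch that plays it out end to end in a throwaway repo. Everything in it is made up for illustration: the file names, the commit messages, and check.sh, which stands in for pytest so the demo needs nothing but git. Setting GIT_SEQUENCE_EDITOR=true just accepts the todo list that --autosquash prepares, so the rebase runs without opening an editor. The final git rebase --exec is an extra step, not part of the recipe above: it re-runs the check on every commit after the known good one and stops if any of them fails.

```shell
#!/bin/sh
set -e

# Throwaway repo in a temp dir; all names and contents here are made up.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# check.sh stands in for pytest: it passes only when value.txt says "pass".
printf '#!/bin/sh\ngrep -q pass value.txt\n' > check.sh
chmod +x check.sh
echo pass > value.txt
git add . && git commit -q -m "1: add check and value"
echo a > a.txt && git add . && git commit -q -m "2: unrelated work"
echo c > c.txt && echo fail > value.txt
git add . && git commit -q -m "3: add c.txt (and break the check)"
echo b > b.txt && git add . && git commit -q -m "4: more unrelated work"

# Step 1: bisect between the current (bad) commit and the first (good) one.
git bisect start HEAD HEAD~3 >/dev/null
git bisect run ./check.sh >/dev/null
FIRST_BAD_COMMIT=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1   # back to the tip of the branch

# Step 2: fix the problem at the tip of the branch.
echo pass > value.txt

# Step 3: record the fix as a fixup of the commit that broke things.
git commit -q -a --fixup "$FIRST_BAD_COMMIT"

# Step 4: fold the fixup into the bad commit and replay the rest on top.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$FIRST_BAD_COMMIT~1"

# Extra check: run the test on every commit after the known good one.
git rebase -q --exec ./check.sh HEAD~3
git log --oneline
```

One caveat for real repos: $FIRST_BAD_COMMIT~1 only exists if the bad commit has a parent, so this form does not work when the very first commit is the bad one.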

Individually, those are all familiar git commands. But I hadn't ever combined fixup and bisect run before.

When and why to do this?

Probably rarely!

In a personal project like this, I wouldn't have bothered - before I realized how easy it was to do!

If it went south - as history rewriting sometimes can - I probably would have just left the fixes where they originally landed.

What about at work?

In any of my usual at-work workflows, this trick is very unlikely to apply. I'll be working on a branch that will end up as one or a few commits that I'm merging anyway, and all of them will be green.

In a shared repo, we shouldn't rewrite commit history that anybody else depends on. It's extremely disruptive. Doing it on e.g. a feature branch or chain of branches that a couple of colleagues are contributing to may be fine, but usually takes some direct communication to coordinate ("Hey I'm rebasing the branch again").

In a work context, I most liberally rewrite my own history before merging it to the main branch; never after. Especially prior to asking for code review. If I'm doing something tricky, I want my commit history to be logical and easy to follow - a help to reviewers. Commits like "try to fix the broken thing yet again" are just distracting and confusing.

Sometimes - depending on team culture - I'll also do some commit history cleanup after code review and before merge, because nobody in the future will need to see 3 separate commits like "address review feedback round 3".

Some teams enforce that all branch work gets squashed to a single commit before merging to main, in which case it's moot.

Some teams let folks merge as many commits as they feel like, which is fine; and only require the final one to be green, which is not fine - precisely because it introduces noise when trying to use git bisect run. I'll argue against that practice when I see it.

¹ It was surprising to realize I had stopped running the tests. This project began life as a quick hack script, and I've generally only done the absolute minimum work required to add the next feature I needed. It's only recently that I wrote any tests for it at all, because I wanted some features that forced me to finally clean up and refactor the darn thing. At work, I would never think of ignoring the test suite. Why is this project different? Is it because of the long history of it getting so little attention? So working on this project automatically puts me into a mindset of "this code used to be crap, therefore it always will be; it never had tests, therefore it doesn't now"?

slinkp blog