
Sheogorath's Blog

Self-sufficient commits


When I look at commit histories I’m often scared. Scared of what happens if people ever move their source code to another platform to collaborate. A well-written git history can save your future self and other developers a lot of reverse engineering. How? By providing useful information and explanation about a change.

Bad commit messages

It’s an old story. From “Updated main.c” to “changed error message”: one-liners that tell you as much as the 5 lines of code they changed. Those commit messages are the worst, because they tell you nothing beyond what you can already get from looking at the content of the commit.

But there is more:

Merge pull request #123 from example/something-456

changed error message
Updated main.c
fix bug

A well-known situation. Let me ask you one question: Just from reading these commit messages, can you tell me what impact they had on the project and why they were needed?

What makes a commit message a good commit message

A good commit message should be self-sufficient. What does that mean, you might ask? It means that a commit message should provide all the information you need to understand why the commit exists. In general, that means it answers 3 questions:

  • What is the current state of the project? Or: Why do we need this change?
  • What does the change do? Or: What impact does this change have?
  • Why is the change as it is? Or: How did I come up with this solution?

Pair that with a meaningful “subject line”1 that sums the whole thing up in one short sentence, and you have a lot less to worry about.

To round off the message, add references at the bottom: links to documentation, issues2, previous commits, and anything else you used to come up with your code.
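The structure described above can be sketched as a small check in Python. This is only an illustration: the function name `check_commit_message`, the 72-character subject limit, the “References:” trailer, and the example message (including its URL) are assumptions made for this sketch, not rules that git enforces.

```python
import re

# Assumed limit for this sketch; git itself does not enforce any length.
MAX_SUBJECT_LENGTH = 72


def check_commit_message(message: str) -> list[str]:
    """Return a list of problems found in a commit message (empty if none)."""
    problems = []
    lines = message.strip("\n").split("\n")
    subject = lines[0].strip()

    if not subject:
        problems.append("missing subject line")
    elif len(subject) > MAX_SUBJECT_LENGTH:
        problems.append("subject line is too long")

    # The subject should be separated from the body by one blank line.
    if len(lines) > 1 and lines[1].strip():
        problems.append("no blank line after the subject")

    # Everything after the separator counts as body text.
    body = [line for line in lines[2:] if line.strip()]
    if not body:
        problems.append("no body explaining why the change is needed")

    # Treat a URL or a "References:" trailer as references (an assumption).
    if not re.search(r"https?://|^References:", message, flags=re.MULTILINE):
        problems.append("no references at the bottom")

    return problems


# Hypothetical message that answers the three questions from above.
EXAMPLE_MESSAGE = """\
Show the config path in the "file not found" error

Until now the error only said "config not found", so users could not
tell which location was searched.

Include the resolved path in the message. No other behaviour changes.

Building the path once in load_config() keeps the lookup and the error
message from drifting apart.

References:
https://example.com/issues/456
"""

print(check_commit_message(EXAMPLE_MESSAGE))
print(check_commit_message("fix bug"))
```

The example message passes without complaints, while “fix bug” is reported as lacking both a body and references. A check like this cannot judge whether the explanation is any good; it only catches messages that skip the explanation entirely.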

A self-sufficient commit should be atomic

Since a commit is way more than just a commit message, let’s extend the definition of a self-sufficient commit to code as well. The concept of atomic commits is not new. Atomic commits are changesets that can be applied or reverted without resulting in a broken application.3

That means that an atomic commit should contain all assets that are needed for the change. It should add or change the unit and integration tests that are required to keep your Continuous Integration pipeline happy. It should also contain required changes in documentation and the minimal set of code changes required to result in the intended new behaviour.

How self-sufficient commits can help your daily work

Given the number of guides to git usage and best practices for commits, you have most likely come across parts of the idea behind self-sufficient commits already. But at the end of the day, you are a programmer who needs to get work done, not a writer of novels. So what’s in it for you?

Use git blame. It’s by far one of the most useful tools to understand why a certain line of code exists in a project. And when you use self-sufficient commits you can easily tell what the original idea behind this line is, and maybe if it’s still needed.4
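Going from a line of code to the message that explains it can be sketched in Python like this. The sketch assumes it runs inside a git repository with `git` on the PATH; the helper names `parse_blame_sha` and `message_for_line` are made up for this example. It relies on `git blame --porcelain`, whose output starts with the commit hash, and on `git show -s --format=%B` to print the full message.

```python
import subprocess


def parse_blame_sha(porcelain_output: str) -> str:
    """Extract the commit hash from `git blame --porcelain` output.

    The first line of the porcelain format starts with the commit hash,
    followed by line numbers.
    """
    first_line = porcelain_output.splitlines()[0]
    return first_line.split(" ")[0]


def message_for_line(path: str, line_number: int) -> str:
    """Return the full message of the commit that last touched a line."""
    blame = subprocess.run(
        ["git", "blame", "--porcelain",
         "-L", f"{line_number},{line_number}", "--", path],
        check=True, capture_output=True, text=True,
    ).stdout
    sha = parse_blame_sha(blame)
    # -s suppresses the diff, %B prints subject and body of the message.
    return subprocess.run(
        ["git", "show", "-s", "--format=%B", sha],
        check=True, capture_output=True, text=True,
    ).stdout
```

Calling `message_for_line("main.c", 42)` inside a repository would print the message of the commit that last changed line 42 of that file. With self-sufficient commits, that output is the explanation you were looking for.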

Readable history. When you use git log you can see and understand what happened in the past few days without even reading the commits themselves. This also applies to writing or generating change logs from commit histories. Read the subjects, and when you are still not sure, read the whole commit message to understand it.
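Generating a change log from subject lines can be sketched in a few lines of Python. As before, this assumes a git repository and `git` on the PATH; the function names and the decision to skip subjects starting with “Merge ” are choices made for this sketch.

```python
import subprocess


def format_changelog(subjects: list[str]) -> str:
    """Turn commit subject lines into a bullet list, skipping merge commits."""
    entries = [s.strip() for s in subjects if s.strip()]
    # Skipping "Merge ..." subjects is a choice made for this sketch.
    entries = [s for s in entries if not s.startswith("Merge ")]
    return "\n".join(f"- {s}" for s in entries)


def subjects_since(revision: str) -> list[str]:
    """Read the subject lines of all commits after the given revision."""
    log = subprocess.run(
        ["git", "log", "--format=%s", f"{revision}..HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout
    return log.splitlines()
```

Something like `print(format_changelog(subjects_since("v1.0")))` would then list every change since a hypothetical `v1.0` tag. The result is only as readable as the subject lines it is built from.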

Code reviews. You should review code on a commit-by-commit basis.5 As a reviewer you can compare the intention behind the commit with the code that was actually produced for it. This can help to find unintended side effects as well as simply transfer knowledge, ranging from “By the way, that’s the purpose of this code” to “Look, this problem can be solved way more efficiently this way”.

Self-reflection. It might be underestimated, but just by writing a good commit message and verifying that your commits are self-sufficient, you might come up with an improvement to your solution or find out that you forgot something. If you are someone who struggles with self-confidence, self-sufficient commits provide you with a document that reassures you why this change is needed and why you did it the way you did.

Conclusion

Self-sufficient commits sound like a lot of work and writing but they can improve your workflow a lot, especially when you are collaborating. I use them quite often on my infrastructure repository to share knowledge about changes with people following the repository silently as well as sharing them on Mastodon. A good commit message can replace a short blog post.

When it comes to code reviews, self-sufficient commits simply crush it, because you no longer have to read through hundreds of unrelated comments in a discussion on GitHub. Instead you get the gist right away, along with something you can compare to the changes the commit makes.

With an always-green CI pipeline6 you also improve code stability and avoid the risk of unexpected behaviour when reverting a commit.

All in all, I’m more than happy with the results I see when people start adopting even small parts of self-sufficient commits because it really helps the overall quality of code and the project itself.7

Important note: As with pretty much everything, self-sufficient commits should not be applied blindly. There are changes, like fixing a typo, where it makes no sense to write a long commit message. But when in doubt, I recommend writing a self-sufficient commit message rather than skipping it, because the minute you invest in writing such a message can save you an hour of research in two years.

Photo by Shaun Meintjes on Unsplash



  1. For those who don’t know it, the first line of a commit message is treated like an email subject line.

  2. A lot of git histories only reference issues and consider that the “why” part. But have you ever tried to pick a random commit from a random project and figure out why the change is that way by reading a 200-comment issue or Pull Request discussion? It’s horrible and you spend around an hour on that. A commit message could tell you the same thing in less than a minute. And don’t get me started on references that point to a 404 page because the project moved somewhere else.

  3. Given that the application wasn’t broken before the commit. 

  4. Obviously git blame is not perfect, and maybe you no longer have direct access to the commit message due to a major refactor of an application. But it can still save you some hours of research when it works.

  5. With code reviews on a commit-by-commit basis, it actually doesn’t matter how big the overall changeset is, as long as the commits themselves are self-sufficient.

  6. Given you use it correctly. 

  7. It even reduces the vendor lock-in risk, as suddenly issues are no longer an essential part of your code documentation but more like an optional, but useful external reference.