Some time ago, I’ve been reviewing a pull request and came across two sequential commits like this
Implementing uploader
Implementing uploader
I’ve seen this before, but this time my reaction was like “WTF? Did the first attempt fail to work?”. Frankly, I’ve seen this a lot on projects using SVN, but with git there is no excuse not to combine those together. Since then, two question were bothering me: is this a common problem on other projects and what actually leads to it?
To find answer for the first question, I wrote a small script that goes through commit history and analyses commit messages. A quick first overview of a few real projects has shown that such situation is quite common and a few more common bad patterns came up. These bad patterns might be called as bad smells if you prefer refactoring terminology. I’m going to cover just a few most common and obvious issues.
Repeating commit messages
The first one, that was already mentioned, is same or similar messages for sequential commits. Probably everyone saw sequential commits like several fix
es in a row. Or it might have even more worse form, when one feature implemented in several commits.
Implementing feature
Implementing feature
Implementing feature
Implementing feature or refactoring in multiple commits is absolutely fine, however those commits must be meaningful, each commit must represent a useful part of a feature. And it’s ok when developer wants to save certain version in order to go back and revert it later, in such case there is no need to write long description for each experimental change. But why all those commits go into master (or other main branch)? Seems that is an open question. Such bad practices scatter knowledge about feature/bugfix across multiple commits, and it would be really hard to figure out what happened, when someone looks into blame or history log.
Same story happens with bugfixing, commit log might look like
Try to fix JS error
Try to fix JS error
Try to fix JS error
You may laugh, but it really happens on real projects. And usually such commits sequences are signs of debugging on a live production servers, when one tries to fix a bug, then commits it and deploys to production for testing, and if fix fails that whole iteration is repeated one more time.
Obviously debugging on production is bad. Moreover this smell also might indicate programming by coincidence and lack of tests.
We all know DRY principle, why do we allow repeating of commits?
Another bad example, that is not duplicates, but still doesn’t make much sense, the only explanation here might be that one is trying to increase contributions count on github.
Update rspec-support to version 3.0.1
Update rspec-expectations to version 3.0.1
Update rspec-mocks to version 3.0.1
Update rspec-core to version 3.0.1
Short commit messages
Speaking of case
fix
fix
There is also an another issue — short commit message. Saying just fix
does not provide much information about what happens there. Short message is actually an instance of a broader problem — undescriptive message. Short messages confuse even more than duplicates, when someone finds it in a file’s blame. Commit message should reveal intention behind changes in this commit. This relates to the previous issue, changeset must deserve to exist as a single commit entry in a repository, commit should introduce a small independent feature, bugfix, refactoring, etc. And commit message should tell a story what’s happend, why one introduced this change, what value this commit adds to a project.
Here is a list of short messages from real projects, can you guess what happens behind them?
!
tmp
wip
oops
dummy
So here is a game for you, pick randomly a few short commit messages in your project and try to guess what they do, and then check with actual diff from this commit.
Probably everyone heard that programs must be written for people, and not for compilers. So let’s write commit messages for people too. There is only one short message that makes sense, it’s fix typo
, and only if commit’s diff really fixes a typo in a code, documentation, comment or any other text in your code.
Merge commits
Another issue is merge commits. Merge commits lead to a nonlinear commit history, and the more branching level you have the more puzzling commit history becomes. Of course branching is one of the most coolest features in git. The ease with wich you can create many branches and manipulate them is awesome. But same time git has powerful features to rewrite commits, like rebase, that you can use to maintain clean commit history.
For instance, it’s quite common when someone implements a big feature, and from time to time, merges master into feature branch, and then this all is merged back into master. But what if there are multiple features in development? Or if this combines with other issues, like duplicate messages, you can easily end up with such structure
* 0d6430d Add functionality to export
* cb8a7f1 Add functionality to export
* 8b9c3e4 Merge remote-tracking branch 'origin/master' into my_export
|\
| * 5c4d1f9 some ui fixes
* | 715857e Add functionality to export
* | bb81445 Add functionality to export
* | 1a4e59d Add functionality to export
* | de1eb9a Merge remote-tracking branch 'origin/other_export' into my_export
|\ \
| * | 7a1ebbd hack
| * | 41e32ad merge with other fixes
| * | eacd9c2 merge with other fixes
| |\ \
| | * | 91a7cb4 Super hot fixes
| | * | 82b9cba Merge remote-tracking branch 'origin/master'
| | * | 1a5c1cd bundler fixes
| | /
| * | 2aab419 fixes
| * | 5b0317d Merge remote-tracking branch 'origin/my_export' into other_export
| |\ \
| * | | 4bdeb10 refactoring
| * | | 1e7f673 Merge remote-tracking branch 'origin/my_export' into other_export
| |\ \ \
| * | | | 4abeb12 Merge branch 'cool_export' into other_export
* | | | | d15457f Add functionality to export
| |_|/ /
|/| | |
* | | | 9a4ebc7 Add functionality to export
* | | | 271d1b0 Add functionality to export
| |/ /
|/| |
* | | d1dc4a0 Add functionality to export
* | | e9b66aa Merge remote-tracking branch 'origin/cool_export' into my_export
|\ \ \
| |/ /
| * | b12f64c layout fixes
| /
* | 17a8452 Add functionality to export
* | 5d5ada3 Add functionality to export
* | 87d6ff0 Merge remote-tracking branch 'origin/master' into my_export
|\ \
| |/
| * df5fcda quick fix
* 3ab15a2 Add functionality to export
Note: any similarity to actual repositories is purely coincidental.
Ok, that’s not always an issue. It depends on a project and workflow you use. But on a medium size project with 5-10 developers it’s possible to maintain a near linear commit history.
Final Thoughts
As I mentioned earlier, for the analyis I wrote a simple script, but later it evolved into a small CLI tool to analyze repositpries: repokeeper. Give it a try and check some of your projects. Do you you have any of the above issues in you projects? Or what are other issues you dealt with? I’d like to hear your stories, since I mostly touched common commit level issues here, though there many other smells of not well maintained repositories.