James Shore: Don’t Measure Unit Test Code Coverage

January 31, 2019

If you’re using test-driven development, don’t measure unit test code coverage. It’s worse than a useless statistic; it will actively lead you astray.

What should you do instead? That depends on what you want to accomplish.

To improve code and test practices

If you’re trying to improve your team’s coding and testing practices, perform root-cause analysis¹ of escaped defects, then improve your design and process to prevent that sort of defect from happening again.

¹As Michael Bolton points out, it should really be root-causes analysis.

If waiting for defects to escape is too risky for you, have experienced QA testers perform exploratory testing, then conduct root-cause analysis on what they find. Either way, the idea is to analyze your defects to learn what to improve. Code coverage won’t tell you.

To improve programmer code quality

If you’re trying to improve programmers’ code quality, teach testing skills, speed up the test loop, refactor more, use evolutionary design, and try pairing or mobbing.

Teaching testing skills and speeding up the test loop make it easier for programmers to write worthwhile tests. Test coverage doesn’t; it encourages them to write worthless tests to make the number go up.
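To make the difference concrete, here is a small hypothetical illustration in Python (the `apply_discount` function and both tests are invented for this example, written in pytest style). The first test exists only to raise the coverage number: it runs every line but checks nothing. The second test actually pins down behavior.

```python
def apply_discount(price: float, percent: float) -> float:
    """Return price reduced by percent (10 means 10% off). Hypothetical example."""
    return price - price * percent / 100


def test_discount_for_coverage():
    # Executes every line of apply_discount, so a coverage tool reports it
    # as fully covered. It asserts nothing, so it would still pass if the
    # formula divided by 10 instead of 100.
    apply_discount(100, 10)


def test_discount_reduces_price():
    # Checks behavior: 10% off 100 is 90, and 0% off leaves the price alone.
    assert apply_discount(100, 10) == 90
    assert apply_discount(80, 0) == 80
```

Both tests produce the same coverage report. Only the second one would catch a broken formula.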

Refactoring more and using evolutionary design make your design simpler and easier to understand. This reduces design-related defects.

Pairing and mobbing enhance self-discipline on your team. Everybody feels lazy once in a while, but when you’re pairing (or mobbing), it’s much less likely that everybody involved will be lazy at the same time. They also raise the quality of your code and make it easier to understand, because working together allows programmers to see the weaknesses in each other’s code and come up with more elegant solutions.

To improve test discipline

Some people use code coverage metrics as a way of enforcing the habits they want. Unfortunately, habits can’t be enforced, only nurtured. I’m reminded of a place I worked where managers wanted good code commit logs. They configured their tool to require a comment on every commit. The most common comment? “a.” They changed the tool to require multiple-word comments on every commit. Now the most common comment was “a a a.”

Enforcement doesn’t change minds. Instead, use coaching and discipline-enhancing practices such as pairing or mobbing.

To add tests to legacy code

To build up tests in legacy code, don’t worry about overall progress. The issue with legacy code is that, without tests, it’s hard to change safely. The overall coverage isn’t what matters; what matters is whether you’re safe to change the code you’re working on now.

So instead, nurture a habit of adding tests as part of working on any code. Whenever a bug is fixed, add a test first. Whenever a class is updated, retrofit tests to it first. Very quickly, the 20% of the code your team works on most often will have tests. The other 80% can wait.

To improve requirements quality

If you’re trying to improve how well your code meets customer needs, involve customer representatives early in the process, like “before the end of the Sprint” early. They won’t always tell you what you’re missing right away, but the sooner and more often you give them the chance to do so, the more likely you are to learn what you need to know.

To improve non-functional quality

If you’re trying to improve “non-functional” qualities such as reliability or performance, use a mix of real-world monitoring, fail-fast code, and specialized testbeds. Non-functional attributes emerge from the system as a whole, so even a codebase with 100% coverage can have problems.

Here’s the thing about TDD

The definition of TDD is that you don’t write code without a failing test, and you do so in a tight loop that covers one branch at a time. So if you’re doing TDD, any code you want to cover is ipso facto covered. If you’re still getting defects, something else is wrong.
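As a rough sketch of why coverage falls out of TDD for free, consider a hypothetical shipping-cost rule (the function name, the 50 threshold, and the flat fee of 5 are all invented for illustration). Each branch was added only to make a failing test pass, so every branch has a test by construction.

```python
# Hypothetical example: a shipping-cost rule built one branch at a time.

def shipping_cost(order_total: float) -> float:
    # Step 2: this branch was added only after
    # test_large_orders_ship_free was written and failed.
    if order_total >= 50:
        return 0.0
    # Step 1: written to make test_small_orders_pay_flat_fee pass.
    return 5.0


def test_small_orders_pay_flat_fee():
    # Written first; failed until the flat-fee return existed.
    assert shipping_cost(20) == 5.0


def test_large_orders_ship_free():
    # Written second; failed until the >= 50 branch existed.
    assert shipping_cost(50) == 0.0
    assert shipping_cost(120) == 0.0
```

Nobody measured coverage here, yet both branches are covered, because no branch was allowed to exist without a test that demanded it.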

If people don’t know how to do TDD properly, code coverage metrics won’t help. If they don’t want to cover their code, code coverage metrics won’t help. If something else is wrong, you guessed it: code coverage metrics won’t help. They’re a distraction at best, and a metric to be gamed at worst. Figure out what you really want to improve and focus directly on that instead.

PS: Only a Sith deals in absolutes. Ryan Norris has a great story on Twitter about how code coverage helped his team turn around a legacy codebase. Martin Fowler has written about how occasional code coverage reviews are a useful sanity check.

(Thanks to everyone who participated in the lively Twitter debate about this idea.)