Monday, December 16, 2019

100% Coverage

I'm doing some consulting for a company whose policy requires their tests to reach 100% branch coverage. I have mixed feelings about this. Certainly for some of the software they develop, avionics for example, bugs can get people killed, and I can see that 100% branch coverage helps make that unlikely. Unfortunately, it doesn't guarantee it: just because you've tested all the branches doesn't mean the software is correct. But it goes a long way toward making sure your software behaves predictably.

Fortunately, I don’t work on software that people's lives depend upon. I'm working on supply-chain and inventory software, and here the issue is murkier. Sure, you can argue that “For want of a nail...”, but realistically, a bug in the supply-chain software is simply an annoyance: wasted space if you have too many of something, or a delay in production if you have too few (and there are already many other sources of production delay). Or perhaps you'll see an ugly display if the bug is in the UI. These are annoying, but hardly life-threatening.

There is a significant downside to 100% branch coverage. It means that you have to persuade the code to take branches it wouldn't normally take. Sometimes this is as easy as passing out-of-range values to a procedure, but sometimes it is hard. Code isn’t designed to fail; it is designed to work, so sometimes you have to induce failure to take the non-normal branch. It shouldn’t be easy to induce failure. It should be close to impossible. That makes testing the failure cases close to impossible as well. You have two equally unpleasant alternatives: either adjust the code to make it easier to induce failure, or remove the branch that handles the “impossible” case.

I am a proponent of “defensive” programming. I put “sanity checks” in my code to ensure that the system is in an expected state. It may not be possible for it to get into an unexpected state, but I put the checks in anyway. I add default clauses even if all the cases are covered. My conditional branches often have a clause for the “can’t happen” case. I add sentinel values to enums that they should never be able to take on. When one of these checks trips, the code either fails fast or drops you into the debugger, depending on whether a debugger is available. This kind of defensive programming introduces branches that should not be possible to take, so it is not possible to get 100% branch coverage. If you want 100% branch coverage, you are left with two unpleasant options. You can hack your code so it can fail sanity checks, enter states not covered by any case, do things that “can’t happen”, or take on illegal values. Or you can remove the sanity checks so that all the remaining branches can be taken.
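To make that concrete, here is a minimal sketch of the kind of default clause I mean. The names (ReorderPolicy, StockLevel, reorderQuantity) are made up for illustration and are not from the client's code. Every enum constant is handled, so no test can reach the default arm unless someone adds a constant, and I would expect a branch coverage tool to report that arm as never taken.

```java
// Hypothetical example: all names here are invented for illustration.
final class ReorderPolicy {

    enum StockLevel { LOW, OK, SURPLUS }

    // Returns how many units to reorder for the given stock level.
    static int reorderQuantity(StockLevel level, int batchSize) {
        switch (level) {
            case LOW:
                return batchSize;
            case OK:
            case SURPLUS:
                return 0;
            default:
                // "Can't happen" today: every constant is handled above.
                // If someone later adds a constant and forgets this switch,
                // we fail fast here instead of silently returning garbage.
                throw new IllegalStateException("Unhandled stock level: " + level);
        }
    }
}
```

To cover the default arm you would have to add a bogus constant to StockLevel just for the tests, or delete the arm. Those are exactly the two unpleasant options above.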

Code can change, and these defensive techniques help ensure that the code detects breaking changes and acts appropriately by dropping into the debugger or failing fast.

Of course, if you are flying an airplane, you don't want your software to unexpectedly drop into the debugger and the airplane to unexpectedly drop out of the sky. In this case, a policy of 100% branch coverage makes sense. For business software, it is harder to make that case.

Addendum: Stupid tools.

Java provides a nice set of stupid tools. One prime example is the branch coverage tool. It tells you which conditionals in the code are fully tested and which are only partially tested. What it doesn't tell you is which branches were tested. So you have to guess. Sometimes it is fairly obvious, but other times, especially with slightly more complex conditionals, it is very difficult to figure out which branches are being taken and which are not. Now there are some reasons why the coverage tool cannot determine this basic information, and I'm sure they are logical (I understand it has something to do with the difficulty of mapping the optimized bytecode back to the source code). But seriously, a tool that can detect coverage but cannot tell you exactly what is covered is only half-baked. This encourages people to rewrite their conditionals to avoid anything more complex than single tests, which makes the code more verbose and less readable. Thanks a bunch.
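Here is a sketch of the sort of rewrite I mean. The class and method names are invented for illustration. Both methods compute the same answer. The first packs three conditions into one expression, so a "partially covered" report on that line leaves you guessing which condition was missed. The second puts one test on each line, so the report points at the culprit, at the cost of more code.

```java
// Hypothetical example: all names here are invented for illustration.
final class ShipCheck {

    // Compact form: three conditions in a single expression on one line.
    static boolean canShipCompact(int onHand, int requested, boolean onHold) {
        return requested > 0 && onHand >= requested && !onHold;
    }

    // Verbose form: one test per line, so per-line coverage is unambiguous.
    static boolean canShipVerbose(int onHand, int requested, boolean onHold) {
        if (requested <= 0) {
            return false;
        }
        if (onHand < requested) {
            return false;
        }
        if (onHold) {
            return false;
        }
        return true;
    }
}
```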
