In ongoing discussions here at the blog and elsewhere, I keep seeing the topic of false positives in static analysis come up. It’s certainly a very important issue when dealing with static analysis, but the strange thing is that people have very different opinions of what a false positive is, and therefore different opinions of what static analysis can do and how to properly use it.
In the simplest sense, a false positives means that the message that a rule was violated is incorrect, or in other words the rule was not violated – the message was false. In other words, a false positives should mean that static analysis said it found a pattern in your code but the pattern doesn’t actually exist in your code when you review it. Now, that would be a real false positive.
Pattern-based false positives
False positives and “not false positives” (false negatives) are in two different areas. One is pattern based static analysis, which also includes metrics. There is also flow-based static analysis. One thing to remember is that pattern based static analysis doesn’t typically have false positives. If it has a false positive, it’s really a bug in the rule or pattern definition, because the rule should not be ambiguous. If the rule doesn’t have a clear pattern to look for, it’s a bad rule.
This doesn’t mean that when a rule lists a violation that there is necessarily a bug, which is important to note and the source of much of the confusion. A violation simply means that the pattern was found. You have to look and say I am choosing these patterns and flagging these patterns because they are dangerous to my code. When I look at pattern, I ask myself, does this apply to my code, or doesn’t it apply? If it applies, I fix the code, if it doesn’t apply I suppress it.
It’s best to suppress it in the code itself rather than an external location such as your UI or a configuration file, so that it’s visible and others can look at it, and you won’t end up having to review it a second time. It’s a bad idea to not suppress the violation explicitly, because then you will constantly be reviewing the same violation. The beauty of in-code suppression is that it’s independent of the engine. Anyone can look at the code and see that the code has been reviewed and that this pattern has been deemed acceptable in this code.
This is the nature of pattern-based static analysis. It’s based on an understanding that certain things are bad ideas and may not be safe. This doesn’t mean you cannot do them in a particular context, but that such things should be done carefully.
Flow Analysis false positives
In flow analysis you have to address false positives because it will always have false positives. Flow analysis cannot avoid false positives, for the same reason unit testing cannot generate perfect unit test cases. When you have code that uses some kind of library, for instance your java code calls the OS and something come back, who knows what it’s sending? It’s going to be a false positive because we have to make assumptions about what’s coming back.
We try to err on the side of caution. Be careful here, if it’s possible that something strange might be returned protect against this. This finds bugs, that’s why it’s so valuable. You also need to understand the power and weakness of flow analysis. The power of flow analysis is that it goes through the code and tries to find hot spots and find problems around the hot spots.
The weakness is that it is going some number of steps around the code it’s testing, like a star pattern. The problem is that if you start thinking you’ve cleaned all the code because your flow analysis is clean, you are fooling yourself. Really, you’ve found some errors and you should be grateful for that.
The real question with flow analysis is the amount of time you spend going through results to find false positives, suppress them, and fix real bugs quickly before it goes into functional testing where it would be more difficult to find with debugging.
Lets say you spend an hour to fix and suppress a dozen flow analysis items at something like a 50% false positive ratio , which is pretty nasty. Now lets say one or two of these real bugs leaks into field, by the time you get info back from the field report to support and development, it may cost a half-day or even 2-3 days of time to address the issue. It is your decision which way is more time saving and effective.
In addition to flow analysis you should really think about using runtime error detection. Runtime error detection allows you to find much more complicated problems than flow analysis can find. Runtime error detection doesn’t have false positives, because the code is executed with a known value and had an actual failure.
The key to success is to choose which rules you want to adhere to, and then get the code clean progressively. Which means start with code you’re currently modifying and extend it throughout the code base until you are done. At some point when you see that there are very few violations, you can run it on the whole code base, rather than just recently changed code. In other words, set a small initial rule set with a cutoff-date of “today”. Then when you see violations dying out, add new rules, run it on the whole code, and review – we’ll discuss how to do the review in a moment. But we recommend extending the cutoff-date backward before adding new rules, because your initial rule set is only things that you feel are critical.
Rules/configurations should really be based on real issues you’re facing. IE if you take feedback from QA, code review, field reported bugs, etc. and then find static analysis rules to address those problems.
Sometimes developers fall into the trap of labeling any error message they don’t like as a false positive, but this isn’t really correct. They may label it a false positive because they simply don’t agree with the rule, they may label it because they don’t understand how it applies in this situation, or they may label it because they don’t think it’s important in general or in this particular case. The best way to deal with this head-on is to make sure that the initial rule set you start with has a small number of rules that everyone can agree on. It should product reasonable results that can be dealt with in a reasonable amount of time.