The other day I was talking to a colleague about setting up static analysis properly. I was on my usual soapbox about all the simple steps you can take up front to insure success, and he suggested that I should blog about it, so here it is – How to properly configure your static analysis from the beginning.
This presumes that you’ve already gone through the trouble of selecting the proper tool for the job. And you’ve setup policies and procedures for how and when to use it. What I want to focus on is getting the rule set right.
As I’ve mentioned before there are quite a few ways to have troubles with static analysis, and noise is one of the chief culprits. You
want need to make sure that noise is far out-weighed by valuable information.
Noise comes about primarily from having the wrong rules or running on the wrong code. It sounds simple enough, but what does it mean?
The Wrong Rules
Let’s break the wrong rules down into the various ways they can trouble you. First, noisy rules. If a rule makes a lot of noise or gives actual false positives, you need to turn it off. False positives in pattern-based static analysis are indicative of improperly implemented rules, and should be fixed if possible, turned off it not.
On the other hand, false positives in flow analysis are inevitable and need to weighed against the value they provide. If the time spent chasing the false positives is less than the effort that will be needed to find the bugs otherwise, then it’s probably worth it. If not, turn the rule off.
But there are other kinds of noise. Some rules may technically be correct, but not helpful. For example rules that are highly context dependent. Such rules in the right context are very helpful but elsewhere are far more painful than they are worth. If you have to spend a lot of time evaluating whether a particular violation of a rule should be fixed in this particular case, you should turn the rule off. I’ve seen a lot of static analysis rules in the wild that belong in this category. Nice academic ideas, impractical in the real world.
Another way to recognize a noisy rule is that it produces a lot of violations. If you find that one rule is producing thousands of violations, it’s probably not a good choice. Truthfully, it’s either an inherently bad rule, or it’s just one that doesn’t fit your teams coding style. If you internally agreed with what the rule says to do, you won’t have thousands of violations. Each team has to figure out exactly what the threshold for too many is, but pick a number and turn off rules that are above it, at least in the beginning. You can always revisit it later when you’ve got everything else done.
One area of rules that frequently gets called noise is when developers either don’t understand the value of a rule, or simply disagree with it. In the beginning such rules can undermine your long-term adoption. Turn off rules that are contentious. How can you tell? Run the results past the developers – if they complain, you’ve probably got a problem.
It’s important that the initial rule set is useful and achievable. Later on you will be able to enhance the rules as you come into compliance and as the practice itself becomes ingrained. In the beginning, you need your static analysis tool to deliver high value. I ask myself, would I hold off shipping this code if I found this error? If the answer is yes, I leave the rule on, if not, it’s a poor candidate for an initial rule set. This is what some people call the “Severity 1” rules or high priority rules.
The Wrong Code
Sometimes particular pieces of code are noisy. Typically this ends up being legacy code. The best practice is to let your legacy policy guide you. If you have a rule that legacy code should only be modified when there is a field reported bug, then you don’t want to run static analysis on it. If you have a policy that when a file is touched it should come into compliance with static analysis rules, then use that. Otherwise skip it. Don’t bother testing any code that you don’t plan on fixing.
Checking the configuration
Once you have a set of rules you think is right, you should validate that. Run on a piece of real code and look at the number and quality of violations coming out. If the violations are questionable, or there are too many, go back to square one and weed them out. If they’re reasonable, pick another project, preferably from a different team or area, and run the test again. You’ll likely find a rule or two that you need to turn off when you do this. After that, you’ve got a pretty reasonable chance of having a good initial rule set.
In the long term, you should be adding to your rules. You want to resist the temptation to get all the rules in the initial set, because you’ll get run over with too much work. Plus as developers get used to having a static analysis tool advise them, they start improving the quality of the code the produce, so it’s natural to ratchet it up from time-to-time.
When you see that a project or team has either gotten close to 100% compliance with your rules, or if they’ve essentially plateaued, then you want to think about adding rules. The best choice is not to simply add rules that sound good, but rather choose rules that have a relationship to problems you’re actually experiencing. If you have performance issue, look at that category for rules to add, and so on.
Spending the time upfront to make sure your rules are quiet and useful will go a long way to insuring your long term success with static analysis.
As a reminder, I work for Parasoft, a company that among other things make static analysis tools. This is however my personal blog, and everything said here is my personal opinion and in no way the view or opinion of Parasoft or possibly anyone else.