Enable empirical research based on large test sets
Encourage improvement of tools
Speed adoption of tools by objectively demonstrating their use on real software
I find SATE interesting because it takes a couple of different approaches that are pretty useful to people trying to understand what static analysiscan and cannot due. One approach is to have several full-fledged applications with known bugs, and versions of the application with those bugs fixed. These have the effect of showing what static analysis tools can do in the real world. Unfortunately, they don’t help much when trying to find out what kinds of issues static analysiscan handle overall.
To do that, NIST has developed a test suite that has thousands of test cases with specific issues in them. Part of SATE is running various tools on the test applications and test suites, and then trying to analyze what they can find, how much noise they produce, etc. It’s an interesting exercise. You should check it out.
This year I was privileged give a presentation myself. I wanted to talk about some of the pragmatic aspects of actually trying to use static analysis in the real world. To that end, I created a slide show around the top 10 user mistakes, meaning things that prevent organizations from realizing the value they expected or needed from static analysis. These range from improper configuration to poor policy to dealing with noise.
Take a look for yourself. If you love or hate any of them, let me know. If you have others I missed, feel free to mention it in the comments, or email me or reach me on twitter.
I’ll be speaking this Thursday at the SATE IV software security conference in McLean, VA. This is a free event open to the public and a great chance to learn more about static analysis at a day-long event. You can register at http://samate.nist.gov/SATE4Workshop.html
My talk is title “Top 10 User Mistakes with Static Analysis” and I think you’ll enjoy it. If possible I’ll post the slides here after the conference.
Parasoft is doing a survey about Dev and QA Test Environment Access. If you want a chance to share your views, now is a great time. Plus there is a drawing for one of three $100 Amazon gift cards or one of ten $10 Starbucks gift cards. All respondents will be offered free access to the survey results.
The other day I was talking to a colleague about setting up static analysis properly. I was on my usual soapbox about all the simple steps you can take up front to insure success, and he suggested that I should blog about it, so here it is – How to properly configure your static analysis from the beginning.
This presumes that you’ve already gone through the trouble of selecting the proper tool for the job. And you’ve setup policies and procedures for how and when to use it. What I want to focus on is getting the rule set right.
As I’ve mentioned before there are quite a few ways to have troubles with static analysis, and noise is one of the chief culprits. You want need to make sure that noise is far out-weighed by valuable information.
Noise comes about primarily from having the wrong rules or running on the wrong code. It sounds simple enough, but what does it mean?
The Wrong Rules
Let’s break the wrong rules down into the various ways they can trouble you. First, noisy rules. If a rule makes a lot of noise or gives actual false positives, you need to turn it off. False positives in pattern-based static analysis are indicative of improperly implemented rules, and should be fixed if possible, turned off it not.
On the other hand, false positives in flow analysis are inevitable and need to weighed against the value they provide. If the time spent chasing the false positives is less than the effort that will be needed to find the bugs otherwise, then it’s probably worth it. If not, turn the rule off.
But there are other kinds of noise. Some rules may technically be correct, but not helpful. For example rules that are highly context dependent. Such rules in the right context are very helpful but elsewhere are far more painful than they are worth. If you have to spend a lot of time evaluating whether a particular violation of a rule should be fixed in this particular case, you should turn the rule off. I’ve seen a lot of static analysis rules in the wild that belong in this category. Nice academic ideas, impractical in the real world.
Another way to recognize a noisy rule is that it produces a lot of violations. If you find that one rule is producing thousands of violations, it’s probably not a good choice. Truthfully, it’s either an inherently bad rule, or it’s just one that doesn’t fit your teams coding style. If you internally agreed with what the rule says to do, you won’t have thousands of violations. Each team has to figure out exactly what the threshold for too many is, but pick a number and turn off rules that are above it, at least in the beginning. You can always revisit it later when you’ve got everything else done.
One area of rules that frequently gets called noise is when developers either don’t understand the value of a rule, or simply disagree with it. In the beginning such rules can undermine your long-term adoption. Turn off rules that are contentious. How can you tell? Run the results past the developers – if they complain, you’ve probably got a problem.
It’s important that the initial rule set is useful and achievable. Later on you will be able to enhance the rules as you come into compliance and as the practice itself becomes ingrained. In the beginning, you need your static analysis tool to deliver high value. I ask myself, would I hold off shipping this code if I found this error? If the answer is yes, I leave the rule on, if not, it’s a poor candidate for an initial rule set. This is what some people call the “Severity 1” rules or high priority rules.
The Wrong Code
Sometimes particular pieces of code are noisy. Typically this ends up being legacy code. The best practice is to let your legacy policy guide you. If you have a rule that legacy code should only be modified when there is a field reported bug, then you don’t want to run static analysis on it. If you have a policy that when a file is touched it should come into compliance with static analysis rules, then use that. Otherwise skip it. Don’t bother testing any code that you don’t plan on fixing.
Checking the configuration
Once you have a set of rules you think is right, you should validate that. Run on a piece of real code and look at the number and quality of violations coming out. If the violations are questionable, or there are too many, go back to square one and weed them out. If they’re reasonable, pick another project, preferably from a different team or area, and run the test again. You’ll likely find a rule or two that you need to turn off when you do this. After that, you’ve got a pretty reasonable chance of having a good initial rule set.
In the long term, you should be adding to your rules. You want to resist the temptation to get all the rules in the initial set, because you’ll get run over with too much work. Plus as developers get used to having a static analysis tool advise them, they start improving the quality of the code the produce, so it’s natural to ratchet it up from time-to-time.
When you see that a project or team has either gotten close to 100% compliance with your rules, or if they’ve essentially plateaued, then you want to think about adding rules. The best choice is not to simply add rules that sound good, but rather choose rules that have a relationship to problems you’re actually experiencing. If you have performance issue, look at that category for rules to add, and so on.
Spending the time upfront to make sure your rules are quiet and useful will go a long way to insuring your long term success with static analysis.
As a reminder, I work for Parasoft, a company that among other things make static analysis tools. This is however my personal blog, and everything said here is my personal opinion and in no way the view or opinion of Parasoft or possibly anyone else.