What Went Wrong with Static Analysis

Let me say up front that I’m a big fan of static code analysis. I’m one of those people who believe that when static analysis is used properly, the benefits are tremendous. That being said, I’m constantly surprised at how many people aren’t getting the value they expected, which leads me to wonder what went wrong.

There are a few things that seem to cause people to stumble: tools that are too difficult or impractical to use in a real production environment, too many false positives, too much “noise”, no developer buy-in, no management buy-in, and improper expectations.

Difficult / Impractical
I could go on for days talking about the various ways in which a tool can be impractical. Having seen numerous companies attempt to adopt new technologies and processes over the years, I can attest to the fact that employees can successfully kill any such project… and rightly so. Frequently management becomes enamored with the latest idea and tries to implement it without thinking about the real cost and possible disruption to existing business practices. If the new process/tool doesn’t provide more value than the existing process, it will be killed by the users, who at best will practice willful non-compliance, and at worst will actively work to make sure the project/tool/process fails.

Any disruptive changes need to be carefully weighed for value and cost. If it’s believed that they’re worth the disruption, then management must be prepared to fully back them up. This means mandates for compliance, time allowed and scheduled for training, changeover, etc. Without this, most attempts to improve tooling and process company-wide end up being short-lived diversions from the status quo. In the near future I’ll go into more detail on how to successfully select tools and processes.

False positives / noise
False positives and noise are among the biggest reasons why developers start using static analysis and then don’t continue. With false positives there is initially a perception problem. Developers often think of any static analysis message they don’t agree with as a false positive. I’d have to say that a false positive is more properly defined as a message that is actually incorrect. For example, it says I have an unclosed resource such as a JDBC connection, when in reality the connection is closed. The difference between tools in the area of false positives is extremely important. Some tools do a poor job of getting it right, and some rules that some vendors choose to create are simply poor choices. Sometimes the patterns that static analysis looks for are particularly prone to false positives. Such tools and rules should be avoided like the plague.
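To make the unclosed-resource example concrete, here is a minimal sketch in Python, using the standard `sqlite3` module as a stand-in for JDBC. The helper name `close_quietly` is hypothetical. The connection is always closed, but the close happens inside a helper, so a checker that only looks for a direct `close()` call in the function that opened the connection might report it as unclosed. Whether any particular tool does so is an assumption, not a claim about a specific product.

```python
import sqlite3


def close_quietly(conn):
    """Hypothetical helper: the close happens here, not in the caller."""
    conn.close()


def count_rows():
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])
        return conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    finally:
        # The resource really is released on every path, so a message
        # saying "connection never closed" would be a true false positive.
        close_quietly(conn)
```

A message on `count_rows` would be incorrect, which is different from a message that is correct but that the developer simply doesn’t like.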

Frequently I see developers say things like “I don’t agree with that” or “I don’t understand why it’s saying that” etc. These are not necessarily false positives, but more likely a training or configuration issue. This is why I mention both noise and false positives together, even though they are not necessarily the same thing. In my mind, false positives are one particular kind of noise.

Noise I like to define much more broadly, as anything that a developer doesn’t want to see. This is frequently what is behind the claim of false positives, whether correct or not. It can also be a rule problem, in that some rules are very context sensitive and only apply in certain areas. It can be a poorly configured tool, for example one that doesn’t understand the difference between legacy code and current code. It’s not uncommon for teams to have a policy that you don’t touch legacy code simply to fix static analysis issues. If your tool doesn’t accommodate this policy, it’s going to be a problem.
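As a rough illustration of the legacy-code policy, the sketch below filters a list of findings so that anything under an agreed legacy path is not shown to developers. The `Finding` type, the path patterns, and the rule names are all made up for the example; a real tool would have its own configuration mechanism for this.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Finding:
    path: str
    rule: str
    message: str


# Hypothetical patterns for code the team has agreed not to touch.
LEGACY_PATTERNS = ["legacy/*", "vendor/*"]


def is_legacy(path, patterns=LEGACY_PATTERNS):
    """True if the file falls under any agreed legacy pattern."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


def actionable(findings, patterns=LEGACY_PATTERNS):
    """Keep only findings in code that developers are expected to fix."""
    return [f for f in findings if not is_legacy(f.path, patterns)]


findings = [
    Finding("legacy/billing/old_report.py", "unclosed-resource", "file not closed"),
    Finding("src/orders/service.py", "unclosed-resource", "connection not closed"),
]
visible = actionable(findings)
```

Here `visible` contains only the finding in `src/orders/service.py`; the legacy finding is still recorded but never lands in front of a developer who isn’t allowed to fix it.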

Rule choice is another area that produces unwanted noise. As mentioned above, some rules are poorly implemented in some tools, or are even simply a bad idea. This is way more common than it should be. Any rule that produces a constant amount of noise should not be used. To be clear about the bar: a rule that gives me the “right” answer 51% of the time isn’t good enough; I want one that gives me the right answer 90+% of the time. It’s all too common for people to turn on rules because they sound like a good idea, but in actual production the rule requires way too much manual evaluation.
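One way to apply the 90+% bar is to triage a sample of messages by hand and measure how often each rule was right. The sketch below assumes you have recorded each reviewed message as a `(rule, was_real_problem)` pair; the rule names and counts are invented for illustration.

```python
from collections import Counter

MIN_PRECISION = 0.90  # the "90+%" bar described above


def rule_precision(triage):
    """Fraction of reviewed messages per rule that were real problems."""
    total = Counter()
    correct = Counter()
    for rule, was_real in triage:
        total[rule] += 1
        if was_real:
            correct[rule] += 1
    return {rule: correct[rule] / total[rule] for rule in total}


def rules_to_keep(triage, min_precision=MIN_PRECISION):
    """Rules whose measured precision meets the bar, sorted by name."""
    precision = rule_precision(triage)
    return sorted(rule for rule, p in precision.items() if p >= min_precision)


# Invented triage sample: 20 reviewed messages per rule.
triage = (
    [("unclosed-resource", True)] * 19 + [("unclosed-resource", False)]
    + [("magic-number", True)] * 11 + [("magic-number", False)] * 9
)
keep = rules_to_keep(triage)
```

With this sample, `unclosed-resource` is right 19 times out of 20 and stays on, while `magic-number` is right only 11 times out of 20 and would be turned off or reconfigured.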

In some sense it’s similar to a spell checker. If you have an untrained spell checker working against a document, it will likely show real spelling errors buried in a mountain of noise based on words unique to your company, industry, or product. If you choose to scan through to manually find the real misspelled words, you’ll very likely miss some. If instead you take the time to add new words to your dictionary, then the spell checker will be right a very high percentage of the time. This means you’re less likely to miss errors, and more likely to use the tool.

Again, noisy tools, bad rules, and tools that cannot be configured for real production code should be avoided. It’s extremely unlikely that you can sustain their use in the long term.

Buy-in
Buy-in seems like such a simple concept but it’s really much more. I’ve worked with organizations that felt like static analysis should be like sugar – the developers will just want to use it. It’s important to understand that static analysis is a quality process which will cause extra work up front for developers. In some cases it may even impact deadlines. The theory is that the up front costs pay off in higher quality, less debugging, shortened testing cycles, etc.

It’s unrealistic to think that you can add a step to your development process that will not have some kind of impact on work effort and schedule. Yes, you should expect a long-term return on investment for the cost and effort of using static analysis, but not on day one, or week one, or even month one.

In order for static analysis to “stick” as a process, you need to make sure that management understands what it’s for, and is willing and able to enforce its use. This of course presupposes that the tool is a good one, with low noise and well configured. This minimizes the negative impact and improves developer buy-in. If developers get messages that aren’t useful, they’re likely to ignore the results. If on the other hand static analysis saves their neck by telling them something important, then you’re on the right path.

Proper expectations
So what should you expect from static analysis? Is it a silver bullet? Will simply applying any static analysis tool with any rule set do the job? Is just buying the tool enough, on the assumption that developers will be happy to use it? The answer to all of the above is obviously no.

The best possible way to make sure that you get ROI on your static analysis is to tie the configuration to problems you’re actually having. Surprisingly, most teams I see are using static analysis because it’s the “right thing to do”, without any regard to the problems they’re actually experiencing.

It is much more effective to do some kind of postmortem on your existing code and projects, and then configure your rules based on the problems it reveals. For example, if you have a server that’s crashing or hanging randomly, turning on rules that check for unhandled exceptions and resources that aren’t closed is likely to be effective. Rules that check for the placement of curly brackets are not.
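A simple sketch of that idea: tally the defect categories from a postmortem and enable only the rule groups tied to categories that actually recur. The category names, rule names, and the `min_count` threshold are assumptions made up for the example, not settings from any real tool.

```python
from collections import Counter

# Hypothetical mapping from postmortem defect categories to rule groups.
RULES_FOR_CATEGORY = {
    "crash": ["unhandled-exception"],
    "hang": ["unclosed-resource"],
    "formatting": ["brace-placement"],
}


def rules_from_postmortem(defect_categories, min_count=2):
    """Enable rules only for categories seen at least min_count times."""
    counts = Counter(defect_categories)
    enabled = []
    # most_common() walks categories from most to least frequent.
    for category, count in counts.most_common():
        if count < min_count:
            continue
        for rule in RULES_FOR_CATEGORY.get(category, []):
            if rule not in enabled:
                enabled.append(rule)
    return enabled


# Invented postmortem: one entry per production defect.
postmortem = ["crash", "hang", "crash", "hang", "hang", "formatting"]
enabled_rules = rules_from_postmortem(postmortem)
```

In this sample the hangs and crashes recur, so the unclosed-resource and unhandled-exception rules are enabled, while the single formatting complaint doesn’t justify turning on a brace-placement rule.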

Conclusions
Static analysis can be a huge benefit with minimal impact when deployed properly. Organizations that look at it as a long-term investment in quality and configure it for actual problems are likely to get the best benefit and are most likely to see its use continue over time. Groups that simply do it because it’s the right thing to do, without an understanding of risk, investment, and proper configuration, end up losing. At best, they produce unmeasurable results that they cannot justify. At worst, they can actually decrease quality by wasting time and focus on unimportant items, while taking resources away from really important issues. Make sure this doesn’t happen to you.

[Disclaimer]
As a reminder, I work for Parasoft, a company that among other things makes static analysis tools. This is however my personal blog, and everything said here is my personal opinion and in no way the view or opinion of Parasoft or possibly anyone else.
[/Disclaimer]
