
Getting the Static out of your Analysis

The other day I was talking to a colleague about setting up static analysis properly. I was on my usual soapbox about all the simple steps you can take up front to ensure success, and he suggested that I should blog about it, so here it is – how to properly configure your static analysis from the beginning.

This presumes that you’ve already gone through the trouble of selecting the proper tool for the job, and that you’ve set up policies and procedures for how and when to use it. What I want to focus on is getting the rule set right.

As I’ve mentioned before, there are quite a few ways to run into trouble with static analysis, and noise is one of the chief culprits. You need to make sure that noise is far outweighed by valuable information.

Noise comes about primarily from having the wrong rules or running on the wrong code. It sounds simple enough, but what does it mean?

The Wrong Rules

Let’s break the wrong rules down into the various ways they can trouble you. First, noisy rules. If a rule makes a lot of noise or gives actual false positives, you need to turn it off. False positives in pattern-based static analysis are indicative of improperly implemented rules, and should be fixed if possible, turned off if not.

On the other hand, false positives in flow analysis are inevitable and need to be weighed against the value the rule provides. If the time spent chasing the false positives is less than the effort that would otherwise be needed to find the bugs, then it’s probably worth it. If not, turn the rule off.

But there are other kinds of noise. Some rules may technically be correct, but not helpful – for example, rules that are highly context-dependent. Such rules are very helpful in the right context, but elsewhere are far more painful than they are worth. If you have to spend a lot of time evaluating whether a particular violation should be fixed in a particular case, you should turn the rule off. I’ve seen a lot of static analysis rules in the wild that belong in this category: nice academic ideas, impractical in the real world.

Another way to recognize a noisy rule is that it produces a lot of violations. If you find that one rule is producing thousands of violations, it’s probably not a good choice. Truthfully, it’s either an inherently bad rule, or it’s just one that doesn’t fit your team’s coding style. If you internally agree with what the rule says to do, you won’t have thousands of violations. Each team has to figure out exactly what the threshold for “too many” is, but pick a number and turn off rules that are above it, at least in the beginning. You can always revisit them later when you’ve got everything else done.

Rules also frequently get called noise when developers either don’t understand the value of a rule or simply disagree with it. In the beginning, such rules can undermine your long-term adoption, so turn off rules that are contentious. How can you tell? Run the results past the developers – if they complain, you’ve probably got a problem.

It’s important that the initial rule set is useful and achievable. Later on you will be able to enhance the rules as you come into compliance and as the practice itself becomes ingrained. In the beginning, you need your static analysis tool to deliver high value. I ask myself, would I hold off shipping this code if I found this error? If the answer is yes, I leave the rule on; if not, it’s a poor candidate for an initial rule set. This is what some people call the “Severity 1” or high-priority rules.

The Wrong Code

Sometimes particular pieces of code are noisy. Typically this ends up being legacy code. The best practice is to let your legacy policy guide you. If your policy is that legacy code should only be modified when there is a field-reported bug, then you don’t want to run static analysis on it. If your policy is that when a file is touched it should come into compliance with your static analysis rules, then use that. Otherwise skip it. Don’t bother testing any code that you don’t plan on fixing.

Checking the configuration

Once you have a set of rules you think is right, validate it. Run it on a piece of real code and look at the number and quality of violations coming out. If the violations are questionable, or there are too many, go back to square one and weed them out. If they’re reasonable, pick another project, preferably from a different team or area, and run the test again. You’ll likely find a rule or two that you need to turn off when you do this. After that, you’ve got a pretty reasonable chance of having a good initial rule set.

Long term

In the long term, you should be adding to your rules. Resist the temptation to include all the rules in the initial set, because you’ll be overrun with too much work. Besides, as developers get used to having a static analysis tool advise them, they start improving the quality of the code they produce, so it’s natural to ratchet things up from time to time.

When you see that a project or team has either gotten close to 100% compliance with your rules, or has essentially plateaued, then you want to think about adding rules. The best approach is not to simply add rules that sound good, but rather to choose rules that have a relationship to problems you’re actually experiencing. If you have performance issues, look at that category for rules to add, and so on.

Spending the time up front to make sure your rules are quiet and useful will go a long way toward ensuring your long-term success with static analysis.

[Disclaimer]
As a reminder, I work for Parasoft, a company that among other things makes static analysis tools. This is however my personal blog, and everything said here is my personal opinion and in no way the view or opinion of Parasoft or possibly anyone else.
[/Disclaimer]


False Positives and Other Misconceptions in Static Analysis

In ongoing discussions here at the blog and elsewhere, I keep seeing the topic of false positives in static analysis come up. It’s certainly a very important issue when dealing with static analysis, but the strange thing is that people have very different opinions of what a false positive is, and therefore different opinions of what static analysis can do and how to properly use it.

In the simplest sense, a false positive means that the message saying a rule was violated is incorrect – the rule was not actually violated, so the message was false. In other words, a false positive should mean that static analysis said it found a pattern in your code, but when you review the code, the pattern doesn’t actually exist. That would be a real false positive.

Pattern-based false positives

False positives play out differently in the two kinds of analysis. One is pattern-based static analysis, which also includes metrics; the other is flow-based static analysis. One thing to remember is that pattern-based static analysis doesn’t typically have false positives. If it has a false positive, it’s really a bug in the rule or pattern definition, because the rule should not be ambiguous. If the rule doesn’t have a clear pattern to look for, it’s a bad rule.

This doesn’t mean that when a rule reports a violation there is necessarily a bug, which is important to note and the source of much of the confusion. A violation simply means that the pattern was found. You have to look at it and say: I am choosing these patterns and flagging them because they are dangerous to my code. When I look at a reported pattern, I ask myself, does this apply to my code or doesn’t it? If it applies, I fix the code; if it doesn’t, I suppress it.

It’s best to suppress it in the code itself rather than in an external location such as your UI or a configuration file, so that the suppression is visible, others can see it, and you won’t end up reviewing the same violation a second time. Failing to suppress the violation explicitly is a bad idea, because then you will constantly be reviewing it. The beauty of in-code suppression is that it’s independent of the engine: anyone can look at the code and see that it has been reviewed and that this pattern has been deemed acceptable in this code.
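As a minimal sketch of what in-code suppression can look like (the marker syntax varies by tool – the SUPPRESS comment below is a hypothetical example, while Java’s @SuppressWarnings plays the same role for compiler-level analyses):

```java
import java.util.List;

public class LegacyAdapter {
    // Hypothetical tool-specific marker: records that this violation was
    // reviewed and accepted, right where the next reader will see it.
    // SUPPRESS CAST.UNSAFE: legacy API documents that the list holds Strings
    @SuppressWarnings("unchecked")
    List<String> names(Object rawListFromLegacyApi) {
        return (List<String>) rawListFromLegacyApi;
    }
}
```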

This is the nature of pattern-based static analysis. It’s based on an understanding that certain things are bad ideas and may not be safe. This doesn’t mean you cannot do them in a particular context, but that such things should be done carefully.

Flow Analysis false positives

In flow analysis you have to address false positives, because it will always have them. Flow analysis cannot avoid false positives, for the same reason unit testing cannot generate perfect unit test cases. When your code uses some kind of library – for instance, your Java code calls out to the OS and something comes back – who knows what it’s sending? The analysis has to make assumptions about what’s coming back, and those assumptions will sometimes produce false positives.

We try to err on the side of caution: if it’s possible that something strange might be returned, you should protect against it. This finds bugs, and that’s why it’s so valuable. You also need to understand the power and the weakness of flow analysis. The power is that it goes through the code, tries to find hot spots, and finds problems around those hot spots.

The weakness is that it only goes some number of steps out from the code it’s testing, like a star pattern. The problem is that if you start thinking you’ve cleaned all the code because your flow analysis is clean, you are fooling yourself. Really, you’ve found some errors and you should be grateful for that.

The real question with flow analysis is how much time you spend going through results to find and suppress false positives and fix real bugs, compared with catching those bugs later in functional testing, where they would be much more difficult to track down with debugging.

Let’s say you spend an hour to fix and suppress a dozen flow analysis items at something like a 50% false-positive ratio, which is pretty nasty. Now let’s say one or two of the real bugs leak into the field; by the time you get information back from the field report to support and development, it may cost half a day or even 2-3 days to address each issue. It’s your decision which way is more time-saving and effective.

In addition to flow analysis, you should really think about using runtime error detection. Runtime error detection allows you to find much more complicated problems than flow analysis can. It doesn’t have false positives, because the code was executed with a known value and had an actual failure.

Being Successful

The key to success is to choose which rules you want to adhere to, and then get the code clean progressively. That means starting with the code you’re currently modifying and extending from there throughout the code base until you are done. At some point, when you see that there are very few violations, you can run the analysis on the whole code base rather than just recently changed code. In other words, set a small initial rule set with a cutoff date of “today”. Then, when you see violations dying out, add new rules, run on the whole code base, and review – we’ll discuss how to do the review in a moment. But we recommend extending the cutoff date backward before adding new rules, because your initial rule set contains only the things you feel are critical.

Rules and configurations should really be based on real issues you’re facing, i.e., take feedback from QA, code review, field-reported bugs, etc., and then find static analysis rules to address those problems.

Sometimes developers fall into the trap of labeling any error message they don’t like as a false positive, but this isn’t really correct. They may label it a false positive because they simply don’t agree with the rule, because they don’t understand how it applies in this situation, or because they don’t think it’s important in general or in this particular case. The best way to deal with this head-on is to make sure that the initial rule set you start with has a small number of rules that everyone can agree on. It should produce reasonable results that can be dealt with in a reasonable amount of time.


Your Two Cents About What Went Wrong With Static Analysis

I’ve gotten a lot of interesting feedback on the What Went Wrong with Static Analysis? post. So many people had their ideas about what was working, what wasn’t, and how to address it, that I thought I’d give people a chance to give their two cents.

I’ve created a poll with the basic issues listed in the post and in various comments on it. Feel free to vote – there is a place to add something not already on the list. After it’s been up for a bit I’ll post some results and commentary as applicable.


What is Static Analysis… and What is it Good For?

As I talk to people about static analysis, I get a lot of questions and it seems that static analysis means different things to different people. The definitions people use for static analysis can be any or all of the things in this list:

  • Peer Review / Manual Code Review / Code Inspection
  • Pattern-based code scanners
  • Flow-based code scanners
  • Metrics-based code scanners
  • Compiler / build output

A working definition of static code analysis is the analysis of computer software that is performed without actually executing the software being tested. I’d like to talk briefly about each of these techniques, when/where to use it, and why it’s helpful.

Perhaps the oldest form of static analysis is metrics-based code scanners, where we look at things like complexity, or even simply the number of lines or methods in a file. Pattern-based code scanners are what some think of as the traditional static analysis technique. A more modern offshoot is flow-based code scanners, which look at paths through the code and say “Oh, this could happen to you, or that could happen to you.” The last one, which most people don’t think about, is output from your compiler or your build process, which is a very valuable thing.

Peer Code Review

Let’s start with what we would call peer code review, or code review, or manual code review, or code inspection. The idea is that humans look over each other’s shoulders to check whether the code does what it’s supposed to do. And there’s some really cool stuff out there to help you do this more efficiently. What you don’t want code review to be is checking syntax, or in fact anything that could be checked by an automated tool.

What we want is, at some point, to get other eyeballs on the code beyond the person who wrote it, so that we don’t end up with a self-sustaining mechanism where someone decides to do something a certain way and it’s never questioned.

Peer code review helps you find problems early, including functional problems. The most important part of peer code review is the mentoring process: other people look over your shoulder and can give you feedback like “You know… I would do that differently in this case. Here is a better way to do it.”

You learn from code review because you benefit from the experience of others. In addition it helps you learn the code base because you are looking at other pieces of an application rather than just your own.

Pattern based analysis

Pattern-based static analysis means finding a specific pattern in your code. This could be a good pattern, meaning something you want in your code, or a bad pattern, meaning something you don’t want in your code (bugs).

For example, I may have a branding requirement: I want to make sure my code prints out a copyright statement. In this case the pattern is that certain code exists where I expect it to be, such as the footer of a web page.

Or the pattern might be a bad one, such as code that doesn’t free up resources when it’s done, causing memory leaks.

It may also be formatting issues: where the curly braces go, where underscores are used, how names are cased. But we have to remember that it’s not just about syntax problems; patterns can catch things that could really cause a bug, for example when we try to internationalize our product, or things that may cause performance issues.

Additionally, the really cool thing about pattern-based static analysis is that it improves the developers themselves. For example, a developer writes code for accessing a database. The code includes a try/catch block, but he forgot to free up resources with a finally block.

The pattern-based static analysis tool catches it and lets the developer know. After a few rounds of this warning, the developer learns from it and starts writing code with a finally block to avoid the violation (and the nagging from the tool).
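Here’s a minimal sketch of that scenario (class and query are hypothetical). The finally block at the end is exactly the part such a rule keeps nagging about:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class CustomerDao {
    public int countCustomers(String jdbcUrl) throws SQLException {
        Connection conn = DriverManager.getConnection(jdbcUrl);
        Statement stmt = null;
        ResultSet rs = null;
        try {
            stmt = conn.createStatement();
            rs = stmt.executeQuery("SELECT COUNT(*) FROM customers");
            rs.next();
            return rs.getInt(1);
        } finally {
            // Without this block, any exception above leaks the result set,
            // the statement, and the connection.
            if (rs != null) rs.close();
            if (stmt != null) stmt.close();
            conn.close();
        }
    }
}
```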

In other words, the tool is actually teaching the developer by suggesting a best practice over and over again. Ideally, you encapsulate the intelligence of your best developers into pattern-based rules.

Pattern-based static analysis is not just for checking syntax and code formatting. It’s designed to save you time, not to “take time”. Some people say they don’t have enough time to perform static analysis, but generally, you don’t have enough time to skip it.
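To make the mechanics concrete, here is a deliberately naive sketch of a pattern-based check: a regex scan for calls to System.exit(), a common “bad pattern” in library code. Real tools parse the code rather than grepping it; this only illustrates the principle:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;

public class NaiveScanner {
    // The "pattern" this toy rule looks for.
    private static final Pattern BAD = Pattern.compile("System\\.exit\\(");

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Path.of(args[0]));
        for (int i = 0; i < lines.size(); i++) {
            if (BAD.matcher(lines.get(i)).find()) {
                // Report file, line number, and the rule that was violated.
                System.out.printf("%s:%d: violation: avoid System.exit()%n",
                        args[0], i + 1);
            }
        }
    }
}
```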

Flow Analysis

Flow analysis is the idea that instead of looking for a specific pattern in a particular file or class, we look for a pattern along a particular path through the application. But rather than running the application, it simulates the application’s execution.

Flow analysis looks at possible paths through the logic and then manipulates the data to see if a bad pattern appears. For instance, it might try to inject bad data to see if it causes a problem, such as a SQL injection.
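As a hedged illustration of the kind of path being simulated (class and method names are hypothetical): user-controlled data flows straight into a SQL statement, and the parameterized version is the fix a tool would point you toward:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class UserLookup {
    // Flow analysis traces 'name' from its untrusted source into the query.
    public ResultSet findUnsafe(Connection conn, String name) throws SQLException {
        Statement stmt = conn.createStatement();
        // Tainted data concatenated into SQL: input like "' OR '1'='1"
        // rewrites the query -- the classic injection path.
        return stmt.executeQuery("SELECT * FROM users WHERE name = '" + name + "'");
    }

    // The fix: a parameterized query, so the input can't change the SQL.
    public ResultSet findSafe(Connection conn, String name) throws SQLException {
        PreparedStatement ps = conn.prepareStatement("SELECT * FROM users WHERE name = ?");
        ps.setString(1, name);
        return ps.executeQuery();
    }
}
```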

The paths are hypothetical in that they may or may not actually occur when you use the application, but they are at least possible. The cool thing is that it finds real bugs in your application.

One of the things that flow analysis can find is uncaught exceptions. This may not always be a real problem, because sometimes you handle the exception another way; for example, web application servers commonly have a wrapper to catch all exceptions. It matters because an uncaught exception acts basically the same way as a system exit.

Sometimes you have application stability issues, and very frequently they are related to unhandled exceptions. In such cases, flow analysis is a big help in improving your application.
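A small sketch of both ideas (names are hypothetical): a path a flow analyzer would report as an uncaught NumberFormatException, plus the kind of top-level wrapper that makes such a report safe to suppress:

```java
public class PortParser {
    static int parsePort(String raw) {
        // A flow analyzer can report that parseInt throws
        // NumberFormatException on input like "abc", and that nothing
        // along this path catches it.
        return Integer.parseInt(raw);
    }

    public static void main(String[] args) {
        try {
            System.out.println(parsePort(args.length > 0 ? args[0] : "abc"));
        } catch (RuntimeException e) {
            // The kind of catch-all wrapper an app server provides; with it
            // in place, the reported path is handled another way.
            System.err.println("Bad input: " + e.getMessage());
        }
    }
}
```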

API misuse is another common source of problems that flow analysis handles. Where an API is not well understood or is poorly documented, its misuse can lead to memory leaks or corruption.

With security, flow analysis finds some potential types of problems for you to start working on. It’s a great first pass, but it’s not as powerful as pattern-based analysis for preventing issues and for giving thorough coverage, since it’s limited to the hypothetical paths that the tool can figure out.

Metrics

Metrics serve two goals: understanding what’s going on in the code, and finding possible problems. They do this by measuring something in the code. Sometimes when people talk about metrics, they mean KLOC, cyclomatic complexity, number of methods or classes, things like that.

Metrics can point you to potentially dangerous design issues, which is very helpful. They are generally more useful at the design level than at the debugging level.

When tools started doing static analysis about 20 years ago, there were a lot of metrics in place. When people had a bug in the field and couldn’t reproduce it, they tried to use metrics to suggest where the problem might be, then used a debugger to check the area the metrics pointed to.

The problem is that sometimes this gives you a good idea and sometimes it doesn’t; it really depends on which metrics you are looking at. If you’re using metrics to try to find bugs, it can be difficult and time-consuming. But if you’re using them to understand your application, they actually end up telling you things.

So let’s assume you have a metric that measures the number of lines in each file of your application, and you start to notice giant files. That probably means the design is not as good as it should be, because components should be discrete, with known inputs and known outputs. When files get large, they probably have a lot of complicated logic in the middle of them. That’s typically a good time to look at them, refactor, and break them down.
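A minimal sketch of such a metrics pass (the 1000-line threshold is an arbitrary assumption – pick your own):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class FileSizeMetric {
    private static final long THRESHOLD = 1000; // assumed cutoff, tune per team

    public static void main(String[] args) throws IOException {
        // Walk a source tree and flag .java files that exceed the threshold.
        try (Stream<Path> paths = Files.walk(Path.of(args[0]))) {
            paths.filter(p -> p.toString().endsWith(".java")).forEach(p -> {
                try (Stream<String> lines = Files.lines(p)) {
                    long count = lines.count();
                    if (count > THRESHOLD) {
                        System.out.println(p + ": " + count + " lines, consider refactoring");
                    }
                } catch (IOException e) {
                    System.err.println("skipping " + p + ": " + e.getMessage());
                }
            });
        }
    }
}
```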

Compiler / build output

You should think of compiler warnings as a useful form of static analysis. Internally we set a policy many years ago that our products must compile without compiler warnings. It turns out that many compiler warnings are traceable to real problems in the field. At the very least, ignored warnings mask real problems buried within your code. If you think you can ignore compiler warnings, you’re assuming you know as much about the language as the compiler’s authors. They put warnings in place because they think about the code in terms of how the language is supposed to be used; if they give you a warning, it means they’re concerned that the code won’t operate properly. It’s best to pay attention to such warnings.
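One concrete way to enforce such a policy with standard javac flags is to compile with -Xlint to enable the warnings and -Werror to turn them into build failures. The snippet below triggers real rawtypes/unchecked warnings that point at a latent runtime failure:

```java
import java.util.ArrayList;
import java.util.List;

public class WarningDemo {
    public static void main(String[] args) {
        // "javac -Xlint:all -Werror WarningDemo.java" fails on these lines:
        List raw = new ArrayList();   // rawtypes warning
        raw.add("hello");             // unchecked warning
        List<Integer> nums = raw;     // unchecked conversion warning
        // nums.get(0) would throw ClassCastException at runtime --
        // the warning was pointing at a real bug.
        System.out.println(nums.size());
    }
}
```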

All of these types of static analysis can be valuable in improving your code and your development process, and even your developers as I discussed. I’ll go more in depth on these techniques in future posts.

[Disclaimer]
As a reminder, I work for Parasoft, a company that among other things makes static analysis tools. This is however my personal blog, and everything said here is my personal opinion and in no way the view or opinion of Parasoft or possibly anyone else at all.
[/Disclaimer]
