Tag Archives: static analysis

Going with the Flow in Static Analysis

As part of my ongoing series about Static Analysis issues I want to talk about the relationship between the traditional static method and the newer dynamic or flow analysis method. People seem to misunderstand how the techniques relate and what each is good at. In particular, many seem to think that flow analysis is a replacement for non-dynamic analysis, which couldn’t be more wrong.

For the sake of having a simple term to identify each method, I’ll refer to the older “static” method of static analysis as “pattern-based” and the newer flow-based method as “flow-based”. This is somewhat of a misnomer in that both types are really based on patterns, but it seems to be a fairly common way of referring to the two methods. If the terms I use bother you, feel free to do a search-and-replace in your head when reading. I’m not too worried at this point about a strict technical explanation of each, but rather about their relationship. The goal is to have a way to differentiate between these two particular types of static analysis. Of course there are other types of static analysis as well, but I’ll leave that for another day.

Let me begin by saying that there is in fact a very strong relationship between pattern-based and flow-based static analysis, at least at an academic level. In almost every situation there is a set of pattern-based rules that would allow you to code in a way that prevents the issue being found by the flow-based rule from occurring at all. Given the nature of how flow analysis works, it can never find all possible paths through an application. This makes it a good idea to start programming in a more proactive way to prevent the possibility of issues you’re concerned about.

For example, in security, one of the basic problems is using tainted data. Somewhere in the application between getting data from the user and operating on the data, you need to check if the data is safe. Depending on how far apart the operations are, it can be extremely difficult if not impossible to check every possible path. Security code scanners that rely on flow-based analysis attempt to find possible paths between user input and uses of the input that allow tainted data to be operated on. They can never find every possible path even if you let them run for an incredibly long time.

Instead, if you restructure your code so that input validation is done at the moment of input, then you don’t have any paths to chase, and you don’t have to worry about tainted data in your application. Flow-based tools won’t find anything anymore, because you won’t have any unprotected paths. This is sometimes a more difficult sell for developers, since it doesn’t provide them with a single broken piece of code that needs to be fixed. Rather it tells them that the way they’re writing code now could be improved – a bitter pill to swallow.
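As a rough illustration of validating at the moment of input, here is a minimal Python sketch. The names Username, read_username, and greet, as well as the allowed-character rule, are assumptions I made up for the example and do not come from any particular tool. The idea is simply that raw text is turned into a validated type right where it enters the program, so there is no path along which unchecked data can travel.

```python
import re
from dataclasses import dataclass

# Assumed rule for this sketch: 1 to 32 letters, digits, or underscores.
_USERNAME_RE = re.compile(r"[A-Za-z0-9_]{1,32}")


@dataclass(frozen=True)
class Username:
    """A username that has already passed validation."""

    value: str

    def __post_init__(self):
        # Construction fails unless the value matches the allowed pattern.
        if not _USERNAME_RE.fullmatch(self.value):
            raise ValueError("invalid username")


def read_username(raw: str) -> Username:
    # Validation happens at the point of input; raw text goes no further.
    return Username(raw.strip())


def greet(user: Username) -> str:
    # Downstream code is written against the validated type, not raw strings.
    return f"Hello, {user.value}"
```

Python does not enforce the type hint at runtime, so in practice a type checker or a pattern-based rule would have to make sure that nothing other than a Username reaches functions like greet.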

However, applying this same principle to things like memory corruption, resource consumption, etc. can make the program far more robust than chasing possible paths ever could.

An excellent methodology is to start with flow-based analysis and fix the low-hanging fruit. Once you have compliance with your flow-based rule set, review what you’re doing with flow and compare it to pattern-based static analysis. Determine as best you can how to apply pattern-based analysis to catch all potential problems before they happen, and put that into place. This moves you from reacting to issues in your software to a more preventative stance.

There are those who say that flow-based analysis is preventative, but it’s still symptom driven – namely trying to find the openings and bugs you left in your code. Pattern-based analysis, when deployed properly, can be used to address the root problems. In our tainted data example, this means changing our coding style so that we don’t have paths where data could be tainted – root problem handled.

Essentially, flow-based analysis finds real bugs in possible paths. When you get a message from it, you just decide whether you care about that path or not. Pattern-based analysis, on the other hand, tells you about the potential for a bug, not necessarily about the existence of a bug. Again, with our security example, flow-based says “you used tainted data” where pattern-based says “this data could be tainted before use”.

When compared, you can see that flow-based analysis is a great way to find low-hanging fruit, because it’s looking for bugs instead of you doing it. On the other hand, because it works by guessing (flow fans hate the “guessing” term) at possible paths through your code, it will always be by its very nature incomplete.

Pattern-based analysis, on the other hand, requires restructuring your code and behavior if you want to achieve its full value. Some code is not well suited to such change, such as working legacy code.

Used together, you have a very powerful solution that is much more robust than either technique on its own.

[Disclaimer]
As a reminder, I work for Parasoft, a company that among other things makes static analysis tools. This is however my personal blog, and everything said here is my personal opinion and in no way the view or opinion of Parasoft or possibly anyone else at all.
[/Disclaimer]

The Wrong Tool

Did you ever buy something, only to find out that it just wasn’t quite right for you? I don’t mean the usual buyer’s remorse over a large purchase, like a new car. I mean you bought a sports car, and somehow missed the fact that you like to haul your motorcycle to the desert on weekends. Oops!

Not surprisingly, you’ll find people do this frequently with small purchases, for example apps for your phone. You’re hoping for a specific utility, you read a description, it sounds right so you buy it. It might even seem to work OK in simple tests. I had this happen to me recently with a small external microphone I bought for my smartphone to do audio recording. It worked for a couple of minutes, but when I tried to actually use it, the audio was garbled or non-existent for much of the recording. Argh!

Frequently, this is exactly what happens when people decide to buy development tools. They take advice from someone who has used the tool individually, or in a limited environment. When they try to test the tool, perhaps in a pilot program, everything appears fine. Then when deployment begins so do the problems. False positives, configuration problems, poor workflow… the list is seemingly endless and sadly too familiar.

What happens is that the selection process for the tool is inadequate. Most POCs (proof-of-concept) that I see are really simple bake-offs. Someone has an idea in mind of what they think a tool should do and they create the good old checklist of features. Sometimes this is done with the help of a single vendor – a recipe for disaster. Other products are categorized based on the checklist, rather than looked at holistically to see what else they have to offer.

In addition, this methodology fails to take into account the biggest costs and most likely hurdles to success. In order to select the right tool, you have to take into account how it will work in your world.

If for example your developers spend their days in Eclipse, and you select a tool that doesn’t run in Eclipse, then you force them to spend time opening a second tool, possibly dealing with extraneous configuration. Not to mention that when they get the results, they’re not in the place they’re needed – the code editor.

Such issues compound over time and people, carrying a tremendous burden with them. For example, about 10 years ago people got enamored with the idea of doing batch testing for things like static analysis, and then emailing the results back to developers. While this may be the simplest way to set up static analysis, it’s nearly the worst way to deal with the results. You don’t need error messages in your email client, you need them in your editor. (See my earlier post on What Went Wrong with Static Analysis?)

These are just a couple of ways you can run into trouble. I’m doing a webinar at Parasoft about this on September 30th; registration is free. Stop by and check it out if you get a chance.

Remembering a friend and luminary

Adam Kolawa (1957 - 2011)

Earlier this year my longtime friend/boss/partner/hunting buddy Adam Kolawa died. We had worked together since 1992, before the internet went commercial. Over the last nearly twenty years I learned a lot from Adam about software and testing as well as other things.

Adam had a strong vision about what could be done with software. He was a very logical technical person and believed that the way software is created can be improved greatly. I remember learning this early on. I started at Parasoft doing database work and tech support. We had this really cool parallel processing software called Express. With it you could run software on a heterogeneous network of machines, say an IBM machine running AIX alongside a Sun machine running SunOS, and even add in a Digital machine running Ultrix. Needless to say the setup of such software could be complicated.

At one point I realized that many of the same questions were coming to us over and over again, so I put together one of those funny FAQ things that you saw with open-source software. I carefully listed the basic installation and configuration problems that might occur, with steps to handle them. I was so proud of myself and showed it to Adam. His response was true to his nature. He said “Great, now make it go away.” While he was a strong proponent of good documentation (PhDs are like that…) he felt that such information should be unnecessary. So he guided me to go and fix the software so that as many of the problems from the list as possible would be handled directly in the software.

This principle guided all the innovations to come from Parasoft since that time. Parallel processing technology morphed into memory tracking and bug-finding, always on the quest to create better software, more quickly, with less effort.

Along the way Adam wrote numerous papers, articles, and even a few books. The most notable are Automated Defect Prevention: Best Practices in Software Management and The Next Leap in Productivity. The former should be required reading for anyone trying to run a software development organization. The latter is an eye-opening look into not only improving IT but turning it into an asset rather than a cost-center.

Adam was instrumental in nearly all the patents generated at Parasoft. He had a very out-of-the-box way of looking at problems and coming up with new, unique solutions. I always attributed this at least in part to his physics background – why shouldn’t a man comfortable with giving the weight of the universe feel like he can generate test cases automatically by having a parser read some source code?

I mention all this partially because it’s cathartic for me, but also because STP – Software Test Professionals is currently having an open vote on the Test Luminary of the Year. Take a chance, go read Adam’s bio, and if you think like I do that he made a lasting impact on the software industry, then why not vote for him? It’s a fitting legacy for a man who dedicated his adult life to the improvement of the software development process.

SQL Injection – When Will We Learn?

Once again a major web site has been hacked using good old-fashioned SQL injection. Over the weekend Nokia’s developer forum was hacked, resulting in a Homer Simpson face being put up on their web site (funny) and the loss of names, email addresses, and other personal information for many developers (not funny). This is but the latest in a now very long string of SQL injection attacks, and personally I don’t see much excuse on the part of those attacked.

It might have been possible a year ago for a major corporation to decide that the threat of SQL injection was less than the cost of preventing it, although that’s debatable. In an AntiSec world, that is no longer possible. The threat is well known, as are the mechanisms to prevent it. There are well-known steps one can take to avoid injection issues. This will not guarantee that hackers cannot do anything, but why make it easy for them? One LulzSec programmer said in an interview that it was almost embarrassing to use such a simple hack – it made him look bad as a hacker.

So what CAN be done to improve software security? Well a few simple things: policy, tools & training.

Policy
It all starts with having a proper security policy. This means the organization needs to think ahead of time about what can happen and how to prevent it. Assessment of the likelihood of particular threats happening, their ease, and the cost of dealing with them versus the cost of preventing them all need to be part of a standard threat assessment. If you’re not sure how to do this, seek professional help from someone who knows what they’re doing.

Having the correct policy based on real world information lets you control and justify costs and scheduling delays. Frequently organizations end up checking security in the QA phase of software development where it is way too late to do what’s needed. Better to bake it into the process.

Be especially careful not to fall into the trap of “We’ll make sure nothing bad ever happens”, which can lead to further problems. You need to understand that at some point someone WILL get in, and decide now how you’re going to mitigate the loss. In other words, you shouldn’t have passwords stored in plain text on the fallacious theory that you can prevent all break-ins.
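To make the plain-text point concrete, here is a minimal sketch of storing a salted password hash instead of the password itself, using only Python’s standard library. The function names and the iteration count are assumptions chosen for the example; pick a work factor and algorithm that fit your own policy and hardware.

```python
import hashlib
import hmac
import secrets

# Assumed work factor for this sketch; tune it for your own environment.
ITERATIONS = 200_000


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); only these are stored, never the password."""
    salt = secrets.token_bytes(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest and compare it in constant time."""
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return hmac.compare_digest(digest, expected)
```

If someone does get in, what they walk away with is a table of salts and digests rather than a list of working passwords.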

Your security policy should include at a minimum requirements for password strength, encrypting private data, proper use of admin passwords, and other such items.

Tools
First and foremost – there are NO silver bullets. There is not a single tool or a set of tools out there that will secure your site. Such a solution is not likely in the foreseeable future either. That being said, there are definitely tools available to make the process of securing your site and software easier. Such tools should be a critical part of your security tool kit.

Static analysis is the most critical tool to prevent SQL injections. Outside of making all your database calls into stored procedures, it’s about the only thing that can really batten down the hatches. If you really want to secure your site, you must have static analysis tools that will identify dangerous input APIs, and check that validation is always, always, always performed.

Which leads me to my next point: there are those who concentrate on penetration testing tools and fixing issues found there. They believe that flow analysis is a much more interesting technology than stodgy old static analysis. Therein lies the problem. Flow analysis is a fine paranoia tool to make sure that you didn’t miss anything, but it can never find all paths through a site or application, and cannot ensure that you are safe merely because it didn’t find anything.

Use static analysis to make sure you’re validating everything, and use penetration testing to be sure that your static analysis is set up properly. If pen testing finds anything, fix it, but look back at your process, tools, and procedures and figure out how it slipped through the cracks. Pen testing tells you more about your security infrastructure than it does about your software.

Training
I’m including various techniques that reduce or eliminate the possibility of SQL injection as part of training. Learning techniques is after all the point of training. I have worked with organizations who were trying to improve security and, not surprisingly, found developers challenging the results of a security assessment. It’s not a measure of how smart or experienced they are, but rather another reminder that on the whole the software industry is lacking proper training in secure software development.

Michael Sutton, VP of security research at Zscaler said it very well: “It’s very easy to create a Web application today. It’s not easy to properly secure that application.”

The techniques to be used should easily map back to your security policies. For example, encryption should be required for all private data. Database credentials must be appropriate to the account, i.e. the web site should not be accessing the database as an administrative user.

Use stored procedures wherever possible. They will take some extra effort, but make it much more difficult for hackers. As a side benefit you may see a performance improvement switching to stored procedures. You also make it less likely that a missed input validation will have a negative effect.
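Stored procedures are specific to whichever database you use, so as a stand-in the sketch below shows the closely related technique of parameterized queries, using Python’s built-in sqlite3 module (SQLite itself has no stored procedures). The table, data, and function names are invented for the example. The point is the same: the SQL text is fixed ahead of time and user input is only ever passed as a value.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # In-memory database with two made-up rows, for demonstration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "alice@example.com"), ("bob", "bob@example.com")],
    )
    return conn


def find_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Vulnerable: the input is spliced directly into the SQL text.
    query = f"SELECT email FROM users WHERE name = '{name}'"
    return [row[0] for row in conn.execute(query)]


def find_safe(conn: sqlite3.Connection, name: str) -> list:
    # The driver binds the value; it is never parsed as SQL.
    query = "SELECT email FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]
```

Passing a name such as x' OR '1'='1 to find_unsafe returns every email in the table, while find_safe treats the same text as an ordinary, non-matching name.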

Validate all input, every bit, no matter where it comes from. Don’t just check input fields in forms, also check when you’re pulling data out of the database, from an input stream, etc. No excuses, steps. Hackers frequently gain access to systems in an incremental fashion – imagine that someone was able to subvert your very secure web forms because he found a way to put corrupt data in a field in your database.
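As a small sketch of treating the database as just another input source, the code below re-validates values as they are read back out. The validate_email and load_emails names, the users table, and the deliberately simple email pattern are all assumptions for illustration; a real application would use its own validators.

```python
import re
import sqlite3

# Deliberately simple pattern for the sketch, not a full email grammar.
_EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def validate_email(value):
    # The same check is applied to form input and to stored data alike.
    if not isinstance(value, str) or not _EMAIL_RE.fullmatch(value):
        raise ValueError("invalid email")
    return value


def load_emails(conn: sqlite3.Connection):
    """Return (valid_emails, rejected_count) for rows in a users table."""
    valid, rejected = [], 0
    for (email,) in conn.execute("SELECT email FROM users"):
        try:
            # Stored data is treated as untrusted until it passes validation.
            valid.append(validate_email(email))
        except ValueError:
            rejected += 1
    return valid, rejected
```

A non-zero rejected count is worth investigating in its own right, since it suggests that something wrote to the table without going through your validated input path.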

When validating, do not rely on penetration testing and flow analysis to find out if you’ve used tainted data. As mentioned above, these techniques are not thorough. Use proper naming conventions so you can spot input methods, blacklist unknown APIs during testing, and require that the distance in the code between input and validation is zero. This means that your flow analysis won’t find anything, but that’s OK. It also means that you’re safe.
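To give a feel for what a “distance zero” rule looks like, here is a toy pattern-based check written with Python’s standard ast module. It is only a sketch and not how any commercial tool works. It assumes that input arrives through the built-in input() and that the project has a single validation function named validate, which is hypothetical. Any input() call that is not passed directly to validate(...) is reported.

```python
import ast

INPUT_CALLS = {"input"}   # assumed set of "dangerous input" function names
VALIDATOR = "validate"    # hypothetical project-wide validation function


def _is_call_to(node, names) -> bool:
    # True for a plain call such as input(...) or validate(...).
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id in names
    )


def unvalidated_inputs(source: str) -> list:
    """Return line numbers of input calls not wrapped directly in validate()."""
    tree = ast.parse(source)
    wrapped = set()
    for node in ast.walk(tree):
        if _is_call_to(node, {VALIDATOR}):
            # Remember input calls that appear as direct arguments.
            for arg in node.args:
                if _is_call_to(arg, INPUT_CALLS):
                    wrapped.add(id(arg))
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if _is_call_to(node, INPUT_CALLS) and id(node) not in wrapped
    )
```

Note that a line like b = input() is flagged even if b is validated later. That is the whole point of the rule: it does not try to follow the path from input to use, it just insists there is no path to follow.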

There are many other simple things you can do. To find out more in detail, check many of the excellent articles and tutorials on the web, such as Strike Back at SQL Injections and Stop SQL Injection Attacks Before They Stop You, seek expert advice, and feel free to ask questions.

Conclusions
SQL Injection can almost always be avoided. Yes, there are other security issues out there that you need to handle as well, but this is one that can be easily executed and is very likely to happen. It’s also relatively easy to prevent, which should put it high on anyone’s to-do list if they’ve made a thorough risk assessment.
