
Getting the Static out of your Analysis

The other day I was talking to a colleague about setting up static analysis properly. I was on my usual soapbox about all the simple steps you can take up front to ensure success, and he suggested that I should blog about it. So here it is: how to properly configure your static analysis from the beginning.

This presumes that you’ve already gone through the trouble of selecting the proper tool for the job, and that you’ve set up policies and procedures for how and when to use it. What I want to focus on is getting the rule set right.

As I’ve mentioned before, there are quite a few ways to run into trouble with static analysis, and noise is one of the chief culprits. You need to make sure that noise is far outweighed by valuable information.

Noise comes about primarily from having the wrong rules or running on the wrong code. It sounds simple enough, but what does it mean?

The Wrong Rules

Let’s break the wrong rules down into the various ways they can trouble you. First, noisy rules: if a rule makes a lot of noise or gives actual false positives, you need to turn it off. False positives in pattern-based static analysis are indicative of improperly implemented rules, and should be fixed if possible, turned off if not.

On the other hand, false positives in flow analysis are inevitable and need to be weighed against the value the analysis provides. If the time spent chasing the false positives is less than the effort that would otherwise be needed to find the bugs, then it’s probably worth it. If not, turn the rule off.

But there are other kinds of noise. Some rules may technically be correct but not helpful, for example rules that are highly context dependent. In the right context such rules are very helpful, but elsewhere they are far more painful than they are worth. If you have to spend a lot of time evaluating whether a particular violation should be fixed in a particular case, you should turn the rule off. I’ve seen a lot of static analysis rules in the wild that belong in this category: nice academic ideas, impractical in the real world.

Another way to recognize a noisy rule is that it produces a lot of violations. If you find that one rule is producing thousands of violations, it’s probably not a good choice. Truthfully, it’s either an inherently bad rule, or it’s one that doesn’t fit your team’s coding style; if your team actually agreed with what the rule says to do, you wouldn’t have thousands of violations. Each team has to figure out exactly what the threshold for “too many” is, but pick a number and turn off rules that are above it, at least in the beginning (a sketch of how to automate that check follows below). You can always revisit them later when you’ve got everything else done.
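If your tool can export its findings, this check is easy to automate. Below is a hedged sketch in Java that assumes an invented one-violation-per-line report format; real tools export XML or JSON reports that you would parse instead.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

public class RuleNoiseCheck {
    // Assumes each report line looks like "ruleId<TAB>file<TAB>line".
    // That format is made up for illustration only.
    public static void main(String[] args) throws IOException {
        int threshold = 500; // your team's "too noisy" cutoff
        Map<String, Integer> counts = new HashMap<>();
        for (String line : Files.readAllLines(Path.of(args[0]))) {
            String ruleId = line.split("\t")[0];
            counts.merge(ruleId, 1, Integer::sum); // tally violations per rule
        }
        counts.forEach((rule, n) -> {
            if (n > threshold) {
                System.out.println("Candidate to disable: " + rule + " (" + n + " violations)");
            }
        });
    }
}
```

Whatever number you pick, the point is to make the cutoff explicit, so disabling a rule is a mechanical decision rather than a debate.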

One kind of rule that frequently gets called noise is a rule whose value developers either don’t understand or simply disagree with. In the beginning such rules can undermine your long-term adoption, so turn off rules that are contentious. How can you tell? Run the results past the developers: if they complain, you’ve probably got a problem.

It’s important that the initial rule set is useful and achievable. Later on you will be able to expand the rules as you come into compliance and as the practice itself becomes ingrained. In the beginning, you need your static analysis tool to deliver high value. I ask myself: would I hold off shipping this code if I found this error? If the answer is yes, I leave the rule on; if not, it’s a poor candidate for an initial rule set. These are what some people call the “Severity 1” or high-priority rules.
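To make that shipping test concrete, here’s a sketch in Java (my own examples, not any particular tool’s rules). The first finding is one I would hold a release for; the last is not, so its rule is a poor fit for an initial rule set.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class SeverityExamples {
    // Severity 1 material: if readLine() throws, the reader never closes.
    // A resource leak like this is worth holding a release for.
    static String firstLineLeaky(String path) throws IOException {
        BufferedReader reader = new BufferedReader(new FileReader(path));
        return reader.readLine(); // 'reader' leaks on every path
    }

    // The fix a high-value rule pushes you toward: try-with-resources.
    static String firstLine(String path) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            return reader.readLine();
        }
    }

    // A style-only finding (say, a rule demanding braces on one-line ifs)
    // would never stop me from shipping this.
    static int clamp(int x) {
        if (x < 0) return 0;
        return x;
    }
}
```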

The Wrong Code

Sometimes particular pieces of code are noisy; typically this ends up being legacy code. The best practice is to let your legacy policy guide you. If your rule is that legacy code should only be modified when there is a field-reported bug, then you don’t want to run static analysis on it. If your policy is that when a file is touched it should come into compliance with the static analysis rules, then use that. Otherwise skip it: don’t bother testing any code that you don’t plan on fixing.

Checking the configuration

Once you have a set of rules you think is right, you should validate that assumption. Run the rules on a piece of real code and look at the number and quality of the violations that come out. If the violations are questionable, or there are too many, go back to square one and weed them out. If they’re reasonable, pick another project, preferably from a different team or area, and run the test again. You’ll likely find a rule or two that you need to turn off when you do this. After that, you’ve got a pretty reasonable chance of having a good initial rule set.

Long term

In the long term, you should be adding to your rules. Resist the temptation to put all the rules in the initial set, because you’ll get buried in too much work. Besides, as developers get used to having a static analysis tool advise them, they start improving the quality of the code they produce, so it’s natural to ratchet things up from time to time.

When you see that a project or team has either gotten close to 100% compliance with your rules, or has essentially plateaued, then you want to think about adding rules. The best approach is not to simply add rules that sound good, but to choose rules that relate to problems you’re actually experiencing. If you have performance issues, look at that category for rules to add, and so on.

Spending the time up front to make sure your rules are quiet and useful will go a long way toward ensuring your long-term success with static analysis.

[Disclaimer]
As a reminder, I work for Parasoft, a company that among other things makes static analysis tools. This is, however, my personal blog, and everything said here is my personal opinion and in no way the view or opinion of Parasoft or possibly anyone else.
[/Disclaimer]


Developing Clouds in the Forecast

These days it seems like everyone has their head in the cloud, at least at the level of interest. When it comes to actually using the cloud, they’re not necessarily ready for a real deployment. The conversation usually goes something like this:

“Do you support cloud?”

“Sure, what did you have in mind?”

“We don’t know, but we know we want to do cloud at some point. Can you make your tools available to us in the cloud?”

“Yes, I can. What model do you want to use, and what will fit your security needs?”

“We can’t actually open our firewall, and we can’t share our source code outside the network. Maybe a private cloud…”

And it goes on from there. I’ve played this script over and over again. Lots of intellectual interest in the cloud, but without any real understanding of either the issues involved or the benefits. Why on earth would you switch to the cloud if you didn’t have some idea of what it was going to do for you? And yet people do it all the time.

Perhaps they are simply falling for the hype: cloud providers are claiming lower start-up costs, less overhead, better scalability and reliability, streamlined process, and cures for cancer. OK, I made that last one up, but it’s close. Seriously, a commenter on my piece on What Went Wrong with Static Analysis? said that all the potential pitfalls of static analysis are avoided if you simply use the cloud. As you’ve probably guessed, he worked at a company providing cloud services.

I don’t want to dive into a detailed list of what cloud can and cannot do at this point, although I may at a later date. But I do want to caution people to at least think about it. If someone says that the cloud will make your life better, ask them to explain how. If it makes sense, great. If not, beware of snake oil.

With that in mind, I want to talk about something that actually will make your life easier: a special kind of private cloud called the micro cloud, which is especially useful for software development.

It’s no secret that I’m a VMware (VMW) fan. One of the aspects I like best is that it’s well suited to extreme scalability. You can start with desktop use, like running an alternate OS on your machine without having to reboot. Then you can push that onto an ESXi box in your server room when the virtual machine you were using unexpectedly becomes something you actually depend on. And ultimately you can push it off into the cloud and scale it up as needed.

This is where the connection to software development comes in. One of the tedious pieces of getting a software project up and running is setting up the infrastructure, the tooling known as Application Lifecycle Management, or ALM. You need quite a few different goodies available, and while none of them is super complicated, it takes time to put the whole thing together. Then at some point you realize you either need another one, and have to do it all again, or the one you have is too small or too slow for the team as the project has grown.

The list of necessary tools includes things like requirements management, project management, source control, compilers, a development IDE, build management, continuous integration, testing, reporting, static analysis, code coverage, etc. None of these items is that complicated by itself; putting them all together just takes more time than it should. In addition, it turns out that software developers really aren’t the best choice for system administrators, and don’t always deploy infrastructure the way you want. This is an excellent fit for virtualization.

Instead of putting together your infrastructure in an ad hoc way that has a tendency to grow like Frankenstein’s monster (come on, we know it happens to all of us), you can plan and coordinate between developers, architects, managers, and sysadmins. Figure out what kinds of tools you’re going to need all the time, lay out the requirements for them, and get the admin guys to build you a virtual machine that has everything you want. Then test it, fix it, and from then on you can use it over and over again.

Do it on a virtual machine rather than a physical one; that way, when you need a new one for a new project, you can just stand up a new instance of the virtual machine. When you outgrow your hardware, scale it up on the back end in the hypervisor. If you need to share it geographically, push it out to a cloud provider or data center.

Or…

You can use one that someone else has already built. If you have a small project and no security issues that preclude putting your code outside your own firewall, there are a couple of pure cloud plays that have pretty much what you need. For example, you could go to GitHub and use their tools, or you could check out Cloud Foundry, which also uses Git for source control.

If you can’t put your source in the cloud for whatever reason, or if Git isn’t your cup of tea, then you should look for a micro cloud instead: a pre-built virtual machine that you can use on a desktop or in your own server room, or even scale up to your datacenter. Again, you could do all this yourself, but if you can find one that has what you need, you can save a lot of time.

[Disclaimer]
As a reminder, I work for Parasoft, a company that makes a variety of tools for software development, including an ALM virtual machine suitable for a micro cloud. This is, however, my personal blog, and everything said here is my personal opinion and in no way the view or opinion of Parasoft or possibly anyone else at all.
[/Disclaimer]

With that behind me, let me relate a personal story. I was working on a personal project at home and wanted to set up a source control system. I happen to be an SVN guy, so I normally set up a Subversion server and then use Apache to access it. The SVN install is quick and easy. Apache isn’t too difficult either, but by the time you get HTTPS up and running with certificates and get the WebDAV SVN connector going, it can be a struggle. Add to that the normal scenario that you don’t do this very often, so it’s easy to forget the little things that will trip you up during setup. Needless to say, I wasn’t looking forward to setting the darn thing up.
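For the curious, the Apache side of that setup boils down to something like the sketch below. This is from memory and hedged: module paths, auth details, and certificate handling all vary by distribution.

```apache
# Load the WebDAV and Subversion modules (paths vary by distro)
LoadModule dav_module modules/mod_dav.so
LoadModule dav_svn_module modules/mod_dav_svn.so

# Serve repositories under /svn, over HTTPS only, with basic auth
<Location /svn>
    DAV svn
    SVNParentPath /var/svn
    SSLRequireSSL
    AuthType Basic
    AuthName "Subversion repository"
    AuthUserFile /etc/apache2/svn-auth-file
    Require valid-user
</Location>
```

None of the individual directives is hard; remembering all of them six months later is the part that eats the afternoon.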

I started with a VM like I always do, and as I was adding Apache and SVN to it, I got a feeling of deja vu. I knew, of course, that I had done it before, many times. Then I figured out the real source of the deja vu: I had helped create a VM for Parasoft that just happened to have everything I needed in it. So I downloaded that VM, started it up on my desktop, and set the configuration for what I needed. Other than download time, total setup and configuration took about 20 minutes. Total time and frustration saved: a lot.

The micro cloud is a great way not only to handle your development infrastructure; you can also use it to do tech support on specific environments, set up complicated QA configurations, provide quick POC projects, etc. This is one of the cases where it’s easy to see how the cloud helps: it not only drastically reduces start-up time and costs, but leaves you in good shape to scale quickly and efficiently.

If there are clouds in your forecast, micro cloud might be the place to start.

False Positives and Other Misconceptions in Static Analysis

In ongoing discussions here at the blog and elsewhere, I keep seeing the topic of false positives in static analysis come up. It’s certainly a very important issue when dealing with static analysis, but the strange thing is that people have very different opinions of what a false positive is, and therefore different opinions of what static analysis can do and how to properly use it.

In the simplest sense, a false positive means that the message that a rule was violated is incorrect; the rule was not violated, so the message was false. In other words, a false positive should mean that static analysis said it found a pattern in your code, but when you review the code, the pattern doesn’t actually exist. That would be a real false positive.

Pattern-based false positives

False positives play out differently in the two kinds of static analysis: pattern-based static analysis, which also includes metrics, and flow-based static analysis. One thing to remember is that pattern-based static analysis doesn’t typically have false positives. If it has one, that’s really a bug in the rule or pattern definition, because the rule should not be ambiguous. If the rule doesn’t have a clear pattern to look for, it’s a bad rule.

This doesn’t mean that every violation necessarily indicates a bug, which is important to note and is the source of much of the confusion. A violation simply means that the pattern was found. You chose to flag these patterns because they are dangerous to your code. So when I look at a violation, I ask myself: does this pattern apply to my code, or doesn’t it? If it applies, I fix the code; if it doesn’t, I suppress it.

It’s best to suppress it in the code itself rather than in an external location such as your UI or a configuration file, so that the decision is visible, others can look at it, and you won’t end up reviewing it a second time. Failing to suppress the violation explicitly is a bad idea, because then you will constantly be reviewing the same violation. The beauty of in-code suppression is that it’s independent of the engine: anyone can look at the code and see that it has been reviewed and that this pattern has been deemed acceptable in this code.
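As a concrete illustration, here is roughly what in-code suppression looks like in Java. The suppression-comment syntax varies from tool to tool, so the marker and rule ID below are invented for the example.

```java
public class CacheKey {
    private final String id;

    public CacheKey(String id) {
        // intern() guarantees one canonical String instance per id value
        this.id = id.intern();
    }

    public boolean sameKey(CacheKey other) {
        // A pattern-based rule flags '==' on strings because it compares
        // references, not contents. Here the ids are interned, so the
        // identity check is intentional, and we record that decision in
        // the code itself where every future reviewer will see it.
        return this.id == other.id; // SUPPRESS PB.STR.EQ "ids interned; identity comparison intended"
    }
}
```

The exact marker matters less than the fact that the reviewed-and-accepted decision travels with the code, not with one person’s tool settings.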

This is the nature of pattern-based static analysis. It’s based on an understanding that certain things are bad ideas and may not be safe. This doesn’t mean you cannot do them in a particular context, but that such things should be done carefully.

Flow Analysis false positives

Flow analysis is different: it will always have false positives, and you have to address them. Flow analysis cannot avoid false positives for the same reason unit testing cannot generate perfect unit test cases. When your code uses some kind of library, for instance when your Java code calls into the OS and something comes back, who knows what it will send? The analysis has to make assumptions about what’s coming back, and some of those assumptions will inevitably produce false positives.
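Here is a hedged Java sketch of that kind of boundary; the environment variable name is invented for the example.

```java
public class Boundary {
    static int configuredPort() {
        // Flow analysis cannot know what the OS will hand back here:
        // getenv() returns null when the variable is not set, so the
        // tool has to assume the worst about the value below.
        String port = System.getenv("APP_PORT");

        // If your deployment always sets APP_PORT, a warning here was a
        // false positive; if not, it just caught a real crash. Either
        // way, guarding the value resolves the finding.
        if (port == null) {
            return 8080; // documented default instead of a null dereference
        }
        return Integer.parseInt(port.trim());
    }

    public static void main(String[] args) {
        System.out.println("port = " + configuredPort());
    }
}
```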

Flow analysis tries to err on the side of caution: if it’s possible that something strange might be returned, it warns you to protect against it. This finds bugs, and that’s why it’s so valuable. You also need to understand the power and the weakness of flow analysis. The power is that it goes through the code looking for hot spots and then hunts for problems around those hot spots.

The weakness is that it only explores some number of steps around the code it’s testing, like a star pattern. The problem is that if you start thinking you’ve cleaned all the code because your flow analysis is clean, you are fooling yourself. Really, you’ve found some errors, and you should be grateful for that.

The real question with flow analysis is how much time you spend going through results to find false positives, suppress them, and fix real bugs quickly, compared with catching those bugs later in functional testing, where they would be much more difficult to find and debug.

Let’s say you spend an hour to fix and suppress a dozen flow analysis items at something like a 50% false positive ratio, which is pretty nasty; that still means half a dozen real bugs found in a single hour. Now let’s say one or two of those real bugs leak into the field instead. By the time the report gets from the field back through support and development, each one may cost a half-day or even 2-3 days to address. It’s your decision which way is more time-saving and effective.

In addition to flow analysis, you should really think about using runtime error detection, which lets you find much more complicated problems than flow analysis can. Runtime error detection doesn’t have false positives, because the code was executed with a known value and an actual failure occurred.

Being Successful

The key to success is to choose which rules you want to adhere to, and then get the code clean progressively. That means starting with the code you’re currently modifying and extending outward through the code base until you are done. At some point, when you see that there are very few violations, you can run the analysis on the whole code base rather than just recently changed code. In other words, set a small initial rule set with a cutoff date of “today”. Then, when you see violations dying out, add new rules, run them on the whole code base, and review the results (we’ll discuss how to do the review in a moment). We recommend extending the cutoff date backward before adding new rules, because your initial rule set contains only the things you feel are critical.

Rules/configurations should really be based on real issues you’re facing. That is, take feedback from QA, code review, field-reported bugs, etc., and then find static analysis rules to address those problems.

Sometimes developers fall into the trap of labeling any error message they don’t like as a false positive, but this isn’t really correct. They may label it a false positive because they simply don’t agree with the rule, because they don’t understand how it applies in this situation, or because they don’t think it’s important in general or in this particular case. The best way to deal with this head-on is to make sure that the initial rule set has a small number of rules that everyone can agree on. It should produce reasonable results that can be dealt with in a reasonable amount of time.


AI Smackdown – Siri vs Eliza

John McCarthy, creator of LISP and father of modern artificial intelligence (AI), died today. He was one of the first people to work seriously on getting computers to do things you wouldn’t normally have expected computers to do, like playing chess or holding a conversation.

One of the most common AI applications programmers have played with over the years is Eliza, a simple software psychoanalyst created in the 1960s by Joseph Weizenbaum. Like many others at the time, I made one of my first attempts at writing an interesting piece of software in my youth by altering an Eliza program written in BASIC (as taken from the pages of Creative Computing, for all you old-timers). While Eliza wasn’t written by McCarthy, she was certainly a product of his influence on the industry.
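For those who never ran into her, the heart of Eliza is surprisingly small: match a keyword, reflect the pronouns, and hand the statement back as a question. Here is a minimal sketch in Java; the keyword and phrasing tables are mine, not Weizenbaum’s.

```java
import java.util.Map;

public class TinyEliza {
    // Swap first- and second-person words so "my" becomes "your", etc.
    private static final Map<String, String> REFLECT = Map.of(
            "i", "you", "me", "you", "my", "your",
            "am", "are", "you", "I", "your", "my");

    static String respond(String input) {
        String s = input.toLowerCase().replaceAll("[^a-z ]", "").trim();
        if (s.startsWith("i feel ")) {
            return "Why do you feel " + reflect(s.substring(7)) + "?";
        }
        if (s.contains("mother") || s.contains("father") || s.contains("brother")) {
            return "Tell me more about your family.";
        }
        if (s.startsWith("i ")) {
            return "How long have you " + reflect(s.substring(2)) + "?";
        }
        return "Please go on."; // default when no keyword matches
    }

    // Rewrite each word from the speaker's point of view to Eliza's.
    private static String reflect(String phrase) {
        StringBuilder out = new StringBuilder();
        for (String word : phrase.split(" ")) {
            out.append(REFLECT.getOrDefault(word, word)).append(' ');
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        // Prints: Why do you feel ignored by your computer?
        System.out.println(respond("I feel ignored by my computer"));
    }
}
```

A few dozen patterns like these, plus a little memory for earlier topics, is essentially all the original had to work with.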

Ever since the new iPhone came out with Siri, the digital assistant, I’ve been thinking about comparing her to Eliza. I don’t often get to play with AI here at Parasoft, and the death of McCarthy brought the topic to mind again. I’ve put together a small comparison of questions and answers from Siri and Eliza. Note that with Siri your mileage may vary; she doesn’t always answer the same way. I’ve obviously skipped most of the “useful” things that Siri can do, as that would be an unfair comparison for poor old Eliza, who doesn’t have all the modern advantages that Siri does.

It’s interesting because while they share a common logical bond, their personalities are very different. Siri was designed to be helpful and amusing. Eliza was specifically designed not to be helpful: as a therapist, she’s supposed to lead you to your own conclusions rather than give you answers. Eliza isn’t an assistant; she’s an attempt to play the “imitation game”, also known as the Turing test.

For the samples below I used a Java version of Eliza that you can try for yourself on the web.

Samples:
Q: Can you help me?
E: You believe I can help you don’t you ?
S: You can say things like… (a list of sample queries)

Q: What are you doing?
E: Why do you ask ?
S: Just thinking

Q: Who is your favorite composer?
E: Why are you concerned over my favorite composer?
S: I guess I don’t have one.

Q: Are you married?
E: Why are you interested in whether I am married or not ?
S: We were talking about you, not me.

Q: What is your name?
E: I am not interested in names.
S: My name? It’s Siri.

Q: You are beautiful
E: What makes you think I am beautiful?
S: Oh, Stop.

Q: I love you
E: Perhaps in your fantasies we love each other.
S: Oh, Stop.

Q: How are you?
E: Does that question interest you?
S: I am well ____ (name)

Q: What do you think?
E: What is it you really wanted to know?
S: It’s nice of you to ask, but it doesn’t really matter what I think.
or: I think differently.

Q: Good night
E: I’m not sure I understand you fully.
S: Ah… it’s still only _____ (fill in the time)
or other answers depending on the time of day.

Q: Can you send a message to my brother?
E: Tell me more about your family.
S: I don’t know who your brother is. To whom shall I send it? I need a contact name, phone number, or email address.
or: What do you want to say to ____ (your brother)?

I’m not sure about all the reasons they’re so different. Sure, the state of the art in AI has come a long way since the ’60s. Or is it just that men’s expectations of women have changed? I was tempted to write that perhaps people are more friendly or helpful now than in the ’60s, but that’s ridiculous. Perhaps it’s only that computers are now more helpful and friendly than they were. Is it possible that Eliza’s seeming bad mood had something to do with her obvious handicaps in memory and CPU? Maybe she was aware of this, and it caused her to be ill-tempered. In any case, Eliza comes across as a bit cynical, while Siri is much more light-hearted most of the time. Siri’s mood can definitely change, as you can see from some of the answers.

It occurs to me that it would be funny to get Siri to talk to Eliza – would Eliza help Siri, or would Siri end up making Eliza more friendly?

So if your computer was nice to you today, thank John McCarthy.

[Update: I added a few more links and a minor clarification, as well as AI resources]

Here’s a list of my favorite fiction books about killer AI.

Some resources on artificial intelligence (AI):

Artificial Intelligence: The Basics

Artificial Intelligence for Humans, Volume 1: Fundamental Algorithms

Artificial Intelligence in the 21st Century (Computer Science)

The Artificial Intelligence Revolution: Will Artificial Intelligence Serve Us Or Replace Us?

Books on AI at Amazon