Category Archives: Software Development

Software Terms Without Definitions

I’m often bemused by words in the software industry aka computer science. It’s generally OK when industries just make up new words for something new, but in software we re-use words that have (or at least had) a real definition, and then use them completely differently. Or worse still, we twist the definition just enough to make it not obvious that the meaning has changed. Sometimes the words even had a good software meaning, but it’s been killed over time – like artificial intelligence or AI.

With that in mind, and without a lot of blather, I just wanted to vent and list a bunch of them here. I’m not going to define them properly, because how could I? If I’ve missed your favorite, let me know in the comments, on Twitter, etc. If you disagree, let me know and we can argue. 😉

The list is alphabetical, because I’m a human and think that way. If I were ordering it by capability to annoy, it’d be AI, Software engineer, and everything else beneath. Enjoy!

Agile – you’d think this had a good definition but ask around and see what happens.

API – used to have a good meaning but no longer. h/t to @keith_wilson.

Artificial Intelligence or “AI” – this term has lost all meaning. I have come to agree with Musk and others that real AI will be real dangerous, but nothing we currently call AI is “artificial intelligence” in that sense.

Computer science – You could argue this one is real as long as you apply it to hardware, but software? Forget about it. Blogs and rants on this are in the queue.

Cybersecurity – is it antivirus? firewall? software? coding?

DevOps – I thought this had a definition, but many, like my friend Theresa, disagree, so it must not.

Engine – this one is now just a noise word used to give something a fancier sounding name.

False positive – developers throw this word around in a way that usually means one of the following: 1) the tool output was actually wrong; 2) I don’t like this finding or don’t think it’s important; 3) I don’t understand this finding or why it’s important (usually a type of #2). It REALLY only means the first one, but the most common definition includes all of the above.

Framework – should be a great word. Was a great word. Now a marketing word.

Memory leak – you think this has a definition, but try to look it up. In my world we think of it as memory you can no longer access or control, including freeing it. Others think it means memory you never freed. (Note – they’re wrong.)

Mock – seems simple enough, but it’s surprisingly broad. Some even think it includes service virtualization.

Platform – again a term that had a meaning once upon a time, now it just means “some package of software we sell”.

Service virtualization – this is a fair call. The original meaning of the term has been overloaded and extended and the “new” meaning has become more common in the software testing world, while the “old” meaning still holds true for hardware, deployment, and networking people.

Software engineering – please, this is one of the worst. Most people who call themselves software engineers don’t even begin to behave like engineers. If they are engineers, what particular standards were they taught that all others with the same title were also taught? I thought so.

Standards – You think this one has a meaning, don’t you? In “engineering” standards means something. If you’re an engineer, you already know what I’m saying. If you don’t get this, you’re not an engineer.

How did this happen? Is marketing to blame? Or is it just that there is no “software science” even if there is “computer science”?

Again, if you have a favorite, let me know and I’ll add it to the list. If you disagree, I’m always up for a good Twitter argument. If we get enough I might add it as a new Hall-of-shame permanent list. I feel like I’ll come up with a bunch more myself as soon as I hit publish.

[Update – suggestions coming in already. I’m putting them in proper alphabetical place, but will reference the source.]

[Update 2017-09-19 – added “memory leak”. Should have realized that was needed, it’s an obvious one. Also added “false positive”.]

Get Started with Free Service Virtualization

Free service virtualization, sounds great! Whenever you hear free, you should get nervous. I know that I do. After I wrote this title I looked at it and immediately hated it. But here’s the thing – at my day job at Parasoft we’ve just taken one of our really great products, Parasoft Virtualize, and made a free “community edition” version of it.

So who needs this and why should you care about it? Well, software applications have gotten a lot more complex in the last decade. Time was you had a simple monolithic desktop application and that’s all you had to worry about. Some of them had a little connectivity, like to a database or maybe simple external dependencies, but mostly they stood on their own. Today’s “applications” look more like systems or even systems-of-systems. It’s not uncommon to have a relatively small core application surrounded by a plethora of dependencies like databases, cloud APIs to provide data, shipping services, payment services and even connections to physical devices in the real world – the Internet of Things or IoT.

That’s where the “service virtualization” technology comes in. I know, I know, it’s a horrible name and it’s already caused you to think it’s something other than what it is. Nothing I can do about it, that name is in use by the analysts and I have no control over it. I think of it more like “communication emulation” in that it emulates the communication. Think of it this way: instead of APIs linked into an application as part of the compilation process, we now have services that are accessed live dynamically – meaning we talk to them and they talk to us. Even in the IoT world of SmartHome or SmartFactory or SmartCity it’s all about pieces talking to each other. This gives us remote info, remote control, and even some degree of autonomous decision making – like the Nest thermostat. Initially I used the app to control the thermostat to my liking, now it just figures out what I was doing and mostly does it for me.

Testing these kinds of systems is a huge pain. You need a test lab that has one of everything you’re connecting to. If you’re updating some of them, then you need a lab with the old one AND the new one – like a new version of Oracle or MySQL. Setting up the lab costs time and money, and then I have to fight with other teams to use it. Service virtualization lets me make fake (virtual) versions of the things I depend on, and then use them to test instead of needing the real thing.

This not only makes it faster/easier/cheaper to test, but it frees IT to do other important things. Plus I can make these virtual things behave how I want them to – if I want them to flood the network, they will. If I want them to be fast or slow to respond, I can do that. If I want one of them to be a bad actor and pretend it’s been compromised, no problem. My testing will be more thorough in addition to easier.
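To make the idea concrete, here is a rough sketch of a hand-rolled virtual service using only the Python standard library. It is not how Parasoft Virtualize or any other product works; the payment service, the `/charge` path, and every function name are hypothetical, made up for illustration. It only shows the two knobs mentioned above: a response delay and an on-demand failure.

```python
import json
import threading
import time
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(delay_seconds=0.0, fail=False):
    """Build a handler that stands in for a real (hypothetical) payment service."""

    class VirtualPaymentHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            time.sleep(delay_seconds)      # pretend the dependency is slow
            status = 503 if fail else 200  # pretend the dependency is down
            body = json.dumps({"status": "error" if fail else "approved"}).encode()
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep test output quiet

    return VirtualPaymentHandler


def start_virtual_service(delay_seconds=0.0, fail=False):
    """Start the fake service on a free local port; return (server, base_url)."""
    server = HTTPServer(("127.0.0.1", 0), make_handler(delay_seconds, fail))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_address[1]}"


def call_service(base_url):
    """The code under test: call the dependency, return (http_status, parsed_json)."""
    # An empty ProxyHandler keeps local test traffic away from any system proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(base_url + "/charge", timeout=5) as response:
            return response.status, json.loads(response.read())
    except urllib.error.HTTPError as error:
        return error.code, json.loads(error.read())
```

Point the application at the returned `base_url` instead of the real dependency, then restart the fake with `fail=True` or a larger `delay_seconds` to see how the application copes. A real tool adds recording, many protocols, and a UI on top of this basic trick.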

Once you realize that service virtualization technology is for you, the next step is to choose a tool. Lots of people instantly go check open-source, because of course it “doesn’t cost anything”. I’ve done a pretty thorough check of all open-source SV tools and at the moment they’re only really useful if your whole world is centered in HTTP/HTTPS. Even then they’re missing lots of other features, like a UI to create, manage, and deploy the virtual assets. So now that Parasoft has created a free version, why not see what commercial software offers you? You can download it here.

Try it, you’ll like it.

Does Cloud Change Static Analysis?

Look, we all know that using static analysis tools can be a real pain. In the past I’ve talked about some of the reasons people struggle with the output of AppSec tools. Similarly people struggle with using static code analysis. I even did a poll about static analysis challenges at one point.

From the feedback I’ve gotten, it seems that some people think that doing static analysis via SaaS (i.e. the cloud) would address the problems I’ve discussed. There are real challenges in getting the most out of your static analysis, but the claim that somehow cloud will solve them is ridiculous marketing hype – why would it change at all? Why should developers even be able to tell the difference? It doesn’t address any of the core issues. There are benefits you can get from using cloud for your static analysis, aka Static Analysis as a Service (SAaaS?), such as reduction of up-front costs, saved IT costs, and easy deployment. But the most common problems are the same whether you run the tool in-house or use a service.

The core problem I mentioned was really getting developers to buy in. They need to believe in the results because they need to fix them. Once the developers start picking and choosing what to fix, you’ve lost. They’ll spend countless hours challenging results and explaining why they’re not important – the inappropriately labeled “false positive”. Changing the method or location of how you run static analysis may have ramifications on the overall process, but it will in no way affect how developers perceive the results.

Getting the static analysis tool running is one of the first steps in a successful rollout, but from there you’ve got to do several things to make sure that you’ll get the value you expect.

Static Analysis Policy

It begins with having a clear static analysis policy. The policy should include when static analysis must be run and when it can be skipped. It also needs to cover when suppressions are acceptable, how severity level affects fix now vs fix later, what rules you must run, and how to handle legacy code. Legacy is one of the big problems – do you fix everything in your code regardless of age? Can you just fix it when you happen to be editing one of the old files? Should you only run static on the code you actually change in old files? These are real issues that will occur when you deploy and if you don’t decide what is proper, each developer will do their own thing.

Training

Developers need to be trained to use static analysis. Usually people remember to train on the mechanics of the tool, but not the further training that ensures success. Developers need to know when/how to suppress – does it go into the code or into an external system? They need to know how to find out more information about the problem. They need to understand what the severity levels mean and how they will affect their decisions.

It’s important as well that they understand the ramifications of a particular error. I’ve repeatedly had the experience of a team claiming a static analysis error was invalid when it was actually a real serious problem that they didn’t understand. Heartbleed is a classic example of this behavior. Finally your training needs to shift the mindset of the users from “static finds bugs” to “static finds bad code”. This distinction is crucial to get the most value. The “bugfinder” rules in static analysis are the proverbial tip of the iceberg. They’re only a small part of the full value. The bigger value is the rich set of coding standards that represent hundreds of man years in crafting best practices that help you harden your code and avoid problems in the first place.

Persistent Static Analysis Suppressions

Suppression handling can make or break your project. The symptom of this is developers saying things like “I keep fixing the same things.” What they mean isn’t that they’re fixing them, but they keep seeing the same violation and tagging it as invalid or acceptable every time the tool runs or versions change. They understandably view this as a stupid tool.

There are two schools of thought on suppressions. One is that they belong in an external system, whether it’s the static analysis tool itself, a file in source control, or a spreadsheet. The other idea is that they belong in the code. There are advantages to both, but I prefer the “suppressions in code” method. The main benefit is that you never end up with issues that cause old suppressions to come back, because they’re tightly coupled to the code. A secondary benefit is that suppressions will end up in source control, so you’ll know who did them, when they did them, and if they left a comment you’ll even know why. This is really important if you operate in a compliance environment like FDA, Aircraft/DO-178B/C, or Automotive ISO 26262.
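As an illustration of the “suppressions in code” idea, here is a small sketch. The `# suppress: RULE-ID -- reason` comment format is hypothetical, invented for this example; it is not the syntax of any particular static analysis tool, and each real tool has its own marker. The point is only that the suppression sits on the offending line and moves with it.

```python
import re
from dataclasses import dataclass

# Hypothetical marker format (an assumption, not any real tool's syntax):
#   risky_call()  # suppress: RULE-ID -- why this is acceptable
SUPPRESS_RE = re.compile(
    r"#\s*suppress:\s*(?P<rule>[\w.-]+)\s*(?:--\s*(?P<reason>.*))?$"
)


@dataclass(frozen=True)
class Suppression:
    line: int    # 1-based line number the marker sits on
    rule: str    # rule identifier being suppressed
    reason: str  # free-text justification, empty if none was given


def find_suppressions(source):
    """Return every in-code suppression marker found in the source text."""
    found = []
    for number, text in enumerate(source.splitlines(), start=1):
        match = SUPPRESS_RE.search(text)
        if match:
            reason = (match["reason"] or "").strip()
            found.append(Suppression(number, match["rule"], reason))
    return found


def filter_violations(violations, source):
    """Drop (line, rule) violations that have a matching marker on the same line."""
    suppressed = {(s.line, s.rule) for s in find_suppressions(source)}
    return [v for v in violations if v not in suppressed]
```

Because the marker lives in the file, the “who” and “when” come for free from source control history (for example `git blame`), and the reason text answers the “why”.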

Good documentation

I’ve alluded to the idea that the docs need to explain why a particular static analysis rule is important. I’ve got several things I look for in good tool documentation.

  • Example bad code
  • Example fixed code
  • Impact – what will happen if you don’t fix this violation
  • Possible security relevance
  • Resources to learn more
  • Integration with the IDE – right-click on a violation to see the docs

Summary

Getting your static analysis rollout right is crucial to your success. There are many options from on-premise to cloud-based and you should carefully weigh the benefits of each approach. But don’t expect the cloud to solve all the challenges you’ll face. There is no substitute for a well-planned tool deployment.


Open-Source Project Activity Demystified

Open-source projects are spread across a wide spectrum of maturity and activity. When choosing to use open-source it’s important to select a project that has lots of active contributors and recent development unless you’re expecting to take on the project development yourself.

Determining project activity can be done by looking at project statistics such as those GitHub provides. Often projects are started by a single individual who has a particular problem they want/need to solve. Once the software is “working” the project can stagnate. A few select projects reach a critical mass where multiple contributors work to keep the project up to date, fix bugs, add features and create a large, useful, popular project.

Open-source activity basics

Here we will compare a small semi-active project, Netflix curator, with an active popular one, Angular.js, to see how you can tell the difference. First, there are three basic statistics at the top of every GitHub project: Watch, Star and Fork.


Watch is the number of people who have added the project to their watchlist. This gives them updates about the project and is an indication of the number of people who care about changes to the code, rather than just use the project.

Star is the number of people who find a project interesting and want to indicate that. It also adds a bookmark for favorite projects.

Fork is the number of people who have copied the repository with the intention of adding their own changes to it. Often such people don’t actually contribute, but it shows a level of interest in contributing.

Notice that the very popular and active Angular.js project has over ten times as many watchers as Netflix curator. As for Forks, Angular.js has an even bigger margin over Netflix curator – almost one thousand times as many forks.
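The same three numbers can be pulled programmatically if you want to compare several projects at once. This is a minimal sketch, assuming the public GitHub REST API repository endpoint, where (to my knowledge) Watch is reported as `subscribers_count`, Star as `stargazers_count`, and Fork as `forks_count`. The function names are my own, not part of any library.

```python
import json
import urllib.request


def fetch_repo(owner, repo):
    """Fetch repository metadata from the GitHub REST API (needs network access)."""
    url = f"https://api.github.com/repos/{owner}/{repo}"
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def summarize_repo(repo_json):
    """Pick out the three headline statistics shown at the top of a project page."""
    return {
        "watch": repo_json["subscribers_count"],  # people following changes
        "star": repo_json["stargazers_count"],    # people who bookmarked it
        "fork": repo_json["forks_count"],         # personal copies of the repository
    }


def compare(big, small):
    """Return how many times larger each statistic is in `big` than in `small`."""
    return {
        key: (big[key] / small[key] if small[key] else None)
        for key in ("watch", "star", "fork")
    }
```

For example, `compare(summarize_repo(fetch_repo("angular", "angular.js")), summarize_repo(fetch_repo("Netflix", "curator")))` would give the ratios discussed above, assuming those are still the repository paths. Unauthenticated API requests are rate-limited, so cache the results if you check many projects.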

Contributors

A second area to look at is the “Graphs” tab, which shows graphical information about contributors, frequency of code changes, etc. The graphs below show the contributors to each project.

[Figure: Angular.js top contributors]

Notice that the top 4 contributors to Angular.js each have tens of thousands of commits. The list of significant contributors is quite large which not only provides a wealth of ideas for new features but also reduces risk when a contributor leaves the project.

In contrast, the commit counts for the top 4 contributors to Netflix curator quickly drop to fewer than 100 – again a difference of almost one thousand times. If the main contributor leaves, or grows bored and moves on to something else, the project will stagnate completely – if you want anything you’ll need to do it yourself.

[Figure: Netflix curator top contributors]
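One rough way to put a number on that risk is to ask how concentrated the commits are. The sketch below is my own illustration, not a GitHub feature: it takes a plain list of per-contributor commit counts (however you obtained them) and reports the share held by the top contributors and how many people it takes to account for half the work.

```python
def top_share(commit_counts, top_n=1):
    """Fraction of all commits made by the `top_n` most active contributors."""
    total = sum(commit_counts)
    if total == 0:
        return 0.0
    top = sorted(commit_counts, reverse=True)[:top_n]
    return sum(top) / total


def contributors_to_cover(commit_counts, fraction=0.5):
    """Smallest number of contributors whose commits reach `fraction` of the total."""
    total = sum(commit_counts)
    if total == 0:
        return 0
    running = 0
    ranked = sorted(commit_counts, reverse=True)
    for position, count in enumerate(ranked, start=1):
        running += count
        if running >= fraction * total:
            return position
    return len(ranked)
```

A project where `top_share(counts)` is close to 1.0, or where `contributors_to_cover(counts)` is 1, depends heavily on a single person. A project where many contributors are needed to reach half the commits is more likely to survive someone leaving.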

Code change frequency

Next we can look at the frequency of code change. Netflix curator exhibits a common tendency for a project to stagnate at some point, once it has the basics of the desired functionality from the single original contributor.

[Figure: Netflix curator code update frequency]

A larger set of contributors with more ideas and free time helps to keep a project vibrant as you can see with the Angular.js project. Studies have shown that larger and more complex open-source projects tend to attract more developers.

[Figure: Angular.js code update frequency]

Network / Project forks

Finally, we can check the network graphs to see how many people are forking the project and doing something new with it, which is a telling indicator of how many people are really interested in the project and want to do their own thing with it. Note here that we have only a couple of forks for Netflix curator that were never merged back in,

[Figure: Netflix curator network forks]

while the Angular.js project has too many forks to display.

[Figure: Angular.js network forks, showing the message that there are too many forks to display]

At any given time you can quickly see which repositories are most active by checking https://bithub-ranking.com, and an explanation of the GitHub statistical graphs is available at https://help.github.com/articles/about-repository-graphs/.

While most typical open-source projects won’t make the most-popular list, doing a bit of investigation into the health of an open-source project can help make sure that the code you’re using will be maintained and updated to keep up with emerging technologies for years to come.