Software Metrics
The article The Good, the Bad and the BS by Larry O’Brien tells a sad truth. We always want to do the best possible job, but how can we prove we really did? How can we tell whether one developer/team/firm is better at cranking out software than another developer/team/firm? From estimation to hiring to delivering systems, we could definitely use some metrics to advance the software trade. After all, we software developers have been terribly good so far at doing impossible things, judging by the saying "You can’t manage what you can’t measure" (looks like Tom DeMarco said something like this as cited here), so we might as well start doing the mundane: collaborate on meaningful metrics, establish industry benchmarks, and use them. There are of course difficulties, many of them stemming from the young age of the software development profession.
Nothing is the same, or Too many ways to do things
Truly, almost every software system or product is unique. The variety of programming languages, hardware platforms, operating systems, software architectures, integration interfaces, user interfaces, networks, etc. is such that it’s really tough to come up with even one metric that would make sense for any combination of the above. Even such a deceptively simple thing as code size is severely limited: the SLOC of a Java program running in a plain JVM can’t be compared with that of a Java program running in a J2EE application server; the size of PHP code can’t be compared with the size of ASP.NET code doing a similar thing for pretty much any purpose except bragging rights, and so on. I once made an argument that when it comes to on-line software collaboration systems, Trac is a simpler product than GForge because 17.4K SLOC in Trac’s version 0.9.5 is less than 85.0K SLOC in GForge’s 4.5.11 (numbers are from analysis with SLOCCount and include neither Trac’s templates nor any HTML from either product). That may look like a clear-cut statement, but Savane for example has 33.5K SLOC in version 1.4.0 and CVSTrac has almost 17.7K. With differences like that, it’s no longer as clear what’s less complex, and I haven’t even mentioned what languages are involved. (As a side note, I do suggest that you use Trac because it’s the best.)
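To make the size comparison concrete, here is a small Python sketch that expresses each product's SLOC as a multiple of Trac's. The figures are the ones quoted above (CVSTrac's is approximate, and no version is given for it); the function name `relative_sizes` is made up for illustration. It shows only how far apart the raw numbers are, not which product is actually simpler.

```python
# SLOC figures quoted in the text, in thousands of lines (measured with SLOCCount).
# The CVSTrac figure is approximate ("almost 17.7K").
sloc_k = {
    "Trac 0.9.5": 17.4,
    "CVSTrac": 17.7,
    "Savane 1.4.0": 33.5,
    "GForge 4.5.11": 85.0,
}


def relative_sizes(sizes, baseline):
    """Return each product's size as a multiple of the baseline product's size."""
    base = sizes[baseline]
    return {name: round(value / base, 2) for name, value in sizes.items()}


ratios = relative_sizes(sloc_k, "Trac 0.9.5")

# Print products from smallest to largest relative size.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name}: {ratio}x")
```

A ratio near 1 (Trac vs. CVSTrac) says little on its own, since the two are written in different languages, which is exactly the limitation described above.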
Nothing is public, or Abundance of variables
Another example. Transactions per second seems to be a widely accepted metric for the non-functional aspects of a software system. It’s also a highly protected one, as most vendors reserve the sole right to publish TPS for their products and restrict their users from doing the same. Such protectionism is common at every level, from an individual software artisan to a big vendor, and part of the reason is that TPS or any other performance metric requires careful disclosure of the variables that may affect it, and nobody really knows all of them.
Nothing is under your control, or Software as a service
Lately, some industry metrics and their benchmarks have literally been forced into existence. If you use somebody’s software in a service delivery model, you have to decide where on the 4D surface of price, uptime, response time, and functionality index (I’m making this one up as there’s no such metric yet) you want to be. Not only that, but you have to agree on how to measure these numbers, put them in your SLA, and hold your service provider responsible for sticking to it. An SLA, however, seldom says anything about the service provider writing clean code or using a specific OS. But if, similar to having the data for, say, cars, you knew, hypothetically, that operating system #1 fails twice as often as #2, would you want to select a service provider who made the right choice of OS?
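As an illustration of what "agreeing on how to measure" might look like, here is a minimal Python sketch that derives an uptime percentage and a response-time percentile from periodic probe results. Everything in it is an assumption made for the example: the probe data, the function names, and the choice of the nearest-rank percentile method. A real SLA would spell out its own definitions.

```python
import math


def uptime_percent(probes):
    """Share of probes that succeeded, as a percentage."""
    ok = sum(1 for success, _ in probes if success)
    return 100.0 * ok / len(probes)


def response_time_percentile(probes, pct):
    """Nearest-rank percentile of response times (ms) over successful probes."""
    times = sorted(ms for success, ms in probes if success)
    rank = math.ceil(pct / 100.0 * len(times))
    return times[max(rank, 1) - 1]


# Hypothetical probe results: (request succeeded?, response time in ms).
probes = [
    (True, 120), (True, 95), (False, 0), (True, 310), (True, 150),
    (True, 101), (True, 99), (True, 480), (True, 130), (True, 110),
]

print(f"uptime: {uptime_percent(probes):.1f}%")
print(f"median response: {response_time_percentile(probes, 50)} ms")
print(f"95th percentile response: {response_time_percentile(probes, 95)} ms")
```

Even in this toy version, choices such as whether failed probes count toward response time change the reported numbers, which is why the measurement method belongs in the SLA itself.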
Nothing is secure, or How many patches did you install today?
Some security researchers claim that Windows has fewer registered security vulnerabilities than a typical GNU/Linux OS. The community counters that Windows vulnerabilities take much longer to get fixed. SANS Top-20 Internet Security Attack Targets usually lists significantly more Windows targets than any others, but that may be due to popularity and installed base. All of this is confusing, and although I am firmly in the F/OSS camp on this issue, I do believe that better security metrics are needed, because the war with on-line crime is ours to lose.
Nothing is impossible, or What metrics do we need?
At this point, we need all the metrics we can think of, the more the better. If a metric is reliable and valid, try using it and see where that takes you. There are plenty of resources on where to begin.
- The Wikipedia article on software metrics discusses the narrower subject of metrics as they apply to code; there’s also plenty of discussion and products for interesting metrics that assess code structure and not just size, for example Robert Martin’s OO Design Quality Metrics (PDF) and Using Metrics To Help Drive Agile Software;
- Popular metrics for agile software development are velocity (story units developed per iteration according to Are iterations hazardous to your project? by Alistair Cockburn), burndown, code coverage by tests; IBM / NC State University XP Study Metrics is an interesting read on the subject with further references at the bottom;
- Some of the general software process metrics are mentioned in A Software Metrics Primer by Karl E. Wiegers;
- There’s even more discussion and products that measure non-functional metrics for performance; Myth of the nines is a provocative look at system availability;
- Cost estimation metrics are available but probably not widely used; Software Cost Estimation: Metrics and Models has some pointers;
- Metrics such as score on The Joel Test: 12 Steps to Better Code can help quickly assess the health of a project.
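For the availability item in the list above, the arithmetic behind "nines" is easy to sketch in Python: an availability percentage translates directly into an allowed downtime for a given period. The function name and the 365-day year are assumptions made for this example; they are not taken from any of the linked articles.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year


def allowed_downtime_hours(availability_pct, period_hours=HOURS_PER_YEAR):
    """Downtime (in hours) permitted by an availability target over a period."""
    return (1.0 - availability_pct / 100.0) * period_hours


for target in (99.0, 99.9, 99.99, 99.999):
    hours = allowed_downtime_hours(target)
    print(f"{target}% availability allows about {hours:.2f} hours of downtime per year")
```

Each extra nine cuts the allowed downtime by a factor of ten, which says nothing about how that downtime is distributed or how it is measured.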
The more experience we gain measuring our work, the better we will be at selecting the right metrics for each context, analyzing them, and making better decisions based on this information. And then we can brag about that.



April 17th, 2007 at 7:51 pm
Am I the only person in the world who believes that the purpose of software metrics should be to make it possible to understand software science rather than to predict or determine programmer performance? If there are others, I would sure like to hear from them.