Thursday, April 10, 2008

Defect Metrics

I had a friend ask me recently about bug metrics: how they're used, and so on. I am breaking a personal rule about editing by just shoving my e-mail comments to Brian into my weblog; I generally take more care. But I noticed it's been almost a month since I've posted. I've been so busy (as my mother-in-law used to say, up to my ears in alligators), but that's no excuse!

These articles are all going to end up in a book sometime soon. I'm currently looking for a publisher and working on an outline, so I'd love your feedback on this stuff.

Defect Metrics

The generally accepted standard is to assign each bug a severity and a priority, typically severity 1, 2, 3 and priority 1, 2, 3. NOTE: a high-severity or high-priority bug actually has a LOW value, so a sev 1 is a really critical bug and a pri 3 is unimportant. Severity generally relates to the impact the bug has on a customer. Sev 1 is generally data loss, blocked functionality, etc.; sev 2 is generally a major annoyance that has a workaround; and sev 3 is a UI, pixel-pushing bug. Generally. Priority relates to how important it is to fix the bug: for instance, fixing a data loss issue is a pri 1, while fixing a pixel-pushing bug is a pri 3.
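Because the numbering runs backwards from intuition, it is easy to sort a bug list the wrong way. Here is a minimal Python sketch, assuming a hypothetical `Bug` record with integer severity and priority fields (real bug trackers store these their own way), that orders a triage queue so the P1/S1 bugs come out on top:

```python
from dataclasses import dataclass


@dataclass
class Bug:
    # Hypothetical minimal bug record; field names are illustrative only.
    bug_id: int
    title: str
    severity: int  # 1 = critical (data loss, blocked), 2 = major w/ workaround, 3 = cosmetic
    priority: int  # 1 = fix first, 3 = unimportant


def triage_order(bugs):
    """Return bugs most-important first.

    Because 1 is the HIGHEST severity/priority, an ascending sort on
    (priority, severity) puts the P1/S1 bugs at the front.
    """
    return sorted(bugs, key=lambda b: (b.priority, b.severity, b.bug_id))


bugs = [
    Bug(101, "Button misaligned by 2px", severity=3, priority=3),
    Bug(102, "Save silently drops records", severity=1, priority=1),
    Bug(103, "Export needs a manual workaround", severity=2, priority=2),
]

for bug in triage_order(bugs):
    print(f"P{bug.priority} S{bug.severity} {bug.title}")
# P1 S1 Save silently drops records
# P2 S2 Export needs a manual workaround
# P3 S3 Button misaligned by 2px
```

An ascending sort works precisely because 1 means "most important"; a tracker that uses the opposite convention would need the key flipped.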

At the Church, we’re currently chatting about changing that a bit. We have a guy who proposed something very similar to what QA Associates (now part of Tek Systems, uck) proposed in an old white paper: the concept of a matrix. At the Church, we are talking about matrixing severity (the impact of a defect) against frequency/exposure (how likely a customer is to encounter the bug, and how often). The cell where the two meet gives us the priority. All P1s have to be fixed immediately because they either 1) block further testing or 2) at the very least mean we’ll continue to code on a sandy foundation. It’s an unusual approach. The objective is to cut the time spent in bug triage/scrubbing to nearly zero, and to improve overall quality because 1) our bug priorities are more granular, 2) the emotion around which bug to fix has been eliminated, and 3) we hold ourselves to a higher bar (for example, every P2 and above, that is, every P1 and P2, has to be fixed). We haven’t presented this to management yet; we’re hoping to wrap it up next week and present shortly thereafter.
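The matrix idea boils down to a table lookup. The sketch below is only an illustration in Python: the three frequency levels, the 1 to 5 priority values in each cell, and the `MUST_FIX_THRESHOLD` of 2 are hypothetical placeholders, not the matrix we're actually proposing.

```python
# Hypothetical severity x frequency matrix. Lower numbers mean worse/more
# important, matching the sev/pri convention. Cell values are placeholders.
SEVERITY_LEVELS = (1, 2, 3)                   # 1 = critical impact, 3 = cosmetic
FREQUENCY_LEVELS = ("high", "medium", "low")  # how likely/often a customer hits it

# Rows: severity. Columns: frequency/exposure. Values: derived priority (1-5).
PRIORITY_MATRIX = {
    1: {"high": 1, "medium": 1, "low": 2},
    2: {"high": 2, "medium": 3, "low": 4},
    3: {"high": 3, "medium": 4, "low": 5},
}

MUST_FIX_THRESHOLD = 2  # hypothetical bar: every P1 and P2 must be fixed


def priority_for(severity, frequency):
    """Derive priority mechanically from the matrix instead of debating it in triage."""
    if severity not in SEVERITY_LEVELS or frequency not in FREQUENCY_LEVELS:
        raise ValueError(f"unknown severity/frequency: {severity!r}, {frequency!r}")
    return PRIORITY_MATRIX[severity][frequency]


def must_fix(severity, frequency):
    """True if the derived priority meets the fix bar (numerically <= threshold)."""
    return priority_for(severity, frequency) <= MUST_FIX_THRESHOLD


print(priority_for(1, "low"))  # 2: critical but rare still makes the must-fix bar
print(must_fix(3, "high"))     # False: a cosmetic bug everyone sees is a P3
print(must_fix(2, "high"))     # True: a major bug most customers hit is a P2
```

Because the priority falls out of the table, nobody has to argue about it in triage; the only debate left is whether the severity and frequency were assessed correctly.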

What metrics are important? Sheesh, I could probably write an entire book on that. I really geek out on metrics, actually, which surprised the teams I work with. They don’t know what to do with me. Here goes!

  • Incoming rate: how many bugs are being opened per day (or per build, but per day is easier to measure). Watch that count increase as development begins and test becomes more familiar with the new features. Toward the end of the project, that number had better drop. In project IT it tends to drop off quickly; in product software the tail is much longer and flatter. Some of the drop-off is due to testers being pulled onto other projects, some of it reflects hitting the ‘acceptable’ quality bar (which is always lower in project IT than in product software), and some actually reflects that the bugs are pretty well shaken out.
  • Resolved rate: always good to keep an eye on this. Is the count of resolved (but not yet closed) bugs creeping up? That tells you test is not keeping up. Next, you have to ask yourself why… Is test lazy? Are testers so busy with other ‘work’ that they can’t test? Did they find a huge spike of bugs the previous week, so they just can’t chew through regression as quickly as dev is fixing?
  • Bugs opened per day, by severity: so interesting to watch! See how many S1 bugs come in from start to finish, especially compared to S2 or S3. In the early phases, most of your bugs had better be S1s. This is the core architecture, and your testers had better be focused on rooting out core issues. As the incoming rate starts to drop, you should see S1s dropping and S2/S3 picking up. That shows you’ve stabilized the core components and bugs are coming in either from peripheral components OR as niggling fit-and-finish issues.
  • Same for priority (bugs opened per day, by priority)
  • Pie chart: severity: always interesting to see how many bugs are S1, S2, and S3. If I am in the last week of testing and 70% of our bugs are S1, I *know* we are not done testing. We recently did a study and found that 50% of the bugs in all of the Church’s databases are S1. That tells me either 1) we write really lousy code, 2) our testers mark bugs S1 incorrectly, or 3) we stop testing before the S2 and S3 bugs are found. A lot of it is the second issue… The project teams here all set S1 as their bar; S2 bugs rarely get fixed. It’s part of people wanting to ship fast. So, in order to get a bug fixed, testers have to artificially inflate its severity/priority. A healthy project will have around a 40%/40%/20% distribution, or maybe even roughly equal thirds. If not, dig deeper and find out why.
  • Pie chart: priority: pretty much the same. It’s interesting to see the distribution of bugs and how they shake out in priority.
  • Bugs opened per area: what’s your buggiest feature set/area in a given project? As a lead, you’ll want to add some test focus there. As a manager, you’ll also start harping on your dev manager to figure out why.
  • Bugs opened per developer: this isn’t always fair. Sometimes developers get saddled with really bad legacy code, or they get very complex interface features. But still, looking at the bugs per developer is interesting.
  • Bugs opened per tester: hey, you can measure bugs, and that’s a good thing! A lot of people are sensitive about comparing testers (for the same reasons as above), but still, the best testers are the ones who find bugs and then push for them until they are fixed. After all, we pay testers to find bugs, right? Well, actually that’s wrong. We *should* be paying testers to prevent defects by helping development NOT check in defective code. But barring that, finding them is good too.
  • Pie chart: resolution: take a look at your resolutions (fixed, won’t fix, duplicate, by design, etc.) and see how your bugs lay out. I was SHOCKED to see we have had a 90% fix rate in a current project, but a lot of that is because we’re still finding P1s. Generally, a 75% to 80% fix rate is normal; you don’t want to fix them all! If you have time to fix every bug, your schedule is way over-estimated. Plus, each fix carries the risk of introducing one, two, or even three MORE bugs. So keeping the number of fixes down is a good thing.
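The trend metrics in the list (incoming rate, bugs opened per day by severity or priority, and the resolved-but-not-closed backlog) are easy to compute from a bug export. Here is a minimal Python sketch, assuming a hypothetical bug record with `opened`, `resolved`, and `closed` dates plus numeric `severity` and `priority` fields; a real tracker's export will have its own field names.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Bug:
    # Hypothetical minimal bug record; adapt field names to your tracker's export.
    opened: date
    severity: int                    # 1 = most severe
    priority: int                    # 1 = most important
    resolved: Optional[date] = None  # dev resolved it (fixed, won't fix, by design, ...)
    closed: Optional[date] = None    # test verified the resolution and closed it


def incoming_per_day(bugs):
    """Incoming rate: number of bugs opened on each day."""
    return Counter(b.opened for b in bugs)


def incoming_per_day_by(bugs, field):
    """Bugs opened per day, broken out by `field` ("severity" or "priority")."""
    result = defaultdict(Counter)
    for b in bugs:
        result[b.opened][getattr(b, field)] += 1
    return result


def resolved_not_closed(bugs, day):
    """Size of the resolved-but-not-closed backlog at the end of `day`.

    A number that keeps creeping up means test is not keeping up with dev.
    """
    return sum(
        1
        for b in bugs
        if b.resolved is not None
        and b.resolved <= day
        and (b.closed is None or b.closed > day)
    )


start = date(2008, 4, 1)
days = [start + timedelta(days=i) for i in range(3)]
bugs = [
    Bug(days[0], severity=1, priority=1, resolved=days[1], closed=days[2]),
    Bug(days[0], severity=1, priority=1, resolved=days[2]),
    Bug(days[1], severity=2, priority=2, resolved=days[2]),
    Bug(days[2], severity=3, priority=3),
]

by_severity = incoming_per_day_by(bugs, "severity")
for d in days:
    print(d, incoming_per_day(bugs)[d], dict(by_severity[d]), resolved_not_closed(bugs, d))
# 2008-04-01 2 {1: 2} 0
# 2008-04-02 1 {2: 1} 1
# 2008-04-03 1 {3: 1} 2
```

In this made-up sample, S1s arrive first and S3s last, while the resolved-but-not-closed backlog grows 0, 1, 2: exactly the "test is not keeping up" signal worth asking about.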
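The distribution metrics in the list (the pie charts, the per-area, per-developer, and per-tester counts, and the fix rate) are just counts over a bug export. Here is a minimal Python sketch, assuming each bug is a dict with hypothetical `severity`, `area`, `developer`, `tester`, and `resolution` keys, where `resolution` is `None` while the bug is still open and the fix rate is taken to be the share of resolved bugs marked "fixed":

```python
from collections import Counter


def count_by(bugs, field):
    """Raw counts per value of `field`, largest first (bugs per area, developer, tester...)."""
    return Counter(bug[field] for bug in bugs).most_common()


def distribution(bugs, field):
    """Percentage of bugs per value of `field`: the numbers behind a pie chart."""
    counts = Counter(bug[field] for bug in bugs)
    total = sum(counts.values())
    return {value: round(100 * n / total, 1) for value, n in counts.items()}


def fix_rate(bugs):
    """Percentage of resolved bugs whose resolution is "fixed" (open bugs are ignored)."""
    resolved = [b for b in bugs if b["resolution"] is not None]
    if not resolved:
        return 0.0
    fixed = sum(1 for b in resolved if b["resolution"] == "fixed")
    return round(100 * fixed / len(resolved), 1)


# Made-up sample data; all keys and values are hypothetical.
bugs = [
    {"severity": 1, "area": "checkout", "developer": "dev_a", "tester": "tester_x", "resolution": "fixed"},
    {"severity": 1, "area": "checkout", "developer": "dev_a", "tester": "tester_y", "resolution": "fixed"},
    {"severity": 2, "area": "search", "developer": "dev_b", "tester": "tester_x", "resolution": "fixed"},
    {"severity": 2, "area": "checkout", "developer": "dev_b", "tester": "tester_x", "resolution": "by design"},
    {"severity": 3, "area": "search", "developer": "dev_a", "tester": "tester_y", "resolution": None},
]

print(distribution(bugs, "severity"))  # {1: 40.0, 2: 40.0, 3: 20.0}
print(count_by(bugs, "area"))          # [('checkout', 3), ('search', 2)]
print(fix_rate(bugs))                  # 75.0
```

With this sample data, the severity split lands on the 40/40/20 shape described as healthy, and the fix rate sits at 75%, inside the normal 75% to 80% range.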

I'll get some charts on these metrics and post them up sooner or later. Meanwhile, keep the comments coming.