James Shore: AoAD2 Practice: No Bugs

AoAD2 Practice: No Bugs

Book cover for “The Art of Agile Development, Second Edition.” by James Shore.

This is an excerpt from The Art of Agile Development, Second Edition. Visit the Second Edition home page for additional excerpts and more!

This excerpt is copyright 2007, 2021 by James Shore and Shane Warden. Although you are welcome to share this link, do not distribute or republish the content without James Shore’s express written permission.

No Bugs

Audience: Whole Team

We release with confidence.

If you’re on a team with a bug count in the hundreds or thousands, the idea of “no bugs” probably sounds ridiculous. I’ll admit it: no bugs is an ideal to strive for, not something your team will completely achieve. There will always be some bugs. (Or defects; I use “bug” and “defect” interchangeably.)

But you can get closer to the “no bugs” ideal than you might think. Consider Nancy van Schooenderwoert’s experience with Extreme Programming. She led a team of novices working on a real-time embedded system for farm combines: a concurrent system written in C, with some assembly. If that’s not a recipe for bugs, I don’t know what is. According to her analysis of data by Capers Jones, the average team developing this software would produce 1,035 defects and deliver 207 to the customer.

Here’s what actually happened:

The GMS team delivered this product after three years of development, having encountered a total of 51 defects during that time. The open bug list never had more than two items at a time. Productivity was measured at almost three times the level for comparable embedded software teams. The first field test units were delivered after approximately six months into development. After that point, the software team supported the other engineering disciplines while continuing to do software enhancements. [VanSchooenderwoert2006]

“Embedded Agile Project by the Numbers with Newbies”

Over three years, the team generated 51 defects and delivered 21 to its customer. That’s a 95% reduction in generated defects and a 90% reduction in delivered defects.

We don’t have to rely on self-reported data. QSM Associates is a well-regarded company that performs independent audits of software development teams. In an early analysis of a company practicing a variant of XP, they reported an average reduction from 2,270 defects to 381 defects, an 83% decrease. Furthermore, the XP teams delivered 24% faster with 39% fewer staff. [Mah2006]

More recent case studies confirmed those findings. QSM found 11% defect reduction and 58% schedule reduction on a Scrum team; 75% defect reduction and 53% schedule reduction on an XP team; and 75% defect reduction and 30% schedule reduction in a multiteam analysis of thousands of developers. [Mah2018]

Eliminate errors at their source rather than finding and fixing them after the fact.

How do you achieve these results? It’s a matter of building quality in, rather than testing defects out. Eliminate errors at their source rather than finding and fixing them after the fact.

Key Idea: Build Quality In

“Cheap, fast, or good,” the saying goes. “Pick two.”

For decades, people have thought of quality as something that costs extra. The more time and money you spend, the higher quality you can get. And, to a degree, it’s true. If your approach to quality is to test work after it’s finished, the more time you spend testing and fixing, the more bugs you eliminate.

But that isn’t the only road to quality. By building quality in from the beginning, you get higher quality results for less cost and time. Building quality in takes time, sure, but it also eliminates time needed for testing and fixing after the fact. It turns out to be a net gain.

“Cheap, fast, or good.” You can have all three.

Don’t Play the Bug Blame Game

Is it a bug or a feature?

I’ve seen companies waste inordinate amounts of time on this question. In an attempt to apportion blame “correctly,” they make elaborate distinctions between bugs, defects, errors, issues, anomalies, and, of course...unintentional features.

What really matters is whether you will do or not do something.

None of that matters. What really matters is whether you will do or not do something. If there’s something your team needs to do—whatever the reason—it’s a story in your plan.

For the purposes of this chapter, I’m defining bugs as follows:

A bug is anything your team considers “done” that later needs correction.

For your purposes, though, even that distinction doesn’t matter. If something needs work, it gets a story card. That’s all there is to it.

How to Build Quality In

The better your internal quality, the faster you go.

Before I describe how to build quality in, I need to clarify what I mean by “quality.” Roughly speaking, quality can be divided into “internal quality” and “external quality.” Internal quality is the way your software is constructed. It’s things like good names, clear software design, and simple architecture. Internal quality controls how easy your software is to extend, maintain, and modify. The better the internal quality, the faster you go.

External quality is the user-visible aspects of your software. It’s your software’s UX, functionality, and reliability. You can spend infinite amounts of time on these things. The right amount of time depends on your software, market, and value. Figuring out the balance is a question of product management.

“Building quality in” means keeping internal quality as high as possible while keeping external quality at the level needed to satisfy your stakeholders. That involves keeping your design clean, delivering the stories in your plan, and revising your plan when your external quality falls short of what’s needed.

Now, let’s talk about how to do it. To build quality in and achieve zero bugs, you’ll need to prevent four types of errors.

Prevent programmer errors

Programmer errors occur when a programmer knows what to program, but makes a mistake. It could be an incorrect algorithm, a typo, or some other mistake made while translating ideas to code.

Allies: Test-Driven Development; Energized Work; Pair Programming; Mob Programming; Alignment; Done Done

Test-driven development is your defect-elimination workhorse. Not only does it ensure that you program what you intended to, it also gives you a comprehensive regression suite you can use to detect future errors.

To enhance the benefits of test-driven development, work sensible hours and use pair programming or mobbing to bring multiple perspectives to bear on every line of code. This improves your brainpower, which helps you make fewer mistakes and allows you to see mistakes more quickly.

Supplement these practices with good standards (which are part of your alignment discussion) and a “done done” checklist. These will help you remember and avoid common mistakes.

Prevent design errors

Design errors create breeding grounds for bugs. According to Barry Boehm, 20% of the modules in a program are typically responsible for 80% of the defects. [Boehm1987] It’s an old statistic, but it matches my experience with modern software, too.

Even with test-driven development, design errors will accumulate over time. Sometimes a design that looks good when you first create it won’t hold up over time. Sometimes a shortcut that seems like an acceptable compromise will come back to bite you. Sometimes your requirements change and your design no longer fits.

Allies: Collective Code Ownership; Simple Design; Incremental Design; Reflective Design; Slack

Whatever the cause, design errors manifest as complicated, confusing code that’s hard to get right. Although you could take a week or two off to fix these problems, it’s better to continuously improve your internal quality.

Use collective code ownership to give programmers the right and responsibility to fix problems wherever they live. Use evolutionary design to continuously improve your design. Make time for improvements by including slack in your plans.

Prevent requirements errors

Requirements errors occur when a programmer creates code that does exactly what they intended it to do, but their intention was wrong. Perhaps they misunderstood what they were supposed to do, or perhaps nobody really understood what needed to be done. Either way, the code works, but it doesn’t do the right thing.

Allies: Whole Team; Purpose; Context; Team Room; Ubiquitous Language; Customer Examples; Incremental Requirements; Stakeholder Demos; Stories; Done Done

A cross-functional, whole team is essential for preventing requirements errors. Your team needs to include on-site customers with the skills to understand, decide, and explain the software’s requirements. Clarifying the team’s purpose and context is vital to this process.

A shared team room is also important. When programmers have a question about requirements, they need to be able to turn their head and ask. Use a ubiquitous language to help programmers and on-site customers understand one another, and supplement your conversations with customer examples.

Confirm that the software does what it needs to do with frequent customer reviews and stakeholder demos. Perform those reviews incrementally, as soon as programmers have something to show, so misunderstandings and refinements are discovered early, in time to be corrected. Use stories to focus the team on customers’ perspective. Finally, don’t consider a story “done done” until on-site customers agree it’s done.

Prevent systemic errors

If everyone does their job perfectly, these practices yield software with no defects. Unfortunately, perfection is impossible. Your team is sure to have blind spots: subtle areas where team members make mistakes, but they don’t know it. These blind spots lead to repeated, systemic errors. They’re “systemic” because they’re a consequence of your entire development system: your team, its process, the tools you use, the environment you work in, and more.

Escaped defects are a clear signal of problems in paradise. Although errors are inevitable, most are caught quickly. Defects found by end users have “escaped.” Every escaped defect indicates a need to improve your development system.

Ally: Blind Spot Discovery

Of course, you don’t want your end users to be your beta testers. That’s where blind spot discovery comes in. It’s a variety of techniques, such as chaos engineering and exploratory testing, for finding gaps in your understanding. I discuss them in the next practice.

Some teams use these techniques to check the quality of their software system: they’ll code a story, search for bugs, fix them, and repeat. But to build quality in, treat your blind spots as a clue about how to improve your development system, not just your software system. The same goes for escaped defects. They’re all clues about what to improve.

Ally: Incident Analysis

Incident analysis helps you decipher those clues. No matter the impact, if your team thought something was done and it later needs fixing, it can benefit from incident analysis. This applies to well-meaning mistakes, too: if everybody thinks a particular new feature is a great idea, and it turns out to enrage your customers, it deserves just as much analysis as a production outage.

When you find a bug, write a test and fix the bug, but then fix the underlying system. Even if it’s just in the privacy of your thoughts, think about how you can improve your design and process to prevent that type of bug from happening again.

Fix Bugs Immediately

Do. Or do not. There is no //TODO.

As the great Master Yoda never said, “Do. Or do not. There is no //TODO.”

Each defect is the result of a flaw that’s likely to breed more mistakes. Improve quality and productivity by fixing them right away.

Allies: Collective Code Ownership; Team Room

Fixing bugs quickly requires the whole team to participate. Programmers, use collective code ownership so anyone can fix each bug. Customers and testers, personally bring new bugs to the attention of a programmer and help them reproduce it. This is easiest when the team shares a team room.

In practice, it’s not possible to fix every bug right away. You may be in the middle of something else when you learn about a bug. When this happens to me, I ask my navigator to make a note. We come back to it 10–20 minutes later, when we come to a good stopping point.

Allies: Slack; Task Planning; Visual Planning

Some bugs are too big to fix quickly. For these, I gather the team for a quick huddle. We collectively decide if we have enough slack to fix the bug and still meet our other commitments. If we do, we create tasks for the bug, put them in our plan, and people volunteer for them as normal. (If you’re using estimates, these tasks don’t get estimates or count toward your capacity.)

If there isn’t enough slack to fix the bug, decide as a team whether it’s important enough to fix before your next release. If it is, create a story for it and schedule it immediately for your next iteration or story slot. If it isn’t, add it to your visual plan in the appropriate release.

Bugs that aren’t important enough to fix should be discarded. If you can’t do that, the bug needs to be fixed. The “fix,” though, can be a matter of documenting a workaround, or making a record that you decided not to fix the bug. An issue tracker might be the right way to do this.

Testers’ Role

Because fluent Delivering teams build quality in, rather than testing defects out, people with testing skills shift left. Instead of focusing their skills on the completed product, they focus on helping the team build a quality product from the beginning.

In my experience, some testers are business-oriented: they’re very interested in getting business requirements right. They work with on-site customers to uncover all the nit-picky details the customers would otherwise miss. They’ll often prompt people to think about edge cases during requirements discussions.

Other testers are more technically-oriented. They’re interested in test automation and nonfunctional requirements. These testers act as technical investigators for the team. They create the testbeds that look at issues such as scalability, reliability, and performance. They review logs to understand how the software system works in production. Through these efforts, they help the team understand the behavior of its software and decide when to devote more effort to operations, security, and nonfunctional stories.

Ally: Blind Spot Discovery

Testers also help the team identify blind spots. Although anybody on the team can use blind spot discovery techniques, people with testing skills tend to be particularly good at it.

’Tude

Bugs are for other people.

I encourage an attitude among my teams—a bit of eliteness, even snobbiness. It goes like this: “Bugs are for other people.”

If you do everything I’ve described, bugs should be a rarity. Your next step is to treat them that way. Rather than shrugging your shoulders when a bug occurs—“Oh yeah, another bug, that’s what happens in software”—be shocked and dismayed. Bugs aren’t something to be tolerated; they’re a sign of underlying problems to be solved.

Allies: Pair Programming; Mob Programming; Team Room; Collective Code Ownership

Ultimately, “no bugs” is about establishing a culture of excellence. When you learn about a bug, fix it right away, then figure out how to prevent that type of bug from happening again.

You won’t be able to get there overnight. All the practices I’ve described take discipline and rigor. They’re not necessarily difficult, but they break down if people are sloppy or don’t care about their work. A culture of “no bugs” helps the team maintain the discipline required, as do pairing or mobbing, a team room, and collective ownership.

You’ll get there eventually. Agile teams can and do achieve nearly zero bugs. You can too.

Questions

How do we prevent security defects and other challenging bugs?

Allies: Done Done; Alignment; Blind Spot Discovery

Threat modeling (see the “Threat Modeling” section) can help you think of security flaws in advance. Your “done done” checklist and coding standards can remind you of issues to address. That said, you can only prevent bugs you think to prevent. Security, concurrency, and other difficult problem domains may introduce defects you never considered. That’s why blind spot discovery is also important.

How should we track our bugs?

You shouldn’t need a bug database or issue tracker for new bugs, assuming your team isn’t generating a lot of bugs. (If you are, focus on solving that problem first.) If a bug is too big to fix right away, turn it into a story, and track its details in the same way you handle other requirements details.

How long should we work on a bug before we turn it into a story?

Ally: Slack

It depends on how much slack you have. Early in an iteration, when there’s still a lot of slack, I might spend half a day on a defect before turning it into a story. Later, when there’s less slack, I might only spend 10 minutes on it.

We have a lot of legacy code. How can we adopt a “no bugs” policy without going mad?

It will take time. Start by going through your bug database and identifying the ones you want to fix in the current release. Schedule at least one to be fixed every week, with a bias toward fixing them sooner rather than later.

Ally: Incident Analysis

Every week or two, randomly choose a recent bug to subject to an incident analysis, or at least an informal one. This will allow you to gradually improve your development system and prevent bugs in the future.

Prerequisites

“No bugs” is about a culture of excellence. It can only come from within the team. Managers, don’t ask your teams to report defect counts, and don’t reward or punish them based on the number of defects they have. You’ll just drive the bugs underground, and that will make quality worse, not better. I’ll discuss this further in the “Incident Accountability” section.

Achieving the “no bugs” ideal depends on a huge number of Agile practices—essentially, every Focusing and Delivering practice in this book. Until your team reaches fluency in those practices, don’t expect dramatic reductions in defects.

Conversely, if you’re using the Focusing and Delivering practices, more than a few new bugs per month may indicate a problem with your approach. You’ll need time to learn the practices and refine your process, of course, but you should see an improvement in your bug rates within a few months. If you don’t, check the “Troubleshooting Guide” sidebar.

Indicators

When your team has a culture of “no bugs”:

Your team is confident in the quality of its software.
You’re comfortable releasing to production without a manual testing phase.
Stakeholders, customers, and users rarely encounter unpleasant surprises.
Your team spends its time producing great software instead of fighting fires.

Alternatives and Experiments

One of the revolutionary ideas Agile incorporates is that low-defect software can be cheaper to produce than high-defect software. This is made possible by building quality in. To experiment further, look at the parts of your process that check quality at the end, and think of ways to build that quality in from the beginning.

You can also reduce bugs by using more and higher quality testing to find and fix a higher percentage of bugs. However, this doesn’t work as well as building quality in from the beginning. It will also slow you down and make releases more difficult.

Some companies invest in separate QA teams in an effort to improve quality. Although occasional independent testing can be useful for discovering blind spots, a dedicated QA team isn’t a good idea. Paradoxically, it tends to reduce quality, because the development team then spends less effort on quality themselves. Elisabeth Hendrickson explores this phenomenon in her excellent article, “Better Testing, Worse Quality?” [Hendrickson2000]

Share your thoughts about this excerpt on the AoAD2 mailing list or Discord server. For videos and interviews regarding the book, see the book club archive.

For more excerpts from the book, see the Second Edition home page.