AoAD2 Practice: Blind Spot Discovery


This is a pre-release excerpt of The Art of Agile Development, Second Edition, to be published by O’Reilly in 2021. Visit the Second Edition home page for information about the open development process, additional excerpts, and more.

Your feedback is appreciated! To share your thoughts, join the AoAD2 open review mailing list.

This excerpt is copyright 2007, 2020, 2021 by James Shore and Shane Warden. Although you are welcome to share this link, do not distribute or republish the content without James Shore’s express written permission.

Blind Spot Discovery

Audience
Testers, Whole Team

We discover the gaps in our thinking.

Fluent Delivering teams are very good at building quality into their code, as you saw in the previous practice. But nobody’s perfect, and teams have blind spots. Blind spot discovery is a way of finding those gaps.

To find blind spots, look at the assumptions your team makes about what they’re building and how they build it. Look at the pressures and constraints the team is under, too. Imagine what risks the team might be facing, and what they might falsely believe to be true, then investigate to see if your guess is right. Testers tend to be particularly good at this.

Allies
No Bugs
Incident Analysis

When you find a blind spot, don’t just fix the problem you found. Instead, fix the gap. Think about how your approach to development allowed the bug to occur, then change your approach to prevent that entire category of bugs from happening again. This is an important part of the “no bugs” attitude found on the best Agile teams. An incident analysis session will help you figure out what to change.

Here are five of the most popular techniques for finding blind spots:

Validated Learning

When people think about bugs, they often think about logic errors, user interface errors, or production outages. But the blind spot I see most often is more fundamental, and more subtle.

More than anything else, teams build the wrong thing.

More than anything else, teams build the wrong thing. To use Lean Startup terminology, they lack product-market fit. I think this happens because so many teams think of their job as building the product they were told to build. They act as obedient order-takers: a software factory designed to ingest stories in one end and plop software out the other.

In his foundational book, The Lean Startup, Eric Ries describes the purpose of a startup:

Startups exist not just to make stuff, make money, or even serve customers. They exist to learn how to build a sustainable business. This learning can be validated scientifically by running frequent experiments that allow entrepreneurs to test each element of their vision.

...The fundamental activity of a startup is to turn ideas into products, measure how customers respond, and then learn whether to pivot or persevere. All successful startup processes should be geared to accelerate that feedback loop. [Ries 2011] (p. 9, emphasis his)

Replace “startups” with “your team” and “sustainable business” with “sustainable product,” and that quote applies equally well to the work of your team.

Nobody really knows what you should build, not even the people asking for it.

Don’t just assume that your team should build what it’s told to build. Instead, assume the opposite: nobody really knows what you should build, not even the people asking for it. Your team’s job is to take those ideas, test them, and learn what you should really build. Here’s Eric Ries again:

I’ve come to believe that learning is the essential unit of progress for startups. The effort that is not absolutely necessary for learning what customers want can be eliminated. I call this validated learning because it is always demonstrated by positive improvements in the startup’s core metrics. As we’ve seen, it’s easy to kid yourself about what you think customers want. It’s also easy to learn things that are completely irrelevant. Thus, validated learning is backed up by empirical data collected from real customers. (pp. 49-50, emphasis his)

Allies
Purpose
Visual Planning
Real Customer Involvement
Incremental Requirements

For many teams, the first time they test their ideas is when they release their software. That’s pretty risky. Instead, use Ries’ Build-Measure-Learn loop:

  1. Build. Look at your team’s purpose and plan. What core assumptions are you making about your product, customers, and users? Choose one to test, then think, “What’s the smallest thing we can put in front of real customers and users?” It doesn’t have to be a real product—in some cases, a mock-up or paper prototype will work—and you don’t have to involve every user, but you do need to involve people who will actually buy or use your product.

  2. Measure. Prior to showing people what you’ve built, decide what data you need to see in order to say that the assumption has been proven or disproven. The data can be subjective, but the measurement should be objective. For example, “70% of our customers say they like us” is an objective measurement of subjective data.

  3. Learn. Your measurement will either validate your hypothesis or disprove it. If you validated the hypothesis, continue with the next one. If you disproved your hypothesis, change your plans accordingly.

For example, one team’s purpose was to improve surgical spine care outcomes. They planned to do so by building a tool to give clinical leads a variety of views into surgical data. One of their core assumptions was that the clinical leads would actually trust the underlying data the tool would use. But the data could be poor, and the leads tended to be skeptical.

To test their assumption, the team decided to: (build) use real data from seven clinics to create a mock-up of the tool; (measure) show it to those seven clinics’ leads; (learn) if at least five said the data was of acceptable quality, the assumption would be validated. If not, they would come up with a new plan.
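
To show how little machinery validated learning needs, here’s a minimal sketch, in JavaScript, of that experiment written down as data and evaluated after the fact. It’s my illustration, not the team’s actual tooling; the field names and threshold come from the example above.

  // Capture the measurement criteria before building anything, so the team
  // can't move the goalposts after seeing the results.
  const experiment = {
    assumption: "Clinical leads will trust the underlying surgical data",
    build: "Mock-up of the tool using real data from seven clinics",
    measure: "Show the mock-up to those seven clinics' leads",
    threshold: { acceptableResponses: 5, totalResponses: 7 },
  };

  // After running the experiment, record each lead's answer and evaluate it.
  function evaluate(experiment, responses) {
    const acceptable = responses.filter((r) => r.dataQualityAcceptable).length;
    return {
      validated: acceptable >= experiment.threshold.acceptableResponses,
      acceptable,
      total: responses.length,
    };
  }

  // Example: five of seven leads found the data acceptable, so the assumption holds.
  const responses = [true, true, false, true, true, false, true]
    .map((ok) => ({ dataQualityAcceptable: ok }));
  console.log(evaluate(experiment, responses));  // { validated: true, acceptable: 5, total: 7 }

Writing the threshold down before showing the mock-up keeps the team honest about what “validated” means.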

Validated learning is one of the hallmarks of an Optimizing team.

Validated learning is one of the hallmarks of an Optimizing team. Depending on your organizational structure, you may not be able to use it to its fullest. (Your organization may want your team to be a software factory.) Still, the fundamental idea applies. Don’t just assume your stories will make people happy. Do everything you can to check your assumptions and get feedback.

For more about validated learning, see [Ries 2011].

Exploratory Testing

Ally
Test-Driven Development

Test-driven development ensures that programmers’ code does what they intended it to do, but what if the programmer’s intention is wrong? For example, a programmer might think the correct way to determine the length of a string in JavaScript is to use string.length, but that can result in counting six letters in the word “naïve.”1

1The count can be off because string.length reports the number of UTF-16 code units, not the number of graphemes—what people usually think of as characters—and it’s possible for Unicode to store the grapheme “ï” as two codepoints: a normal “i” plus a “combining diaeresis” (the umlaut). String manipulation has similar issues. Reversing a string containing the Spanish flag will convert Spain 🇪🇸 to Sweden 🇸🇪, which is sure to surprise beach-goers.
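
Here’s a small JavaScript sketch of that blind spot (my illustration, not the book’s). It assumes a runtime with Intl.Segmenter support, such as recent versions of Node.js or evergreen browsers.

  // "naïve" stored with a combining diaeresis: six code units, five graphemes.
  const naive = "nai\u0308ve";                 // "i" plus U+0308 COMBINING DIAERESIS
  console.log(naive);                          // naïve
  console.log(naive.length);                   // 6, not what the programmer intended

  // Counting graphemes instead, using Intl.Segmenter:
  const segments = new Intl.Segmenter("en", { granularity: "grapheme" }).segment(naive);
  console.log([...segments].length);           // 5

  // And the flag surprise from the footnote:
  console.log([..."🇪🇸"].reverse().join(""));   // 🇸🇪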

Exploratory testing is a technique for finding these blind spots. It’s a rigorous approach to testing which involves “designing and executing tiny experiments in rapid succession using the results from the last experiment to inform the next.” [Hendrickson 2013] (ch. 1) It involves these steps:

  1. Charter. Start by deciding what you’re going to explore, and why. A new technology the team recently adopted? A recently-released user interface? A critical piece of security infrastructure? Your charter should be general enough to give you an hour or two of work, and specific enough to help you focus.

  2. Observe. Use the software. You’ll often do so via the user interface, but you can also use tools to explore APIs and network traffic, and you can also observe hidden parts of the system, such as logs and databases. Look for two things: anything that’s out of the ordinary, and anything you can modify, such as a URL, form field, or file upload, that might lead to unexpected behavior. Take notes as you go, so you can retrace your steps when necessary.

  3. Vary. Don’t just use the software normally; push its boundaries. Put an emoji in a text field. Enter a size as zero or negative. Upload a zero-byte file, a corrupted file, or an “exploding” zip file that expands to terabytes of data. Modify URLs. Corrupt network traffic. Artificially slow down your network, or write to a file system with no free space.

As you go, use your observations and your understanding of the system to decide what to explore next. You’re welcome to supplement those insights by looking at code and production logs. If you’re exploring security capabilities, you can use your team’s threat model as a source of inspiration, or create your own. (See “Threat Modeling” on p.XX.)
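
To make the “vary” step concrete, here’s a sketch of boundary-pushing values you might feed a text field or API parameter (my illustration, not from the book). The endpoint URL and field name are hypothetical; run it as an ES module so the top-level await works, and adapt it to whatever you’re exploring.

  // Boundary-pushing probes for a text input or API parameter.
  const probes = [
    "",                                   // empty input
    " ".repeat(100_000),                  // very long input
    "🧨💥🎉",                              // emoji and astral-plane characters
    "nai\u0308ve",                        // combining characters
    "0", "-1", "9".repeat(40),            // numeric edge cases sent as strings
    "'; DROP TABLE users; --",            // injection-shaped input
    "../../../etc/passwd",                // path-traversal-shaped input
    "<script>alert(1)</script>",          // markup-shaped input
  ];

  // Send each probe to a hypothetical endpoint and note anything out of the ordinary.
  for (const value of probes) {
    const response = await fetch("https://example.com/api/items", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ name: value }),
    });
    console.log(response.status, JSON.stringify(value).slice(0, 40));
  }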

There’s much more to exploratory testing than I have room for in this book. For more detail, and a great set of heuristics about what to vary, see [Hendrickson 2013].

Chaos Engineering

In large networked systems, failures are an everyday occurrence. Your code has to be resilient to those failures, which requires careful attention to error handling and recovery. Unfortunately, error handling is a common blind spot for less experienced programmers and teams, and even experienced teams can’t predict every failure mode of a complex system.

Chaos engineering can be considered a specialized form of exploratory testing that focuses on system architecture.2 It involves deliberately injecting failures into running systems—often, live production systems—in order to learn how they respond to failure. Although this may seem dangerous, it can be done in a controlled way that minimizes risk. It allows you to identify issues that only appear as a result of complex interactions.

2Some people in the chaos engineering community object to use of the word “testing” in relationship to chaos engineering. They prefer the term “experiment.” I think that objection misunderstands the nature of testing. As Elisabeth Hendrickson writes in Explore It!: “This is the essence of testing: designing an experiment to gather empirical evidence to answer a question about a risk.” That’s exactly what chaos engineering is, too. [Hendrickson 2013] (ch. 1)

Chaos engineering is similar to exploratory testing in that it involves finding opportunities to vary normal behavior. Rather than thinking in terms of unexpected user input and API calls, though, you think in terms of unexpected system behavior: nodes crashing, high latency network links, unusual responses, and so forth. Fundamentally, it’s about conducting experiments to determine if your software system is as resilient as you think it is.

  1. Start with an understanding of your system’s “steady state.” What does your system look like when it’s functioning normally? What assumptions does your team or organization make about your system’s resiliency? Which of those would be most valuable to check first? When you perform the experiment, how will you know if it succeeded or failed?

  2. Prepare to vary the system in some way: remove a node, introduce latency, change network traffic, artificially increase demand, etc. (If this is your first test, start small, so the impact of failure is limited.) Form a hypothesis about what will happen. Make a plan for aborting the experiment if things go badly wrong.

  3. Make the change and observe what happens. Was your hypothesis correct? Is the system still performing adequately? If not, you’ve identified a blind spot. Either way, discuss the results with your team and improve your collective mental model of the system. Use what you’ve learned to decide which experiment you should conduct next.

Many of the stories surrounding chaos engineering involve automated tools, such as Netflix’s Chaos Monkey. To use chaos engineering within your team, though, don’t focus on building tools. It’s more valuable to conduct a breadth of experiments than to automatically repeat a single experiment. You’ll need to build some basic tooling to support your work, and that tooling will grow in sophistication over time, but try to conduct the broadest set of experiments you can for the least amount of work.
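
As an example of how small that tooling can start, here’s a JavaScript sketch of a single experiment: verify a steady-state hypothesis while a fault is active, and abort if the system degrades too far. It’s my illustration rather than a standard tool; measureErrorRate and injectNetworkLatency are hypothetical hooks you’d wire up to your own metrics and fault-injection mechanism.

  // Hypothesis: with 500ms of added latency to the payments service,
  // the error rate stays below 1%.
  const MAX_ERROR_RATE = 0.01;

  // Hypothetical hooks: replace these with queries against your real metrics
  // and whatever fault-injection mechanism you use (a proxy, for example).
  async function measureErrorRate(seconds) { return 0.002; }
  async function injectNetworkLatency(options) { return async () => {}; }  // returns an "undo" function

  async function runExperiment() {
    console.log("Baseline error rate:", await measureErrorRate(60));

    const stopInjection = await injectNetworkLatency({ target: "payments-service", addedMs: 500 });
    try {
      for (let minute = 1; minute <= 10; minute++) {
        const errorRate = await measureErrorRate(60);
        console.log(`Minute ${minute}: error rate ${errorRate}`);
        if (errorRate > MAX_ERROR_RATE) {
          console.log("Hypothesis disproven; aborting experiment.");
          return { validated: false, errorRate };
        }
      }
      console.log("Hypothesis validated.");
      return { validated: true };
    } finally {
      await stopInjection();   // always restore normal behavior, even on abort
    }
  }

  runExperiment();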

The principles of chaos engineering can be found at principlesofchaos.org. For a book-length treatment of the topic, see [Rosenthal and Jones 2020].

Penetration Testing and Vulnerability Assessments

Although exploratory testing can find some security-related blind spots, security-sensitive software warrants testing by experts.

Penetration testing, also known as pentesting, involves having people attempt to defeat the security of your system in the same way a real attacker would. It can involve probing the software your team writes, but it also considers security more holistically. Depending on the rules of engagement you establish, it can involve probing your production infrastructure, your deployment pipeline, human judgment, and even physical security such as locks and doors.

Penetration testing requires specialized expertise. You’ll typically need to hire an outside firm. It’s expensive, and your results depend heavily on the skill of the testers. Exercise extra diligence when hiring a penetration testing firm, and remember that the individuals performing the test impact your results as much as, or even more than, the firm you choose.

Vulnerability assessments are a less-costly alternative to penetration testing. Although penetration testing is technically a type of vulnerability assessment, most firms advertising “vulnerability assessments” perform an automated scan.

Automated assessments can be useful for finding straightforward blind spots, but the results usually include a lot of false positives. You’re likely to need an expert to help you triage the results. Once you address the blind spots identified by the first few assessments, additional assessments tend to have more noise than value, although they’re still worth repeating occasionally.

In general, start by using threat modeling (see “Threat Modeling” on p.XX) and security checklists, such as the OWASP Top 10 at owasp.org, to inform your programming and exploratory testing efforts. Supplement those checklists with automated vulnerability assessments to find additional blind spots. Then turn to penetration testing for an in-depth assessment.

Mutation Testing

In theory, test-driven development ensures that programmers’ code does exactly what it’s supposed to do. But what if your team doesn’t use TDD, or doesn’t use it properly? Mutation testing can help you evaluate your test quality.

Mutation testing involves introducing an error into your codebase, running the tests, and seeing if the tests fail. An automated tool repeats this process thousands of times, then reports which changes slipped by the tests.
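
Here’s a sketch of the kind of change a mutation testing tool makes, and of tests that do and don’t catch it (my illustration, assuming a Jest-style test runner).

  // The code under test.
  function canWithdraw(balance, amount) {
    return balance >= amount;
  }

  // A mutation tool might change >= to > and rerun the tests.
  // This test does NOT kill that mutant, because it never exercises the boundary:
  test("rejects withdrawals larger than the balance", () => {
    expect(canWithdraw(100, 150)).toBe(false);
  });

  // This test DOES kill the mutant: when >= becomes >, withdrawing the
  // entire balance returns false and the test fails, so the mutant is caught.
  test("allows withdrawing the entire balance", () => {
    expect(canWithdraw(100, 100)).toBe(true);
  });

A surviving mutant, like the one the first test misses, is exactly the kind of gap the tool’s report points you to.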

Ally
Test-Driven Development

Like vulnerability assessments, mutation testing tends to have a lot of false positives. Don’t try to achieve a perfect score. Instead, focus on identifying the blind spots mutation testing reveals. For each missing test it identifies, think about whether that test was really needed, and—if it was—why it wasn’t created in the first place.

Mutation testing is the weakest of the techniques I’ve described. If you’re confident in your approach to test-driven development, you don’t need it. But if you aren’t yet confident, or if you don’t use TDD, it can be worth a try.

There are a variety of tools for mutation testing. A web search for “<language> mutation testing” will help you find one for your programming language.

Questions

Should these techniques be performed individually, in pairs, or as a mob?

Allies
Pair Programming
Mob Programming

It’s up to your team. It’s fine to perform these techniques individually. On the other hand, pairing and mobbing are good for coming up with ideas and disseminating insights, and they can help break down the barriers that tend to form between testers and other team members. Experiment to see which approach works best for your team. It might vary by technique.

Won’t the burden of blind spot discovery keep getting bigger as the software gets bigger?

It shouldn’t. Blind spot discovery isn’t like traditional testing, which tends to grow along with the codebase. It’s for checking assumptions, not validating an ever-increasing codebase. As the team addresses blind spots and gains confidence in its ability to deliver high-quality results, the need for blind spot discovery should go down, not up.

Prerequisites

You don’t need to check before you release.

Any team can use these techniques. But remember that they’re for discovering blind spots, not checking that the software works. Don’t let them be a bottleneck. You don’t need to check before you release, and you don’t need to check everything. Remember, you’re looking for flaws in your approach, not your software.

Ally
No Bugs

On the other hand, releasing without additional checks requires your team to be able to produce code with nearly no bugs. If you aren’t there yet, or if you just aren’t ready to trust your approach, it’s okay to delay releasing until you’ve checked for blind spots. Just be sure not to use blind spot discovery as a crutch. Fix your approach so you can release without manual testing.

Indicators

When you use blind spot discovery well:

  • The team trusts the quality of their software.

  • The team doesn’t use blind spot discovery as a form of pre-release testing.

  • The number of defects found in production and by blind-spot techniques declines over time.

  • The amount of time needed for blind spot discovery declines over time.

Alternatives and Experiments

Allies
No Bugs
Test-Driven Development

This practice is based on an assumption that it’s possible for developers to build systems with nearly no bugs—that defects are the result of fixable blind spots, not a lack of manual testing. So the techniques are geared around finding surprises and testing hypotheses.

The most common alternative is traditional testing: building repeatable test plans that comprehensively validate the system. Although this may seem more reliable, those test plans have blind spots of their own. Most of the tests end up being redundant to the tests programmers create with test-driven development. At best, they tend to find the same sorts of issues that exploratory testing does, at much higher cost, and they rarely expose problems that the other techniques reveal.

In terms of experimentation, the techniques I’ve described are just the beginning. The underlying idea is to validate your hidden assumptions. Anything you can do to identify and test those assumptions is fair game.
