FAST: A Better Way to Scale

I gave a talk on FAST (Fluid Scaling Technology) to Agile New England in July. The talk was recorded and you can access it here. You’ll have to create an account, but it’s free.

Here’s the abstract:

How can multiple teams work together on a single product? The common wisdom is to carefully align teams and responsibilities to create autonomous teams. But this approach invariably runs into cross-team bottlenecks, challenges aligning responsibilities to teams, and difficulties creating cross-functional teams.

Fluid Scaling Technology, or FAST, is an innovative new approach that solves these problems. It uses frequent team self-selection to prevent bottlenecks, share knowledge amongst teams, and ensure the right people are working on the right things. It’s simple and lightweight.

Join James Shore as he shares his experiences with scaling Agile—first with traditional approaches, and more recently with FAST. Learn what works, what doesn’t, and how you can try FAST in your organization.

Watch it here.

Aug 19, 2022
Art of Agile Development Book Club: Incident Analysis

Fridays from 8:00 – 8:45am Pacific, I host a call-in talk show inspired by the new edition of my book, The Art of Agile Development. Each session uses a chapter from the book as a jumping-off point for a wide-ranging discussion about Agile ideas and practices.

Attendance is free! No sign-up needed.

To learn more about The Art of Agile Development, see the book home page. You can buy a copy from Amazon or your favorite purveyor of software development books.

August 5th & 12th: (no session)

We’re on a brief hiatus in the first half of August.

August 19th: Incident Analysis

Despite your best efforts, your software will sometimes fail to work as it should. Some failures will be minor; others will be more significant. Either way, once the dust has settled, you need to figure out what happened and how you can improve. This is incident analysis.

When: August 19th, 8–8:45am Pacific (calendar invite)
Where: 🎙 Zoom link
Reading: 📖 Incident Analysis
Discussion: 🧑‍💻 Discord invite

August 26th: Optimizing

We’re wrapping up the book club on August 26th with a discussion of the Optimizing fluency zone. Mary and Tom Poppendieck will be joining us. Add it to your calendar!

Session Recordings

Note: The Art of Agile Development Book Club sessions are recorded. By appearing on the show, you consent to be recorded and for your appearance to be edited, broadcast, and distributed in any format and for any purpose without limitation, including promotional purposes. You agree Titanium I.T. LLC owns the copyright to the entire recording, including your contribution, and has no financial obligations to you as the result of your appearance. You acknowledge that your attendance at the book club is reasonable and fair consideration for this agreement.

If you don’t want to be recorded, that’s fine—just keep your camera and microphone muted. You’re still welcome to attend!

AoAD2 Practice: Incident Analysis

This is an excerpt from The Art of Agile Development, Second Edition. Visit the Second Edition home page for additional excerpts and more!

This excerpt is copyright 2007, 2021 by James Shore and Shane Warden. Although you are welcome to share this link, do not distribute or republish the content without James Shore’s express written permission.

📖 The full text of this section is available below, courtesy of the Art of Agile Development book club! Join us on Fridays from 8-8:45am Pacific for wide-ranging discussions about Agile. Details here.

Incident Analysis

Audience
Whole Team

We learn from failure.

Despite your best efforts, your software will sometimes fail to work as it should. Some failures will be minor, such as a typo on a web page. Others will be more significant, such as code that corrupts customer data, or an outage that prevents customer access.

Some failures are called bugs or defects; others are called incidents. The distinction isn’t particularly important. Either way, once the dust has settled and things are running smoothly again, you need to figure out what happened and how you can improve. This is incident analysis.

The details of how to respond during an incident are out of the scope of this book. For an excellent and practical guide to incident response, see Site Reliability Engineering: How Google Runs Production Systems [Beyer2016], particularly Chapters 12–14.

The Nature of Failure

Failure is a consequence of your entire development system.

It’s tempting to think of failure as a simple sequence of cause and effect—A did this, which led to B, which led to C—but that’s not what really happens.1 In reality, failure is a consequence of the entire development system in which work is performed. (Your development system is every aspect of how you build software, from tools to organizational structure. It’s in contrast to your software system, which is the thing you’re building.) Each failure, no matter how minor, is a clue about the nature and weaknesses of that development system.

1My discussion of the nature of failure is based on [Woods2010] and [Dekker2014].

Failure is the result of many interacting events. Small problems are constantly occurring, but the system has norms that keep them inside a safe boundary. A programmer makes an off-by-one error, but their pairing partner suggests a test to catch it. An on-site customer explains a story poorly, but notices the misunderstanding during customer review. A team member accidentally erases a file, but continuous integration rejects the commit.

When failure occurs, it’s not because of a single cause, but because multiple things go wrong at once. A programmer makes an off-by-one error, and their pairing partner was up late with a newborn and doesn’t notice, and the team is experimenting with less frequent pair swaps, and the canary server alerts were accidentally disabled. Failure happens, not because of problems, but because the development system—people, processes, and business environment—allows problems to combine.

Furthermore, systems exhibit a drift toward failure. Ironically, for teams with a track record of containing failures, the threat isn’t mistakes, but success. Over time, as no failures occur, the team’s norms change. For example, they might make pairing optional so people have more choice in their work styles. Their safe boundaries shrink. Eventually, the failure conditions—which existed all along!—combine in just the right way to exceed these smaller boundaries, and a failure occurs.

It’s hard to see the drift toward failure. Each change is small, and is an improvement in some other dimension, such as speed, cost, convenience, or customer satisfaction. To prevent drift, you have to stay vigilant. Past success doesn’t guarantee future success.

Small failures are a “dress rehearsal” for large failures.

You might expect large failures to be the result of large mistakes, but that isn’t how failure works. There’s no single cause, and no proportionality. Large failures are the result of the same systemic issues as small failures. That’s good news, because it means small failures are a “dress rehearsal” for large failures. You can learn just as much from them as you do from big ones.

Therefore, treat every failure as an opportunity to learn and improve. A typo is still a failure. A problem detected before release is still a failure. No matter how big or small, if your team thinks something is “done,” and it later needs correction, it’s worthy of analysis.

But it goes even deeper. Failures are a consequence of your development system, as I said, but so are successes. You can analyze them, too.

Conducting the Analysis

Ally
Retrospectives

Incident analysis is a type of retrospective. It’s a joint look back at your development system for the purpose of learning and improving. As such, an effective analysis will involve the five stages of a retrospective: [Derby2006]

  1. Set the stage

  2. Gather data

  3. Generate insights

  4. Decide what to do

  5. Close the retrospective

Include your whole team in the analysis, along with anyone else involved in the incident response. Avoid including managers and other observers; you want participants to be able to speak up and admit mistakes openly, and that requires limiting attendance to just the people who need to be there. When there’s a lot of interest in the analysis, you can produce an incident report, as I’ll describe later.

The time needed for the analysis session depends on the number of events leading up to the incident. A complex outage could have dozens of events and take several hours. A simple defect, though, might have only a handful of events and could take 30–60 minutes. You’ll get faster with experience.

In the beginning, and for sensitive incidents, a neutral facilitator should lead the session. The more sensitive the incident, the more experienced the facilitator needs to be.

This practice, as with all practices in this book, is focused on the team level—incidents that your team can analyze mainly on its own. You can also use it to conduct an analysis of your team’s part in a larger incident.

1. Set the stage
Ally
Safety

Because incident analysis involves a critical look at successes and failures, it’s vital for every participant to feel safe to contribute, including having frank discussions about the choices they made. For that reason, start the session by reminding everyone that the goal is to use the incident to better understand the way you create software—the development system of people, processes, expectations, environment, and tools. You’re not here to focus on the failure itself or to place blame, but instead to learn how to make your development system more resilient.

Ask everyone to confirm that they can abide by that goal and assume good faith on the part of everyone involved in the incident. Norm Kerth’s Prime Directive is a good choice:

Regardless of what we discover, we must understand and truly believe that everyone did the best job he or she could, given what was known at the time, his or her skills and abilities, the resources available, and the situation at hand. [Kerth2001] (ch. 1)

In addition, consider establishing the Vegas rule: What’s said in the analysis session, stays in the analysis session. Don’t record the session, and ask participants to agree to not repeat any personal details shared in the session.

If the session includes people outside the team, or if your team is new to working together, you might also want to establish working agreements for the session. (See the “Create Working Agreements” section.)

2. Gather data

Once the stage has been set, your next step is to understand what happened. You’ll do so by creating an annotated, visual timeline of events.

Stay focused on facts, not interpretations.

People will be tempted to interpret the data at this stage, but it’s important to keep everyone focused on “just the facts.” They’ll probably need multiple reminders as the stage progresses. With the benefit of hindsight, it’s easy to fall into the trap of critiquing people’s actions, but that won’t help. A successful analysis focuses on understanding what people actually did, and how your development system contributed to them doing those things, not what they could have done differently.

To create the timeline, start by creating a long horizontal space on your virtual whiteboard. If you’re conducting the session in person, use blue tape on a large wall. Divide the timeline into columns representing different periods in time. The columns don’t need to be uniform; weeks or months are often best for the earlier part of the timeline, while hours or days might be more appropriate for the moments leading up to the incident.

Have participants use simultaneous brainstorming to think of events relevant to the incident. (See the “Work Simultaneously” section.) Events are factual, nonjudgmental statements about something that happened, such as “Deploy script stops all ServiceGamma instances,” “ServiceBeta returns 418 response code,” “ServiceAlpha doesn’t recognize 418 response code and crashes,” “On-call engineer is paged about system downtime,” and “On-call engineer manually restarts ServiceGamma instances.” (You can use people’s names, but only if they’re present and agree.) Be sure to capture events that went well, too, not just those that went poorly.

Software logs, incident response records, and version control history are all likely to be helpful sources of inspiration. Write each event on a separate sticky note and add it to the board. Use the same color sticky for each event.

Afterward, invite everyone to step back and look at the big picture. Which events are missing? Working simultaneously, look at each event and ask, “What came before this? What came after?” Add each additional event as another sticky note. You might find it helpful to show before/after relationships with arrows.

How was the automation used? Configured? Programmed?

Be sure to include events about people, not just software. People’s decisions are an enormous factor in your development system. Find each event that involves automation your team controls or uses, then add preceding events about how people contributed to that event. How was the automation used? Configured? Programmed? Be sure to keep these events neutral in tone and blame-free. Don’t second-guess what people should have done; only write what they actually did.

For example, the event “Deploy script stops all ServiceGamma instances” might be preceded by “Op misspells --target command-line parameter as --tagret” and “Engineer inadvertently changes deploy script to stop all instances when no --target parameter found,” which in turn is preceded by “Team decides to clean up deploy script’s command-line processing.”
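
If it helps to make this concrete, here’s a minimal, hypothetical sketch of how a deploy script could end up behaving this way. The --target flag and its misspelling come from the example events; the argument handling, instance names, and stop_instances() function are my inventions for illustration:

    import argparse

    ALL_INSTANCES = ["gamma-1", "gamma-2", "gamma-3"]

    def stop_instances(instances):
        for instance in instances:
            print(f"stopping {instance}")  # stand-in for the real shutdown call

    parser = argparse.ArgumentParser(description="Stop service instances")
    parser.add_argument("--target", help="the instance to stop")
    # parse_known_args() silently discards flags it doesn't recognize, so the
    # misspelled "--tagret" vanishes instead of causing an error...
    args, _ignored = parser.parse_known_args()

    if args.target:
        stop_instances([args.target])
    else:
        # ...and the fallback treats "no target" as "stop every instance."
        stop_instances(ALL_INSTANCES)

Note that neither change looks dangerous on its own. The lenient parsing and the permissive default only cause an outage in combination, which is exactly the kind of interaction the timeline helps you see.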

Events can have multiple predecessors feeding into the same event. Each predecessor can occur at different points in the timeline. For example, the event “ServiceAlpha doesn’t recognize 418 response code and crashes” could have three predecessors: “ServiceBeta returns 418 response code” (immediately before); “Engineer inadvertently disables ServiceAlpha top-level exception handler” (several months earlier); and “Engineer programs ServiceAlpha to throw exception when unexpected response code received” (a year earlier).
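
Here’s another minimal, hypothetical sketch, this time showing how predecessors separated by months or years combine into a single crash. ServiceAlpha’s real code isn’t part of the example; these function names are invented:

    def handle_response(status_code):
        if status_code == 200:
            return "ok"
        # A year earlier: fail fast by throwing on any unexpected response code.
        raise ValueError(f"unexpected response code: {status_code}")

    def main():
        # Several months earlier: the top-level exception handler was
        # inadvertently disabled. With it, the error would have been caught:
        #
        #     try:
        #         handle_response(418)
        #     except ValueError:
        #         ...log the error and degrade gracefully...
        #
        # Without it, the 418 from ServiceBeta (immediately before) propagates
        # up and crashes the service.
        handle_response(418)

    main()

Run as-is, the sketch terminates with an unhandled ValueError. That’s the crash the timeline event describes.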

As events are added, encourage participants to share recollections of their opinions and emotions at the time. Don’t ask people to excuse their actions; you’re not here to assign blame. Ask them to explain what it was like to be there, in the moment, when the event occurred. This will help your team understand the social and organizational aspects of your development system—not just what choices were made, but why.

Ask participants to add additional stickies, in another color, for those thoughts. For example, if Jarrett says, “I had concerns about code quality, but I felt like I had to rush to meet our deadline,” he could write two sticky notes: “Jarrett has concerns about code quality” and “Jarrett feels he has to rush to meet deadline.” Don’t speculate about the thoughts of people who aren’t present, but you can record things they said at the time, such as “Layla says she has trouble remembering deploy script options.”

Keep these notes focused on what people felt and thought at the time. Your goal is to understand the system as it really was, not to second-guess people.

Finally, ask participants to highlight important events in the timeline—the ones that seem most relevant to the incident. Double-check whether people have captured all their recollections about those events.

3. Generate insights

Now it’s time to turn facts into insights. In this stage, you’ll mine your timeline for clues about your development system. Before you begin, give people some time to study the board. This can be a good point to call for a break.

The events aren’t the cause of failure; they’re a symptom of your system.

Begin by reminding attendees about the nature of failure. Problems are always occurring, but they don’t usually combine in a way that leads to failure. The events in your timeline aren’t the cause of the failure; they’re a symptom of how your development system functions. It’s that deeper system that you want to analyze.

Look at the events you identified as important during the “gather data” activity. Which of them involved people? To continue the example, you would choose the “Op misspells --target command-line parameter as --tagret” and “Engineer inadvertently changes deploy script to stop all instances when no --target parameter found” events, but not “Deploy script stops all ServiceGamma instances,” because that event happened automatically.

Working simultaneously, assign one or more of the following categories2 to each people-involved event. Write each category on a third color of sticky note and add it to the timeline.

2The event categories were inspired by [Woods2010] and [Dekker2014].

  • Knowledge and mental models: Involves information and decisions within the team involved in the event. For example, believing a service maintained by the team will never return a 418 response.

  • Communication and feedback: Involves information and decisions from outside the team involved in the event. For example, believing a third-party service will never return a 418 response.

  • Attention: Involves the ability to focus on relevant information. For example, ignoring an alert because several other alerts are happening at the same time, or misunderstanding the importance of an alert due to fatigue.

  • Fixation and plan continuation: Persisting with an assessment of the situation in the face of new information. For example, during an outage, continuing to troubleshoot a failing router after logs show that traffic successfully transitioned over to the backup router. Also involves continuing with an established plan; for example, releasing on the planned date despite beta testers saying the software isn’t ready.

  • Conflicting goals: Choosing between multiple goals, some of which may be unstated. For example, deciding to prioritize meeting a deadline over improving code quality.

  • Procedural adaptation: Involves situations in which established procedure doesn’t fit the situation. For example, abandoning a checklist after one of the steps reports an error. A special case is the responsibility-authority double bind, which requires people to choose between being punished for violating procedure and following a procedure that doesn’t fit the situation.

  • User experience: Involves interactions with computer interfaces. For example, providing the wrong command-line argument to a program.

  • Write-in: You can create your own category if the event doesn’t fit into the ones I’ve provided.

The categories apply to positive events, too. For example, “Engineer programs backend to provide safe default when ServiceOmega times out” is a “knowledge and mental models” event.

After you’ve categorized the events, take a moment to consider the whole picture again, then break into small groups to discuss each event. What does each one say about your development system? Focus on the system, not the people.

For example, the event, “Engineer inadvertently changes deploy script to stop all instances when no --target parameter found,” sounds like it’s a mistake on the part of the engineer. But the timeline reveals that Jarrett, the engineer in question, felt he had to rush to meet a deadline, even though it reduced code quality. That means it was a “conflicting goals” event, and it’s really about how priorities are decided and communicated. As team members discuss the event, they realize they all feel pressure from sales and marketing to prioritize deadlines over code quality.

Incident analysis always looks at the system, not individuals.

On the other hand, let’s say the timeline analysis revealed Jarrett also misunderstood the behavior of the team’s command-line processing library. That would make it a “knowledge and mental models” event, too, but you still wouldn’t put the blame on Jarrett. Incident analysis always looks at the system, not individuals. Individuals are expected to make mistakes. In this case, a closer look at the event reveals that, although the team used test-driven development and pairing for production code, it didn’t apply that standard to its scripts. The team didn’t have any way to prevent mistakes in its scripts, and it was just a matter of time before one slipped through.
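
As a hypothetical illustration, applying production standards to scripts might start by extracting the argument handling into a function and pinning down its behavior with tests. The parse_target() helper and these pytest tests are invented for illustration, not taken from the incident:

    import pytest

    def parse_target(argv):
        """Parse the deploy script's arguments, rejecting anything unexpected."""
        if len(argv) != 2 or argv[0] != "--target":
            raise SystemExit("usage: deploy --target <instance>")
        return argv[1]

    def test_valid_target_is_accepted():
        assert parse_target(["--target", "gamma-1"]) == "gamma-1"

    def test_misspelled_flag_is_rejected():
        with pytest.raises(SystemExit):
            parse_target(["--tagret", "gamma-1"])  # the typo from the incident

    def test_missing_target_is_rejected():
        # A missing target should stop the script, not stop every instance.
        with pytest.raises(SystemExit):
            parse_target([])

With tests like these in place, the “no target means stop everything” fallback couldn’t have slipped in silently.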

After the breakout groups have had a chance to discuss the events—for speed, you might want to divide the events among the groups, rather than having each group discuss every event—come together to discuss what you’ve learned about the system. Write each conclusion on a fourth color of sticky note and put it on the timeline next to the corresponding event. Don’t make suggestions, yet; just focus on what you’ve learned. For example, “No systematic way to prevent programming mistakes in scripts,” “Engineers feel pressured to sacrifice code quality,” and “Deploy script requires long and error-prone command line.”

4. Decide what to do

You’re ready to decide how to improve your development system. You’ll do so by brainstorming ideas, then choosing a few of your best options.

Start by reviewing the overall timeline again. How could you change your system to be more resilient? Consider all possibilities, without worrying about feasibility. Brainstorm simultaneously onto a table or a new area of your virtual whiteboard. You don’t need to match your ideas to specific events or questions. Some will address multiple things at once. Questions to consider include:3

3Thanks to Sarah Horan Van Treese for suggesting most of these questions.

  • How could we prevent this type of failure?

  • How could we detect this type of failure earlier?

  • How could we fail faster?

  • How could we reduce the impact?

  • How could we respond faster?

  • Where did our safety net fail us?

  • What related flaws should we investigate?

To continue the example, your team might brainstorm ideas such as, “stop committing to deadlines,” “update forecast weekly and remove stories that don’t fit deadline,” “apply production coding standards to scripts,” “perform review of existing scripts for additional coding errors,” “simplify deploy script’s command line,” and “perform UX review of command-line options across all of the team’s scripts.” Some of these ideas are better than others, but at this stage, you’re generating ideas, not filtering them.

Once you have a set of options, group them into “control,” “influence,” and “soup” circles, depending on your team’s ability to make them happen, as described in the “Circles and Soup” section. Have a brief discussion about the options’ pros and cons. Then use dot voting, followed by a consent vote (see the “Work Simultaneously” section and the “Seek Consent” section), to decide which options your team will pursue. You can choose more than one.

As you think about what to choose, remember that you shouldn’t fix everything. Sometimes, introducing a change adds more risk or cost than the thing it solves. In addition, although every event is a clue about the behavior of your development system, not every event is bad. For example, one of the example events was, “Engineer programs ServiceAlpha to throw exception when unexpected response code received.” Even though that event directly led to the outage, it made diagnosing the failure faster and easier. Without it, something still would have gone wrong, and it would have taken longer to solve.

5. Close the retrospective

Incident analysis can be intense. Close the retrospective by giving people a chance to take a breath and gently shift back to their regular work. That breath can be metaphorical, or you can literally suggest that people stand up and take a deep breath.

Start by deciding what to keep. A screenshot or photo of the annotated timeline and other artifacts is likely to be useful for future reference. First, invite participants to review the timeline for anything they don’t want shared outside the session. Remove those stickies before taking the picture.

Next, decide who will follow through on your decisions and how. If your team will be producing a report, decide who will participate in writing it.

Finally, wrap up by expressing appreciations to one another for your hard work.4 Explain the exercise and provide an example: “(Name), I appreciate you for (reason).” Sit down and wait. Others will speak up as well. There’s no requirement to speak, but leave plenty of time at the end—a minute or so of silence—because people can take a little while to speak up.

4The “appreciations” activity is based on [Derby2006] (ch. 8).

Some people find the “appreciations” activity uncomfortable. An alternative is for participants to take turns saying a few words about how they feel now that the analysis is over. It’s okay to pass.

Afterward, thank everybody for their participation. Remind them of the Vegas rule (don’t share personal details without permission), and end.

Organizational Learning

Organizations will often require a report about the incident analysis’s conclusions. It’s usually called a postmortem, although I prefer the more neutral incident report.

In theory, part of the purpose of the incident report is to allow other teams to use what you’ve learned to improve their own development systems. Unfortunately, people tend to dismiss lessons learned by other teams. This is called distancing through differencing. [Woods2010] (ch. 14) “Those ideas don’t apply to us, because we’re an internally facing team, not externally facing.” Or, “We have microservices, not a monolith.” Or, “We work remotely, not in person.” It’s easy to latch on to superficial differences as a reason to avoid change.

Preventing this distancing is a matter of organizational culture, which puts it out of the scope of this book. Briefly, though, people have the most appetite for learning and change after a major failure. Other than that, I’ve had the most success from making the lessons personal. Show how the lessons affect things your audience cares about.

This is easier in conversation than with a written document. In practice, I suspect—but don’t know for sure!—that the most effective way to get people to read and apply the lessons from an incident report is to tell a compelling, but concise story. Make the stakes clear from the outset. Describe what happened and allow the mystery to unfold. Describe what you learned about your system and explain how it affects other teams, too. Describe the potential stakes for other teams and summarize what they can do to protect themselves.

Incident Accountability

Another reason organizations want incident reports is to “hold people accountable.” This tends to be misguided at best.

That’s not to say teams shouldn’t be accountable for their work. They should be! And by performing an incident analysis and working on improving their development system, including working with the broader organization to make changes, they are showing accountability.

Searching for someone to blame makes big incidents worse.

Searching for a “single, wringable neck,” in the misguided parlance of Scrum, just encourages deflection and finger-pointing. It may lower the number of reported incidents, but that’s just because people hide problems. The big ones get worse.

“As the incident rate decreases, the fatality rate increases,” reports The Field Guide to Understanding ‘Human Error’, speaking about construction and aviation. “[T]his supports the importance...of learning from near misses. Suppressing such learning opportunities, at whatever level, and by whatever means, is not just a bad idea. It is dangerous.” [Dekker2014] (ch. 7)

If your organization understands this dynamic, and genuinely wants the team to show how it’s being accountable, you can share what the incident analysis revealed about your development system. (In other words, the final stickies from the “Generate Insights” activity.) You can also share what you decided to do to improve the resiliency of your development system.

Often, your organization will have an existing report template that you’ll have to conform to. Do your best to avoid presenting a simplistic cause-and-effect view of the situation, and be careful to show how the system, not individuals, allowed problems to turn into failures.

Questions

What if we don’t have time to do a full analysis of every bug and incident?

Incident analysis doesn’t have to be a formal retrospective. You can use the basic structure to explore possibilities informally, with just a few people, or even in the privacy of your own thoughts, in just a few minutes. The core point to remember is that events are symptoms of your underlying development system. They’re clues to teach you how your system works. Start with the facts, discuss how they change your understanding of your development system, and only then think of what to change.

Prerequisites

Ally
Safety

Successful incident analysis depends on psychological safety. Unless participants feel safe to share their perspective on what happened, warts and all, you’ll have trouble achieving a deep understanding of your development system.

The broader organization’s approach to incidents has a large impact on participants’ safety. Even companies that pay lip-service to “blameless postmortems” have trouble moving from a simplistic cause-effect view of the world to a systemic view. They tend to think of “blameless” as “not saying who’s to blame,” but to be truly blameless, they need to understand that no one is to blame. Failures and successes are a consequence of a complex system, not specific individuals’ actions.

You can conduct a successful incident analysis in organizations that don’t understand this, but you’ll need to be extra careful to establish ground rules about psychological safety, and ensure people who have a blame-oriented worldview don’t attend. You’ll also need to exercise care to make sure the incident report, if there is one, is written with a systemic view, not a cause-effect view.

Indicators

When you conduct incident analyses well:

  • Incidents are acknowledged and even incidents with no visible impact are analyzed.

  • Team members see the analysis as an opportunity to learn and improve, and even look forward to it.

  • Your system’s resiliency improves over time, resulting in fewer escaped defects and production outages.

  • No one is blamed, judged, or punished for the incident.

Alternatives and Experiments

Many organizations approach incident analysis through the lens of a standard report template. This tends to result in shallow “quick fixes” rather than a systemic view, because people focus on what they want to report rather than studying the whole incident. The format I’ve described will help people expand their perspective before coming to conclusions. Conducting it as a retrospective will also ensure everybody’s voices are heard, and the whole team buys into the conclusions.

Many of the ideas in this practice are inspired by books from the field of Human Factors and Systems Safety. Those books are concerned with life-and-death decisions, often made under intense time pressure, in fields such as aviation. Software development has different constraints, and some of those transplanted ideas may not apply perfectly.

In particular, the event categories I’ve provided are likely to have room for improvement. I suspect the “knowledge and mental models” category could be split into several finer-grained categories. Don’t just add categories arbitrarily, though. Check out the further reading section and ground your ideas in the underlying theory first.

The retrospective format I’ve provided has the most room for experimentation. It’s easy to fixate on solutions or simplistic cause-effect thinking during an incident analysis, and the format is designed to avoid this mistake. But it’s just a retrospective. It can be changed. After you’ve conducted several analyses with it, see what you can improve by experimenting with new activities. For example, can you conduct parts of the “Gather Data” stage asynchronously? Are there better ways to analyze the timeline during the “Generate Insights” stage? Can you provide more structure to “Decide What to Do”?

Finally, incident analysis isn’t limited to analyzing incidents. You can also analyze successes. As long as you’re learning about your development system, you’ll achieve the same benefits. Try conducting an analysis of a time when the team succeeded under pressure. Find the events that could have led to failure, and the events that prevented failure from occurring. Discover what that teaches you about your system’s resiliency, and think about how you can amplify that sort of resiliency in the future.

Further Reading

The Field Guide to Understanding ‘Human Error’ [Dekker2014] is a surprisingly easy read that does a great job of introducing the theory underlying much of this practice.

Behind Human Error [Woods2010] is a much denser read, but it covers more ground than The Field Guide. If you’re looking for more detail, this is your next step.

The previous two books are based on Human Factors and Systems Safety research. The website learningfromincidents.io is dedicated to bringing those ideas to software development. At the time of this writing, it’s fairly thin, but its heart is in the right place. I’m including it in the hopes that it will have more material by the time you read this.

Share your thoughts about this excerpt on the AoAD2 mailing list or Discord server. Or come to the weekly book club!

For more excerpts from the book, see the Second Edition home page.

Agile Book Club: No Bugs

For many people, “quality” means “testing,” but Agile teams treat quality differently. Quality isn’t something you test for; it’s something you build in. Not just into your code, but into your entire development system: the way your team approaches its work, the way people think about mistakes, and even the way your organization interacts with your team. In this session, Arlo Belshee and Llewellyn Falco join us to discuss how to build quality in.

Arlo Belshee is a 20-year legacy code & DevOps veteran with a passion for zero bugs. A firm believer in mending code, Arlo now runs his company, Dig Deep Roots, where he teaches technical practices that safely unwind legacy code, one codebase at a time.

Llewellyn Falco is an Agile technical coach. He’s known for strong-style pairing, the open source “ApprovalTests” testing tool, and co-authoring the Mob Programming Guidebook. Llewellyn spends most of his time programming in C# and Java and specializes in improving legacy code.

Reading:
📖 Quality (introduction)
📖 No Bugs

🎙 Discussion prompts:

  • The book conveniently describes four kinds of errors to prevent when building quality in. First is programmer errors. What are your go-to techniques for preventing coding mistakes?

  • Next is design errors. How do you stop your design and architecture from becoming a source of errors?

  • Requirement errors are also common. What do you do to ensure the team builds the right thing?

  • Finally, systemic errors. How do you find your team’s blind spots and prevent them from recurring?

About the Book Club

The Art of Agile Development Book Club takes place Fridays from 8:00 – 8:45am Pacific. Each session uses an excerpt from the new edition of my book, The Art of Agile Development, as a jumping-off point for a wide-ranging discussion about Agile ideas and practices.

Visit the event page for more information, including an archive of past sessions. For more about the book, visit the Art of Agile Development home page.

AoAD2 Practice: Team Dynamics

This is an excerpt from The Art of Agile Development, Second Edition. Visit the Second Edition home page for additional excerpts and more!

This excerpt is copyright 2007, 2021 by James Shore and Shane Warden. Although you are welcome to share this link, do not distribute or republish the content without James Shore’s express written permission.

📖 The full text of this section is available below, courtesy of the Art of Agile Development book club! Join us on Fridays from 8-8:45am Pacific for wide-ranging discussions about Agile. Details here.

Team Dynamics

Audience
Whole Team

by Diana Larsen

We steadily improve our ability to work together.

Your team’s ability to work together forms the bedrock of its ability to develop and deliver software. You need collaboration skills, the ability to share leadership roles, and an understanding of how teams evolve over time. Together, these skills determine your team dynamics.

Team dynamics are the invisible undercurrents that determine your team’s culture. They’re the way people interact and cooperate. Healthy team dynamics lead to a culture of achievement and well-being. Unhealthy team dynamics lead to a culture of disappointment and dysfunction.

Anyone on the team can have a role in influencing these dynamics. Use the ideas in this practice to suggest ways to improve team members' capability to work together.

What Makes a Team?

A team isn’t just a group of people. In their classic book, The Wisdom of Teams, Jon Katzenbach and Douglas Smith describe six characteristics that differentiate teams from other groups:

[A real team] is a small number of people with complementary skills who are committed to a common purpose, performance goals, and approach for which they hold themselves mutually accountable. [Katzenbach2015] (ch. 5, emphasis mine)

The Wisdom of Teams

Arlo Belshee suggests another characteristic: a shared history. A group of people gain a sense of themselves as a team by spending time working together.

If you’ve followed the practices in this book, you have all the preconditions necessary to create a great team. Now you need to develop your ability to work together.

Team Development

In 1965, Bruce W. Tuckman created a well-known model of group development. [Tuckman1965] In it, he described four—later, five—stages of group development: Forming, Storming, Norming, Performing, and Adjourning. His model outlines shifts in familiarity and interactions over time.

Don’t interpret the Tuckman model as an inevitable, purely linear progression.

No model is perfect. Don’t interpret the Tuckman model as an inevitable, purely linear progression. Teams can exhibit behaviors from any of the first four stages. Changes in membership, such as gaining members or losing valued teammates, may cause a team to slip into an earlier stage. When experiencing changes in environment, such as a move from colocated to remote work, or vice versa, a team may regress from later stages to earlier ones. Nevertheless, Tuckman’s model offers useful clues. You can use it to perceive patterns of behavior among your teammates and as a basis for discussions about how to best support one another.

Forming: The new kid in class

The team forms and begins working together. Individual team members recognize a sensation not unlike being the new kid in class: they’re not committed to working with others, but they want to feel included—or rather, not excluded—by the rest of the group. Team members are busy gaining the information they need to feel oriented and safe in their new territory.

You’re likely to see responses such as:

  • Excitement, anticipation, and optimism

  • Pride in individual skills

  • Concern about imposter syndrome (fear of being exposed as unqualified)

  • An initial, tentative attachment to the team

  • Suspicion and anxiety about the expected team effort

While forming, the team may produce little, if anything, that concerns its task goals. This is normal. The good news is, with support, most teams can move through this phase relatively quickly. Teams in the Forming stage may benefit from the wisdom gained from a senior team member's prior team experiences, from a team member who gravitates toward group cohesion activities, or from coaching in team collaboration.

Allies
Purpose
Context
Alignment

Support your teammates with leadership and clear direction. (More on team leadership roles later.) Start out by looking for ways for team members to become acquainted with the work and one another. Establish a shared sense of the team’s combined strengths and personalities. Purpose, context, and alignment chartering are excellent ways to do so. You may benefit from other exercises to get to know one another, such as the “A Connection-Building Exercise” sidebar.

Along with chartering, take time to discuss and develop your team’s plan. Focus on the “do-able”; getting things done will build a sense of early success. (The “Your First Week” section describes how to get started.) Find and communicate resources available to the team, such as information, training, and support.

Acknowledge feelings of newness, ambivalence, confusion, or annoyance. They are natural at this stage. Although the chartering sessions should have helped make team responsibilities clear, clarify any remaining questions about work expectations, boundaries of authority and responsibility, and working agreements. Make sure people know how their team fits with other teams working on the same product. For in-person teams, explain what nearby teams are working on, even if it isn’t related to the team’s work.

During the Forming stage, team members need the following skills:

  • Peer-to-peer communication and feedback

  • Group problem solving

  • Interpersonal conflict management

Ensure the team has coaching, mentoring, or training in these skills as needed.

Storming: Group adolescence

The team begins its shift from a collection of individuals to a team. Though team members aren’t yet fully effective, they have the beginnings of mutual understanding.

During the Storming stage, the team deals with disagreeable issues. It’s a time of turbulence, collaboratively choosing direction, and making decisions together. That’s why Tuckman et al. called it “Storming.” Team members have achieved a degree of comfort—enough to begin challenging one another’s ideas. They understand one another well enough to know where areas of disagreement surface, and they willingly air differences of opinion. This dynamic can lead to creative tension or destructive conflicts, depending on how it’s handled.

Expect the following behaviors:

  • Reluctance to get on with tasks, or many differing opinions about how to do so.

  • Wariness about continuous improvement approaches.

  • Sharp fluctuations in attitude about the team and its chances of success.

  • Frustration with lack of progress or other team members.

  • Arguments between team members, even when they agree on the underlying issue.

  • Questioning the wisdom of the people who selected the team structure.

  • Suspicion about the motives of the people who appointed other members to the team. (These suspicions may be specific or generalized, and are often based more on past experience than the current situation.)

Support your Storming team by keeping an eye out for disruptive actions, such as defensiveness, competition between team members, factions or choosing sides, and jealousy. Expect increased tension and stress.

Ally
Safety

As you see these behaviors, be ready to intervene by describing the patterns you see. For example, “I notice that there’s been a lot of conflict around design approaches, and people are starting to form sides. Is there a way to bring it back to a more collegial discussion?” Maintain transparency, candor, and feedback, and surface typical conflict issues. Openly discuss the role of conflict and pressure in creative problem solving, including the connection between psychological safety and healthy conflict. Celebrate small team achievements.

When you notice an accumulation of storming behaviors on the team, typically a few weeks after the team first forms, pull the team together for a discussion of trust:

  1. Think back on all your experiences as part of any kind of team. When did you have the most trust in your teammates? Tell us a short story about that time. What conditions allowed trust to build?

  2. Reflect on the times and situations in your life when you have been trustworthy. What do you notice about yourself that you value? How have you built trust with others?

  3. In your opinion, what is the core factor that creates and sustains trust in organizations? What is the core factor that creates, nurtures, and sustains trust among team members?

  4. What three wishes would you make to heighten trust and healthy communication in this team?

This is a difficult stage, but it will help team members gain wisdom and lay the groundwork for the next stage. Watch for a sense of growing group cohesion. As cohesion grows, ensure that each member continues to express their diverse opinions, rather than shutting them down in favor of false harmony. (See the “Don’t Shy Away From Conflict” section.)

Norming: We’re #1

Team members have bonded together as a cohesive group. They’ve found a comfortable working cadence and enjoy their collaboration. They identify as part of the team. In fact, they may identify so closely, and enjoy working together so much, that symbols of belonging appear in the workspace. You might notice matching or very similar t-shirts, coffee cups with the team name, or coordinated laptop stickers. Remote teams might have “wear a hat” or “Hawaiian shirt” days.

Norming teams have created agreement on structure and working relationships. Informal, implicit behavior norms that supplement the team’s working agreements develop through their collaboration. People outside the team may notice and comment on the team’s “teamliness.” Some may envy it—particularly if team members begin to flaunt their successes or declare their team “the best.”

Their pride is warranted. Teams in the Norming stage make significant, regular progress toward their goals. Team members face risks together and work well together. You’ll see the following behaviors:

  • A new ability to express criticism constructively

  • Acceptance and appreciation of differences among team members

  • Relief that this just might all work out well

  • More friendliness

  • More sharing of personal stories and confidences

  • Open discussions of team dynamics

  • Desire to review and update working agreements and boundary issues with other teams

How do you encourage your Norming team? Look outside your team boundaries and broaden team members' focus. Facilitate contact with customers and suppliers. (Field trips!) If the team's work relates to the work of other teams, ask to train in cross-team groups.

Build your team’s cohesiveness and broaden your horizons, as well. Look for opportunities for team members to share experiences, such as volunteering together or presenting to other parts of the organization. Make sure these opportunities are suitable for all team members, so your good intentions don’t create in- and out-groups.

The skills needed by Norming teams include:

  • Feedback and listening

  • Group decision-making processes

  • Understanding the organizational perspective on the team's work

Books such as What Did You Say? The Art of Giving and Receiving Feedback [Seashore2013] and Facilitators’ Guide to Participatory Decision-Making [Kaner1998] will help the team learn the first two skills, and including the whole team in discussions with organizational leaders will help with the third.

Watch out for attempts to preserve harmony by avoiding conflicts.

Watch out for attempts to preserve harmony by avoiding conflicts. In their reluctance to return to Storming, team members may display groupthink: a form of false harmony where team members avoid disagreeing with each other, even when it’s justified. Groupthink: Psychological Studies of Policy Decisions and Fiascoes [Janis1982] is a classic book that explores this phenomenon.

Ally
Safety

Discuss team decision-making approaches when you see the symptoms of groupthink. One sign is team members holding back on critical remarks to keep the peace, especially if they bring up their critiques later, after it’s too late to change course. Ask for critiques, and make sure team members feel safe to disagree.

One way to avoid groupthink is to start discussions by defining the desired outcome. Work toward an outcome rather than away from a problem. Experiment with the following ground rules for team decisions:

  • Agree that each team member will act as a critical evaluator.

  • Promote open inquiry rather than stating positions.

  • Adopt a decision process that includes identifying at least three viable options before a choice is made.

  • Appoint a “contrarian” to search for counterexamples.

  • Split the team into small groups for independent discussion.

  • Schedule a “second chance” meeting to review the decision.

Performing: Team synergy

The team’s focus has shifted to getting the job done. Performance and productivity are the order of the day. Team members connect with their part in the mission of the larger organization. They follow familiar, established procedures for making decisions, solving problems, and maintaining a collaborative work climate. Now the team is getting a lot of work done.

Performing teams transcend expectations. They exhibit greater autonomy, reach higher achievements, and have developed the ability to make rapid, high-quality decisions. Team members achieve more together than anyone would have expected from the sum of their individual effort. Team members continue to show loyalty and commitment to one another, while expressing less emotion about interactions and tasks than in earlier stages.

You’ll see these behaviors:

  • Significant insights into personal and team processes.

  • Little need for facilitative coaching. Such coaches will spend more time on liaising and mediating with the broader organization than on internal team needs.

  • Collaboration that’s understanding of team members’ strengths and limits.

  • Remarks such as, “I look forward to working with this team,” “I can’t wait to come to work,” “This is my best job ever,” and “How can we reach even greater success?”

  • Confidence in one another, and trust that each team member will do their part toward accomplishing team goals.

  • Preventing, or working through, problems and destructive conflicts.

Individuals who have worked on Performing teams always remember their experience. They have stories about feeling closely attached to their teammates. If the team spends much time in Performing, team members may be very emotional about potential team termination or rearrangement.

Although Performing teams are at the pinnacle of team development, they still need to learn to work well with people outside the team. They’re not immune to reverting to earlier stages, either. Changes in team membership can disrupt their equilibrium, as can significant organizational change and disruptions to their established work habits. And there are always opportunities for further improvement. Keep learning, growing, and improving.

Adjourning: Separating and moving on

The team inevitably separates. It achieves its final purpose, or team members decide it’s time to move on.

Effective, highly productive teams acknowledge this stage. They recognize the benefit of farewell “ceremonies” that celebrate the team’s time together and help team members move on to their next challenge.

Communication, Collaboration, and Interaction

Team members’ communication, interaction, and collaboration create group cohesion. These exchanges influence the team’s ability to work effectively—or not.

Consider my Team Communication Model, shown in the “Larsen’s Team Communication Model” figure. Effective team communication requires developing an interconnected, interdependent series of communication skills. It starts with just enough trust to get started. Each new skill pulls the team upward, while strengthening the supporting skills that follow.

A cutaway diagram of a hill with a tree on it. The hill shows four layers of strata. The bottom-most layer is labelled “Trust.” Above that is “Commitment,” then “Conflict,” then “Creativity.” Above the hill, birds fly in the sky near the tree. This layer is labelled “High Performance.” The diagram is marked “copyright 1993–2014 Diana Larsen, FutureWorks Consulting LLC. Permission to use with copyright information attached.”

Figure 1. Larsen’s Team Communication Model

Start with a strong base of trust
Allies
Alignment
Safety

As you form your team, concentrate on helping team members find trust in one another. It doesn’t need to be a deep trust; just enough to agree to work together and commit to the work. Alignment chartering and an emphasis on psychological safety both help.

Support your growing trust with three-fold commitment
Ally
Purpose

From a foundation of trust, your team will begin exploring the three-fold nature of team commitment:

  • Commitment to the team’s purpose

  • Commitment to each other’s well-being

  • Commitment to the well-being of the team as a whole

Chartering purpose and alignment will help build commitment. As commitment solidifies, trust will continue to grow. People’s sense of psychological safety will grow along with it.

Once commitment and trust start improving psychological safety, it’s a good time to examine the power dynamics of the team. No matter how egalitarian your team may be, power dynamics always exist. They’re part of being human. Left unaddressed or hidden, power dynamics turn destructive. It’s best to keep them out in the open, so the team can attempt to level the field.

Power dynamics come from individual perceptions of each other’s influence, ability to make things happen, and preferential treatment. Bring them into the open by holding a discussion of the power dynamics that exist in the team, and how they affect collaboration. Discuss how the team’s collective and diverse powers can be used to help the whole team.

Right-size conflicts with feedback

The more team members recognize one another's commitment, the more their approach to conflict adapts. Rather than “you against me,” they start approaching conflicts as “us against the problem.” Focus on developing team members’ ability to give and receive feedback, as described in the “Learn How to Give and Receive Feedback” section. Approach feedback with the following goals:

  • The feedback we give and get is constructive and helpful.

  • Our feedback is caring and respectful.

  • Feedback is an integral part of our work.

  • No one is surprised by feedback; we wait for explicit agreement before giving feedback.

  • We offer feedback to encourage behavior as well as to discourage or change behavior.

Peer-to-peer feedback helps to deal with interpersonal conflicts while they’re small. Unaddressed, molehill resentments have the potential to grow into mountains of mistrust. The skills team members develop for feedback within the team will help them in larger conflicts with forces outside the team.

Spark creativity and innovation

What is team innovation, but the clash of ideas that sparks new potential? Retaining healthy working relationships while the sparks fly is a team skill. It rises from the ability to engage and redirect conflicts toward desired outcomes. It stimulates greater innovation and creativity. Team problem-solving capability soars.

Ally
Retrospectives

Develop team creativity by offering learning challenges and playful approaches. Build it into the team’s routine. Use slack to explore new technologies, as described in the “Dedicate Time to Exploration and Experimentation” section. Use retrospectives to experiment with new ideas. Make space for whimsy and inventive irrelevance. (Teach each other to juggle!)

Sustain high performance

When collaboration and communication skills join with task-focused skills, high performance becomes routine. The challenge lies in sustaining high performance. Avoid complacency. As a team, continue to refine your skills in building trust, committing to the work and one another, providing feedback, and sparking creativity. Look for opportunities to build resilience and further improve.

Shared Leadership

Mary Parker Follett, a management expert also known as “the mother of modern management,” was a pioneer in the fields of organizational theory and behavior. In discussing the role of leadership, she wrote:

It seems to me that whereas power usually means power-over, the power of some person or group over some other person or group, it is possible to develop the conception of power-with, a jointly developed power, a co-active, not a coercive power...Leader and followers are both following the invisible leader—the common purpose. [Graham1995] (pp. 103, 172)

Mary Parker Follett

Ally
Whole Team

Effective Agile teams develop “power with” among all team members. They share leadership. (See the “Key Idea: Self-Organizing Teams” sidebar.) By doing so, they make the most of their collaboration and the skills of the whole team.

Mary Parker Follett described “the law of the situation,” in which she argued for following the lead of the person with the most knowledge of the situation at hand. This is exactly how Agile teams are meant to function. It means every team member has the potential to step into a leadership role. Everyone leads, at times, and follows a peer leader at others.

Team members can play a variety of leadership roles, as summarized in the “Leadership Roles” table.1 People can play multiple roles, including switching at will, and multiple people can fill the same role. The important thing is coverage. Teams need all these kinds of leadership from their team members.

1With the exception of “Diplomats,” these roles were developed by Diana Larsen and Esther Derby, based on [Benne1948].

Table 1. Leadership Roles

             Task-Oriented                   Collaboration-Oriented
Direction    Pioneer, Instructor             Diplomat, Influencer, Follower
Guidance     Commentator, Coordinator        Promoter, Peacemaker
Evaluation   Critic, Gatekeeper, Contrarian  Reviewer, Monitor
  • Pioneers (task-oriented direction) ask questions and seek data. They scout what’s coming next, looking for new approaches and bringing fresh ideas to the team.

  • Instructors (task-oriented direction) answer questions, supply data, and coach others in task-related skills. They connect the team to relevant sources of information.

  • Diplomats (collaboration-oriented direction) connect the team with people and groups outside the team, act as a liaison, and represent the team in outside meetings.

  • Influencers (collaboration-oriented direction) encourage the team in chartering, initiating working agreements, and other activities that build awareness of team culture.

  • Followers (collaboration-oriented direction) provide support and encouragement. They step back, allowing others to take the lead in their areas of strength, or where they’re developing strength. They conform to team working agreements.

  • Commentators (task-oriented guidance) explain and analyze data. They put information into context.

  • Coordinators (task-oriented guidance) pull threads of work together in a way that makes sense. They link and integrate data and align team activities with their tasks.

  • Promoters (collaboration-oriented guidance) focus on equitable team member participation. They ensure every team member has the chance to participate and help. They encourage quieter team members to contribute their perspectives on issues that affect the team.

  • Peacemakers (collaboration-oriented guidance) work for common ground. They seek harmony, consensus, and compromise when needed. They may mediate disputes that team members have difficulty solving on their own.

  • Critics (task-oriented evaluation) evaluate and analyze relevant data, looking for risks and weaknesses in the team’s approach.

  • Gatekeepers (task-oriented evaluation) encourage work discipline and maintain working agreements. They also manage team boundaries to keep interference at bay.

  • Contrarians (task-oriented evaluation) protect the team from groupthink by deliberately seeking alternative views and opposing habitual thinking. They also vet the team’s decisions against the team’s values and principles.

  • Reviewers (collaboration-oriented evaluation) ensure the team is meeting acceptance criteria and responding to customer needs.

  • Monitors (collaboration-oriented evaluation) attend to how the whole team is working together. (Are team members working well, or not?) They protect the team’s psychological safety and foster healthy working relationships among team members.

“Follower” is a particularly powerful role for people who are expected to lead.

Although it may seem strange to include “follower” as a leadership role, actively following other people’s lead helps the team learn to share leadership responsibilities. It’s a particularly powerful role for people who are expected to lead, such as senior team members.

Teams that share leadership across these roles can be called leaderful teams. To develop a leaderful team, discuss these leadership roles together. A good time to do so is when you notice uneven team participation or over-reliance on a single person to make decisions. Share the list of roles and ask the following questions:

  • How many of the leadership roles does each team member naturally enact?

  • Is anyone overloaded with leadership roles? Or filling a role they don’t want?

  • Which of these roles need multiple people to fill? (For example, the Contrarian role is best rotated among several team members.)

  • Which of these roles are missing on our team? What’s the impact of having no one to fill them?

  • How might we fill the missing roles? Who wants practice in this aspect of leadership?

  • What else do we notice about these roles?

Focus team members on deciding how they will cover the leadership roles to ensure effective collaboration. Be open to creating new working agreements in response to this conversation.

Some team members may be natural Contrarians, but if they always play that role, the rest of the team may fall into the trap of discounting their comments. “Oh, never mind. Li always sees the bleakest, most pessimistic side of things!” For the Contrarian role in particular, ensure that it’s shared among various team members, so it remains effective.

Toxic Behavior

Toxic behavior is any behavior that produces an unsafe environment, degrades team dynamics, or damages the team’s ability to achieve its purpose.

If a team member is exhibiting toxic behaviors, start by remembering the Retrospective Prime Directive: “Regardless of what we discover, we must understand and truly believe that everyone did the best job he or she could, given what was known at the time, his or her skills and abilities, the resources available, and the situation at hand.” [Kerth2001] (ch. 1) Assume the person is doing the best job they can.

Look for environmental pressures first. For example, a team member may have a new baby and not be getting enough sleep. Or a new team member may be solely responsible for a vital subsystem they don't yet know well. Together, the team can make adjustments that help people improve their behavior. For example, agreeing to move the morning stand-up so the new parent can come in later, or sharing responsibility for the vital subsystem.

The next step is giving feedback to the person in question. Use the process described in the “Learn How to Give and Receive Feedback” section to describe the impact of their behavior and request a change. Very often, that’s enough: they didn’t realize how their behavior affected the team, and once they do, they improve.

Be careful not to misidentify Contrarians as toxic.

Sometimes, teams can label colleagues as toxic when they aren’t actually doing anything wrong. This can easily happen to people who regularly take the Contrarian leadership role. They don’t go along with the rest of the team’s ideas, or they perceive a risk or obstacle that others miss and won’t let it go. Be careful not to misidentify Contrarians as toxic. Teams need Contrarians to avoid groupthink. However, it may be worth having a discussion about rotating the role.

Ally
Safety

If a person really is showing toxic behavior, they may ignore the team’s feedback, or refuse to adjust to the team’s psychological safety needs. If that happens, they are no longer a good match for the team. Sometimes, it’s just a personality clash, and they’ll do well on another team.

At this point, it’s time to bring in your manager, or whoever assigns team membership. Explain the situation. Good managers understand that every team member’s performance depends on every other team member. An effective leader will step in to help the team. For them to do so, team members need to explain what the team needs, as well as the steps they’ve already taken to encourage a change in behavior.

Some managers may resist removing a person from the team, especially if they identify the team member as a “star performer.” They could suggest the team should accommodate the behavior instead. Unfortunately, this tends to damage the team’s performance as a whole. Ironically, it can make the “star performer” seem like even more of a star, as they push the people around them down.

In this situation, you can only decide for yourself whether the benefits from being part of the team are worth the toxic behavior you experience. If they’re not, your best option is to move to another team or organization.

Questions

Isn’t it important that a team have one leader—a “single, wringable neck”? How does that work with leaderful teams?

A “single, wringable neck” is a satisfying way to simplify a complex problem, but it’s not so satisfying for the person whose neck is being wrung. It’s also contrary to the Agile ideal of collective ownership (see the “Key Idea: Collective Ownership” sidebar). The team as a whole is responsible. There’s no scapegoat to take the fall when things go wrong, or reap the rewards when things go well, because success and failure are the result of a complex interaction between multiple participants and factors. Every team member’s contribution is vital.

This isn’t just abstract philosophy. Leaderful teams do better work, and develop into high-performing teams more quickly. Sharing leadership builds stronger teams.

What if I don't have the skills to help improve our team dynamics?

If you’re not comfortable working on teamwork skills, that’s okay. You can still help. Watch for the folks who adopt the collaboration-oriented leadership roles. Make sure you support their efforts. If your team doesn't have members willing to assume those roles, talk with your manager or sponsor about providing a coach or other team member skilled in team dynamics. (See the “Coaching Skills” section.)

Prerequisites

Allies
Energized Work
Whole Team
Team Room
Management

For these ideas to become reality, both your team and organization need to be on board. Team members need to be energized and motivated to do good work together. It won’t work if people are just interested in punching a clock and being told what to do. Similarly, your organization needs to invest in teamwork. This includes creating a whole team, a team room, and an Agile-friendly approach to management.

Indicators

When your team has healthy team dynamics:

  • Team members enjoy coming to work.

  • Team members say they can rely on their teammates to follow through on their commitments, or communicate when they can’t.

  • Team members trust that everyone on the team is committed to achieving the team’s purpose.

  • Team members know one another's strengths and support one another's limits.

  • Team members work well together and celebrate progress and successes.

Alternatives and Experiments

The material in this practice represents only a tiny portion of the valuable knowledge available about teams, team dynamics, managing conflicts, leadership, and many more topics that affect team effectiveness. The references throughout this practice and in the “Further Reading” section have a wealth of information. But even that only begins to scratch the surface. Ask a mentor for their favorites. Keep learning and experimenting. It’s a lifelong journey.

Further Reading

Keith Sawyer has spent his career exploring creativity, innovation, and improvisation, and their roots in effective collaborative effort. In Group Genius: The Creative Power of Collaboration [Sawyer2017], he offers insightful anecdotes and ideas.

Roger Nierenberg’s memoir and instruction guide for leaders, Maestro: A Surprising Story about Leading by Listening [Nierenberg2009], contributes “out of the box” ways of thinking about leadership. He also has a website with videos that demonstrate his techniques at http://www.musicparadigm.com/videos/.

The Wisdom of Teams: Creating the High-Performance Organization [Katzenbach2015] is the classic, foundational book about high-performing teams, their characteristics, and the environments that help them flourish.

Shared Leadership: Reframing the Hows and Whys of Leadership [Pearce2002] is a compilation of the best ideas about leaderful teams and organizations. It can be a challenging read, but it’s well worth exploring to expand your ideas about who, and what, is a leader.

Share your thoughts about this excerpt on the AoAD2 mailing list or Discord server. Or come to the weekly book club!

For more excerpts from the book, see the Second Edition home page.

Agile Book Club: Retrospectives

Feedback and adaptation are central to Agile, and that applies to the team’s approach to Agile itself. Although you might start with an off-the-shelf Agile method, every team is expected to customize its method for itself. In this session, Aino Corry joins us to look at how retrospectives help teams reflect and improve.

Aino Corry is a teacher, a technical conference editor, and a retrospectives facilitator. She holds a master’s degree and a Ph.D. in computer science. She has 12 years of experience with patterns in software development, and 20 years of experience with agile processes in academia and industry. She also teaches how to teach computer science to teachers, and thus lives up to the name of her company: Metadeveloper. In her spare time, she runs and sings (but not at the same time). Aino is the author of the book Retrospective Antipatterns.

Reading:
📖 Improvement (introduction)
📖 Retrospectives
📖 Impediment Removal

🎙 Discussion prompts:

  • Retrospectives are a powerful tool, but only if used correctly. What are some ways you’ve seen them go wrong?

  • Retrospectives can get boring after a while. What can be done to keep them interesting?

  • Teams often struggle with following through on retrospective ideas. How can teams do a better job at closing the retrospective feedback loop?

  • Sometimes, a team’s biggest impediments are outside its direct control. What should teams do about them?

About the Book Club

The Art of Agile Development Book Club takes place Fridays from 8:00 – 8:45am Pacific. Each session uses an excerpt from the new edition of my book, The Art of Agile Development, as a jumping-off point for a wide-ranging discussion about Agile ideas and practices.

Visit the event page for more information, including an archive of past sessions. For more about the book, visit the Art of Agile Development home page.

AoAD2 Practice: Blind Spot Discovery

This is an excerpt from The Art of Agile Development, Second Edition. Visit the Second Edition home page for additional excerpts and more!

This excerpt is copyright 2007, 2021 by James Shore and Shane Warden. Although you are welcome to share this link, do not distribute or republish the content without James Shore’s express written permission.

📖 The full text of this section is available below, courtesy of the Art of Agile Development book club! Join us on Fridays from 8-8:45am Pacific for wide-ranging discussions about Agile. Details here.

Blind Spot Discovery

Audience
Testers, Whole Team

We discover the gaps in our thinking.

Fluent Delivering teams are very good at building quality into their code, as you saw in the previous practice. But nobody’s perfect, and teams have blind spots. Blind spot discovery is a way of finding those gaps.

To find blind spots, look at the assumptions your team makes, and consider the pressures and constraints team members are under. Imagine what risks the team might be facing and what team members might falsely believe to be true. Make a hypothesis about the blind spots that could occur as a result and investigate to see if your guess is right. Testers tend to be particularly good at this.

When you find a blind spot, don’t just fix the problem you found. Fix the gap. Think about how your approach to development allowed the bug to occur, then change your approach to prevent that category of bugs from happening again, as described in the “Prevent Systemic Errors” section.

Validated Learning

When people think about bugs, they often think about logic errors, UI errors, or production outages. But the blind spot I see most often is more fundamental, and more subtle.

More than anything else, teams build the wrong thing. To use Lean Startup terminology, they lack product-market fit. I think this happens because so many teams think of their job as building the product they were told to build. They act as obedient order-takers: a software factory designed to ingest stories in one end and plop software out the other.

Nobody really knows what you should build, not even the people asking for it.

Don’t just assume that your team should build what it’s told to build. Instead, assume the opposite: nobody really knows what you should build, not even the people asking for it. Your team’s job is to take those ideas, test them, and learn what you should really build. To paraphrase The Lean Startup [Ries2011], the fundamental activity of an Agile team is to turn ideas into products, observe how customers and users respond, and then decide whether to pivot or persevere. This is called validated learning.

Allies
Purpose
Visual Planning
Real Customer Involvement

For many teams, the first time they test their ideas is when they release their software. That’s pretty risky. Instead, use Ries’s Build-Measure-Learn loop:

  1. Build. Look at your team’s purpose and plan. What core assumptions are you making about your product, customers, and users? Choose one to test, then think, “What’s the smallest thing we can put in front of real customers and users?” It doesn’t have to be a real product—in some cases, a mock-up or paper prototype will work—and you don’t have to involve every user, but you do need to involve people who will actually buy or use your product.

  2. Measure. Prior to showing people what you’ve built, decide what data you need to see in order to say that the assumption has been proven or disproven. The data can be subjective, but the measurement should be objective. For example, “70% of our customers say they like us” is an objective measurement of subjective data.

  3. Learn. Your measurement will either validate your hypothesis or disprove it. If you validated the hypothesis, continue with the next one. If you disproved your hypothesis, change your plans accordingly.

For example, one team’s purpose was to improve surgical spine care outcomes. The team planned to do so by building a tool to give clinical leads a variety of views into surgical data. One of the team's core assumptions was that clinical leads would trust the underlying data presented by the tool. But the data could be poor, and leads tended to be skeptical.

To test the assumption, the team decided to: (build) use real data from seven clinics to create a mock-up of the tool; (measure) show it to those seven clinics’ leads; (learn) if at least five said the data was of acceptable quality, the assumption would be validated. If not, the team would come up with a new plan.
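
If it helps to make the loop concrete, here’s a minimal sketch (in TypeScript) of that experiment expressed as data. The Experiment type, its field names, and the learn() function are all invented for illustration; they aren’t part of the practice itself:

```typescript
// A minimal sketch of recording a Build-Measure-Learn experiment as data.
// The Experiment type and its fields are invented for illustration.
type Experiment = {
  assumption: string;   // the core assumption under test
  build: string;        // smallest thing to put in front of real users
  measure: string;      // objective measurement, decided in advance
  threshold: number;    // pass/fail criterion, also decided in advance
};

const dataTrust: Experiment = {
  assumption: "Clinical leads will trust the underlying data",
  build: "Mock-up of the tool using real data from seven clinics",
  measure: "Number of clinic leads who say the data quality is acceptable",
  threshold: 5,         // at least five of the seven leads
};

// Learn: a validated hypothesis means persevere with the next assumption;
// a disproved one means pivot and change the plan.
function learn(experiment: Experiment, observed: number): "persevere" | "pivot" {
  return observed >= experiment.threshold ? "persevere" : "pivot";
}

console.log(learn(dataTrust, 6));  // "persevere"
```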

Validated learning is one of the hallmarks of an Optimizing team.

Validated learning is one of the hallmarks of an Optimizing team. Depending on your organizational structure, you may not be able to use it to its fullest. Still, the fundamental idea applies. Don’t just assume delivering stories will make people happy. Do everything you can to check your assumptions and get feedback.

For more about validated learning and the related concept of customer discovery, see [Ries2011] and [Blank2020b].

Exploratory Testing

Ally
Test-Driven Development

Test-driven development ensures that programmers’ code does what they intended it to do, but what if the programmer’s intention is wrong? For example, a programmer might think the correct way to determine the length of a string in JavaScript is to use string.length, but that can result in counting six letters in the word “naïve.”1

1The count can be off because string.length reports the number of codepoints (sort of), not the number of graphemes—what people usually think of as characters—and it’s possible for Unicode to store the grapheme “ï” as two codepoints: a normal “i” plus a “combining diaeresis” (the umlaut). String manipulation has similar issues. Reversing a string containing the Spanish flag will convert Spain 🇪🇸 to Sweden 🇸🇪, which is sure to surprise beach-goers.
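
Here’s a minimal sketch of that blind spot, runnable in any modern JavaScript or TypeScript environment. Intl.Segmenter counts grapheme clusters, which is what people usually mean by “characters”:

```typescript
// "naïve" can be stored two ways; string.length only matches intuition
// for one of them.
const precomposed = "na\u00EFve";   // "ï" as a single codepoint
const decomposed = "nai\u0308ve";   // "i" plus a combining diaeresis

console.log(precomposed.length);    // 5
console.log(decomposed.length);     // 6, the surprise

// Counting grapheme clusters gives the answer people expect.
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
console.log([...segmenter.segment(decomposed)].length);  // 5
```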

Exploratory testing is a technique for finding these blind spots. It’s a rigorous approach to testing that involves “designing and executing tiny experiments in rapid succession using the results from the last experiment to inform the next.” [Hendrickson2013] (ch. 1) It involves these steps:

  1. Charter. Start by deciding what you’re going to explore, and why. A new technology the team recently adopted? A recently released user interface? A critical piece of security infrastructure? Your charter should be general enough to give you an hour or two of work, and specific enough to help you focus.

  2. Observe. Use the software. You’ll often do so via the UI, but you can also use tools to explore APIs and network traffic, and you can also observe hidden parts of the system, such as logs and databases. Look for two things: anything that’s out of the ordinary, and anything you can modify, such as a URL, form field, or file upload, that might lead to unexpected behavior. Take notes as you go, so you can retrace your steps when necessary.

  3. Vary. Don’t just use the software normally; push its boundaries. Put an emoji in a text field. Enter a size as zero or negative. Upload a zero-byte file, a corrupted file, or an “exploding” ZIP file that expands to terabytes of data. Edit URLs. Modify network traffic. Artificially slow down your network, or write to a filesystem with no free space.

As you go, use your observations and your understanding of the system to decide what to explore next. You’re welcome to supplement those insights by looking at code and production logs. If you’re exploring security capabilities, you can use your team’s threat model as a source of inspiration, or create your own. (See the “Threat Modeling” section.)
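
As one illustration of the “vary” step, here’s a minimal sketch that pushes the boundaries of an imaginary HTTP API. The endpoint, fields, and inputs are all invented; treat them as prompts for your own probes, not a recipe:

```typescript
// Hypothetical boundary probes for an imaginary /items endpoint.
// Record anything surprising in your exploration notes as you go.
const probes = [
  { name: "emoji in a text field", body: { title: "🧨🧨🧨", size: 3 } },
  { name: "zero size", body: { title: "widget", size: 0 } },
  { name: "negative size", body: { title: "widget", size: -1 } },
  { name: "very long title", body: { title: "x".repeat(1_000_000), size: 1 } },
];

for (const probe of probes) {
  const response = await fetch("https://staging.example.com/items", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(probe.body),
  });
  // A 500, a hang, or a silent 200 on bad input is worth investigating.
  console.log(probe.name, response.status);
}
```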

There’s much more to exploratory testing than I have room for in this book. For more detail, and a great set of heuristics about what to vary, see [Hendrickson2013].

Chaos Engineering

In a large networked system, failures are an everyday occurrence. Your code must be resilient to those failures, and that requires careful attention to error handling and recovery. Unfortunately, error handling is a common blind spot for less experienced programmers and teams, and even experienced teams can’t predict every failure mode of a complex system.

Chaos engineering can be considered a specialized form of exploratory testing that focuses on system architecture.2 It involves deliberately injecting failures into running systems—often, live production systems—to learn how they respond to failure. Although this may seem risky, it can be done in a controlled way. It allows you to identify issues that appear only as a result of complex interactions.

2Some people in the chaos engineering community object to use of the word “testing” in relationship to chaos engineering. They prefer the term “experiment.” I think that objection misunderstands the nature of testing. As Elisabeth Hendrickson writes in Explore It!: “This is the essence of testing: designing an experiment to gather empirical evidence to answer a question about a risk.” [Hendrickson2013] (ch. 1) That’s exactly what chaos engineering is, too.

Chaos engineering is similar to exploratory testing in that it involves finding opportunities to vary normal behavior. Rather than thinking in terms of unexpected user input and API calls, though, you think in terms of unexpected system behavior: nodes crashing, high latency network links, unusual responses, and so forth. Fundamentally, it’s about conducting experiments to determine if your software system is as resilient as you think it is.

  1. Start with an understanding of your system’s “steady state.” What does your system look like when it’s functioning normally? What assumptions does your team or organization make about your system’s resiliency? Which of those would be most valuable to check first? When you perform the experiment, how will you know if it succeeded or failed?

  2. Prepare to vary the system in some way: remove a node, introduce latency, change network traffic, artificially increase demand, etc. (If this is your first test, start small, so the impact of failure is limited.) Form a hypothesis about what will happen. Make a plan for aborting the experiment if things go badly wrong.

  3. Make the change and observe what happens. Was your hypothesis correct? Is the system still performing adequately? If not, you’ve identified a blind spot. Either way, discuss the results with your team and improve your collective mental model of the system. Use what you’ve learned to decide which experiment you should conduct next.
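
Here’s a minimal sketch of those three steps for a latency hypothesis. The health-check URL, sample counts, and twofold threshold are invented for illustration; a real experiment also needs production-safe tooling and an abort plan:

```typescript
// Hypothetical chaos experiment: does latency stay acceptable when a
// failure is injected? The URL and thresholds are invented.
const HEALTH_URL = "https://staging.example.com/health";

async function averageLatencyMs(samples: number): Promise<number> {
  let total = 0;
  for (let i = 0; i < samples; i++) {
    const start = Date.now();
    await fetch(HEALTH_URL);
    total += Date.now() - start;
  }
  return total / samples;
}

// Step 1: establish the steady state.
const baseline = await averageLatencyMs(20);

// Step 2: hypothesis: average latency stays under twice baseline while
// one node is down. (Inject the failure here with your own tooling.)

// Step 3: observe and compare.
const duringFailure = await averageLatencyMs(20);
if (duringFailure > baseline * 2) {
  console.log("Hypothesis disproved: blind spot found. Abort and discuss.");
} else {
  console.log("Hypothesis validated. Choose the next experiment.");
}
```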

Many of the stories surrounding chaos engineering involve automated tools, such as Netflix’s Chaos Monkey. To use chaos engineering within your team, though, don’t focus on building tools. It’s more valuable to conduct a breadth of experiments than to automatically repeat a single experiment. You’ll need some basic tooling to support your work, and that tooling will grow in sophistication over time, but try to conduct the broadest set of experiments you can for the least amount of work.

The principles of chaos engineering can be found at https://principlesofchaos.org. For a book-length treatment of the topic, see [Rosenthal2020].

Penetration Testing and Vulnerability Assessments

Although exploratory testing can find some security-related blind spots, security-sensitive software warrants testing by experts.

Penetration testing, also known as pentesting, involves having people attempt to defeat the security of your system in the way a real attacker would. It can involve probing the software your team writes, but it also considers security more holistically. Depending on the rules of engagement you establish, it can involve probing your production infrastructure, your deployment pipeline, human judgment, and even physical security such as locks and doors.

Penetration testing requires specialized expertise. You’ll typically need to hire an outside firm. It’s expensive, and your results depend heavily on the skill of the testers. Exercise extra diligence when hiring a penetration testing firm, and remember that the individuals performing the test are at least as important as the firm you choose.

Vulnerability assessments are a less costly alternative to penetration testing. Although penetration testing is technically a type of vulnerability assessment, most firms advertising “vulnerability assessments” perform an automated scan.

Some vulnerability assessments perform static analysis of your code and dependencies. If they’re fast enough, they can be included in your continuous integration build. (If not, you can use multistage integration, as described in the “Multistage Integration Builds” section.) Over time, the assessment vendor will add additional scans to the tool, which will alert your team to new potential vulnerabilities.

Other assessments probe your live systems. For example, a vendor might probe your servers for exposed administration interfaces, default passwords, and vulnerable URLs. You’ll typically receive a periodic report (such as once a month) describing what the assessment found.

Vulnerability assessments can be noisy. You’ll typically need someone with security skills to go through them and triage their findings, and you may need some way of safely ignoring irrelevant findings. For example, one assessment scanned for vulnerable URLs, but it wasn’t smart enough to follow HTTP redirects. Every month, it reported every URL in its scan as a vulnerability, even though the server was just performing a blanket redirect.
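
One way to manage that noise is to record triage decisions as data, so irrelevant findings are filtered out with the reason preserved. Here’s a minimal sketch; the finding format and IDs are invented, since every vendor’s report differs:

```typescript
// Hypothetical triage filter for a noisy vulnerability report. The
// Finding shape and IDs are invented; adapt them to your vendor's format.
type Finding = { id: string; url: string; severity: string };

// Decisions made by someone with security skills, with reasons recorded.
const safelyIgnored = new Map<string, string>([
  ["SCAN-4012", "blanket HTTP redirect; scanner doesn't follow redirects"],
]);

function triage(findings: Finding[]): Finding[] {
  return findings.filter((finding) => !safelyIgnored.has(finding.id));
}

const report: Finding[] = [
  { id: "SCAN-4012", url: "https://example.com/admin", severity: "high" },
  { id: "SCAN-7133", url: "https://example.com/login", severity: "medium" },
];
console.log(triage(report));  // only SCAN-7133 remains for human review
```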

In general, start by using threat modeling (see the “Threat Modeling” section) and security checklists, such as the OWASP Top 10, to inform your programming and exploratory testing efforts. Use automated vulnerability assessments to address additional threats and find blind spots. Then turn to penetration testing to learn what you missed.

Questions

Should these techniques be performed individually, in pairs, or as a mob?

Allies
Pair Programming
Mob Programming

It’s up to your team. It’s fine to perform these techniques individually. On the other hand, pairing and mobbing are good for coming up with ideas and disseminating insights, and they can help break down the barriers that tend to form between testers and other team members. Experiment to see which approach works best for your team. It might vary by technique.

Won’t the burden of blind spot discovery keep getting bigger as the software gets bigger?

It shouldn’t. Blind spot discovery isn’t like traditional testing, which tends to grow along with the codebase. It’s for checking assumptions, not validating an ever-increasing codebase. As the team addresses blind spots and gains confidence in its ability to deliver high-quality results, the need for blind spot discovery should go down, not up.

Prerequisites

Any team can use these techniques. But remember that they’re for discovering blind spots, not checking that the software works. Don’t let them be a bottleneck. You don’t need to check before you release, and you don’t need to check everything. You’re looking for flaws in your development system, not your software system.

Ally
No Bugs

On the other hand, releasing without additional checks requires your team to be able to produce code with nearly no bugs. If you aren’t there yet, or if you just aren’t ready to let go, it’s okay to delay releasing until you’ve checked for blind spots. Just be sure not to use blind spot discovery as a crutch. Fix your development system so you can release without manual testing.

Indicators

When you use blind spot discovery well:

  • The team trusts the quality of its software.

  • The team doesn’t use blind spot discovery as a form of pre-release testing.

  • The number of defects found in production and by blind-spot techniques declines over time.

  • The amount of time needed for blind spot discovery declines over time.

Alternatives and Experiments

Allies
No Bugs
Test-Driven Development

This practice is based on an assumption that it’s possible for developers to build systems with nearly no bugs—that defects are the result of fixable blind spots, not a lack of manual testing. So the techniques are geared around finding surprises and testing hypotheses.

The most common alternative is traditional testing: building repeatable test plans that comprehensively validate the system. Although this may seem more reliable, those test plans have blind spots of their own. Most of their tests end up duplicating the tests programmers create with test-driven development. At best, they tend to find the same sorts of issues that exploratory testing does, at much higher cost, and they rarely expose problems that the other techniques reveal.

In terms of experimentation, the techniques I’ve described are just the beginning. The underlying idea is to validate your hidden assumptions. Anything you can do to identify and test those assumptions is fair game. One additional technique you can explore is called fuzzing: generating large numbers of random or hostile inputs and monitoring for unexpected results.
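
For example, here’s a minimal fuzzing sketch in TypeScript. The normalizeTitle() function stands in for whatever code you want to probe; the shape of the loop is the point: generate hostile inputs, run the code, and flag anything unexpected:

```typescript
// A minimal fuzzing sketch. normalizeTitle is an invented stand-in for
// the code under test.
function normalizeTitle(input: string): string {
  return input.trim().toLowerCase();
}

function randomString(maxLength: number): string {
  const length = Math.floor(Math.random() * maxLength);
  let result = "";
  for (let i = 0; i < length; i++) {
    // Random codepoints, including astral-plane characters and surrogates.
    result += String.fromCodePoint(Math.floor(Math.random() * 0x110000));
  }
  return result;
}

for (let i = 0; i < 10_000; i++) {
  const input = randomString(100);
  try {
    normalizeTitle(input);  // here we only check "doesn't throw"
  } catch (error) {
    console.log("unexpected failure on:", JSON.stringify(input), error);
  }
}
```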

Share your thoughts about this excerpt on the AoAD2 mailing list or Discord server. Or come to the weekly book club!

For more excerpts from the book, see the Second Edition home page.

Agile Book Club: System Architecture

Teams that practice evolutionary design start with “the simplest thing that could possibly work” and evolve their design from there. But what about the components that make up a deployed system? Applications and services, network gateways and load balancers, and even third-party services? Those components and interactions form your system architecture. Is it possible for them to start simple and evolve from there? Doc Norton joins us to explore this question further.

Doc Norton is passionate about working with teams to improve delivery and building great organizations. Once a dedicated code slinger, Doc has turned his energy toward helping teams, departments, and companies work better together in the pursuit of better software. Doc is co-founder of OnBelay, a consultancy that focuses on helping companies get better at delivering software.

Reading:
📖 Evolutionary System Architecture

🎙 Discussion prompts:

  • How have you seen teams approach designing system architecture?

  • Architecture is as much a factor of organizational design as system design, as Conway’s Law reminds us. How have you seen organizational structures affect system architecture, either for good or for ill?

  • It’s often easier to make overly complex architectures than simple ones. What are some examples of complex architectures that could have been simplified?

  • What are your preferred techniques for improving an existing system architecture?

About the Book Club

The Art of Agile Development Book Club takes place Fridays from 8:00 – 8:45am Pacific. Each session uses an excerpt from the new edition of my book, The Art of Agile Development, as a jumping-off point for a wide-ranging discussion about Agile ideas and practices.

Visit the event page for more information, including an archive of past sessions. For more about the book, visit the Art of Agile Development home page.

AoAD2 Practice: No Bugs

This is an excerpt from The Art of Agile Development, Second Edition. Visit the Second Edition home page for additional excerpts and more!

This excerpt is copyright 2007, 2021 by James Shore and Shane Warden. Although you are welcome to share this link, do not distribute or republish the content without James Shore’s express written permission.

📖 The full text of this section is available below, courtesy of the Art of Agile Development book club! Join us on Fridays from 8-8:45am Pacific for wide-ranging discussions about Agile. Details here.

No Bugs

Audience
Whole Team

We release with confidence.

If you’re on a team with a bug count in the hundreds or thousands, the idea of “no bugs” probably sounds ridiculous. I’ll admit it: no bugs is an ideal to strive for, not something your team will completely achieve. There will always be some bugs. (Or defects; I use “bug” and “defect” interchangeably.)

But you can get closer to the “no bugs” ideal than you might think. Consider Nancy van Schooenderwoert’s experience with Extreme Programming. She led a team of novices working on a real-time embedded system for farm combines: a concurrent system written in C, with some assembly. If that’s not a recipe for bugs, I don’t know what is. According to her analysis of data by Capers Jones, the average team developing this software would produce 1,035 defects and deliver 207 to the customer.

Here’s what actually happened:

The GMS team delivered this product after three years of development, having encountered a total of 51 defects during that time. The open bug list never had more than two items at a time. Productivity was measured at almost three times the level for comparable embedded software teams. The first field test units were delivered after approximately six months into development. After that point, the software team supported the other engineering disciplines while continuing to do software enhancements. [VanSchooenderwoert2006]

“Embedded Agile Project by the Numbers with Newbies”

Over three years, the team generated 51 defects and delivered 21 to its customer. That’s a 95% reduction in generated defects and a 90% reduction in delivered defects.

We don’t have to rely on self-reported data. QSM Associates is a well-regarded company that performs independent audits of software development teams. In an early analysis of a company practicing a variant of XP, they reported an average reduction from 2,270 defects to 381 defects, an 83% decrease. Furthermore, the XP teams delivered 24% faster with 39% fewer staff. [Mah2006]

More recent case studies confirmed those findings. QSM found 11% defect reduction and 58% schedule reduction on a Scrum team; 75% defect reduction and 53% schedule reduction on an XP team; and 75% defect reduction and 30% schedule reduction in a multiteam analysis of thousands of developers. [Mah2018]

Eliminate errors at their source rather than finding and fixing them after the fact.

How do you achieve these results? It’s a matter of building quality in, rather than testing defects out. Eliminate errors at their source rather than finding and fixing them after the fact.

Don’t Play the Bug Blame Game

Is it a bug or a feature?

I’ve seen companies waste inordinate amounts of time on this question. In an attempt to apportion blame “correctly,” they make elaborate distinctions between bugs, defects, errors, issues, anomalies, and, of course...unintentional features.

What really matters is whether you will do or not do something.

None of that matters. What really matters is whether you will do or not do something. If there’s something your team needs to do—whatever the reason—it’s a story in your plan.

For the purposes of this chapter, I’m defining bugs as follows:

A bug is anything your team considers “done” that later needs correction.

For your purposes, though, even that distinction doesn’t matter. If something needs work, it gets a story card. That’s all there is to it.

How to Build Quality In

The better your internal quality, the faster you go.

Before I describe how to build quality in, I need to clarify what I mean by “quality.” Roughly speaking, quality can be divided into “internal quality” and “external quality.” Internal quality is the way your software is constructed. It’s things like good names, clear software design, and simple architecture. Internal quality controls how easy your software is to extend, maintain, and modify. The better the internal quality, the faster you go.

External quality is the user-visible aspects of your software. It’s your software’s UX, functionality, and reliability. You can spend infinite amounts of time on these things. The right amount of time depends on your software, market, and value. Figuring out the balance is a question of product management.

“Building quality in” means keeping internal quality as high as possible while keeping external quality at the level needed to satisfy your stakeholders. That involves keeping your design clean, delivering the stories in your plan, and revising your plan when your external quality falls short of what’s needed.

Now, let’s talk about how to do it. To build quality in and achieve zero bugs, you’ll need to prevent four types of errors.

Prevent programmer errors

Programmer errors occur when a programmer knows what to program, but makes a mistake. It could be an incorrect algorithm, a typo, or some other mistake made while translating ideas to code.

Allies
Test-Driven Development
Energized Work
Pair Programming
Mob Programming
Alignment
Done Done

Test-driven development is your defect-elimination workhorse. Not only does it ensure that you program what you intended to, it also gives you a comprehensive regression suite you can use to detect future errors.

To enhance the benefits of test-driven development, work sensible hours and use pair programming or mobbing to bring multiple perspectives to bear on every line of code. This improves your brainpower, which helps you make fewer mistakes and allows you to see mistakes more quickly.

Supplement these practices with good standards (which are part of your alignment discussion) and a “done done” checklist. These will help you remember and avoid common mistakes.

Prevent design errors

Design errors create breeding grounds for bugs. According to Barry Boehm, 20% of the modules in a program are typically responsible for 80% of the defects. [Boehm1987] It’s an old statistic, but it matches my experience with modern software, too.

Even with test-driven development, design errors will accumulate over time. Sometimes a design that looks good when you first create it won’t hold up over time. Sometimes a shortcut that seems like an acceptable compromise will come back to bite you. Sometimes your requirements change and your design no longer fits.

Allies
Collective Code Ownership
Simple Design
Incremental Design
Reflective Design
Slack

Whatever the cause, design errors manifest as complicated, confusing code that’s hard to get right. Although you could take a week or two off to fix these problems, it’s better to continuously improve your internal quality.

Use collective code ownership to give programmers the right and responsibility to fix problems wherever they live. Use evolutionary design to continuously improve your design. Make time for improvements by including slack in your plans.

Prevent requirements errors

Requirements errors occur when a programmer creates code that does exactly what they intended it to do, but their intention was wrong. Perhaps they misunderstood what they were supposed to do, or perhaps nobody really understood what needed to be done. Either way, the code works, but it doesn’t do the right thing.

Allies
Whole Team
Purpose
Context
Team Room
Ubiquitous Language
Customer Examples
Incremental Requirements
Stakeholder Demos
Stories
Done Done

A cross-functional, whole team is essential for preventing requirements errors. Your team needs to include on-site customers with the skills to understand, decide, and explain the software’s requirements. Clarifying the team’s purpose and context is vital to this process.

A shared team room is also important. When programmers have a question about requirements, they need to be able to turn their head and ask. Use a ubiquitous language to help programmers and on-site customers understand one another, and supplement your conversations with customer examples.

Confirm that the software does what it needs to do with frequent customer reviews and stakeholder demos. Perform those reviews incrementally, as soon as programmers have something to show, so misunderstandings and refinements are discovered early, in time to be corrected. Use stories to focus the team on customers’ perspective. Finally, don’t consider a story “done done” until on-site customers agree it’s done.

Prevent systemic errors

If everyone does their job perfectly, these practices yield software with no defects. Unfortunately, perfection is impossible. Your team is sure to have blind spots: subtle areas where team members make mistakes, but they don’t know it. These blind spots lead to repeated, systemic errors. They’re “systemic” because they’re a consequence of your entire development system: your team, its process, the tools you use, the environment you work in, and more.

Escaped defects are a clear signal of trouble in paradise. Although errors are inevitable, most are caught quickly. Defects found by end users have “escaped.” Every escaped defect indicates a need to improve your development system.

Ally
Blind Spot Discovery

Of course, you don’t want your end users to be your beta testers. That’s where blind spot discovery comes in. It’s a variety of techniques, such as chaos engineering and exploratory testing, for finding gaps in your understanding. I discuss them in the next practice.

Some teams use these techniques to check the quality of their software system: they’ll code a story, search for bugs, fix them, and repeat. But to build quality in, treat your blind spots as a clue about how to improve your development system, not just your software system. The same goes for escaped defects. They’re all clues about what to improve.

Ally
Incident Analysis

Incident analysis helps you decipher those clues. No matter the impact, if your team thought something was done and it later needs fixing, it can benefit from incident analysis. This applies to well-meaning mistakes, too: if everybody thinks a particular new feature is a great idea, and it turns out to enrage your customers, it deserves just as much analysis as a production outage.

When you find a bug, write a test and fix the bug, but then fix the underlying system. Even if it’s just in the privacy of your thoughts, think about how you can improve your design and process to prevent that type of bug from happening again.
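
As a small illustration, here’s a hypothetical bug-first fix in TypeScript. The parsePercent() function and its boundary bug are invented; the pattern is what matters: reproduce the bug in a failing test, fix the code, and keep the test as a regression guard:

```typescript
import assert from "node:assert";

// Hypothetical bug report: "100%" is rejected. Reproduce it in a test
// first, then fix the boundary check (it previously read `value >= 100`).
function parsePercent(input: string): number {
  const value = Number(input.replace("%", ""));
  if (Number.isNaN(value) || value < 0 || value > 100) {
    throw new Error(`invalid percentage: ${input}`);
  }
  return value / 100;
}

assert.equal(parsePercent("100%"), 1);   // failed before the fix
assert.equal(parsePercent("70%"), 0.7);  // existing behavior still works
console.log("regression tests pass");
```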

Fix Bugs Immediately

Do. Or do not. There is no //TODO.

As the great Master Yoda never said, “Do. Or do not. There is no //TODO.”

Each defect is the result of a flaw that’s likely to breed more mistakes. Improve quality and productivity by fixing each one right away.

Allies
Collective Code Ownership
Team Room

Fixing bugs quickly requires the whole team to participate. Programmers, use collective code ownership so anyone can fix each bug. Customers and testers, personally bring each new bug to the attention of a programmer and help them reproduce it. This is easiest when the team shares a team room.

In practice, it’s not possible to fix every bug right away. You may be in the middle of something else when you learn about a bug. When this happens to me, I ask my navigator to make a note. We come back to it 10–20 minutes later, when we reach a good stopping point.

Allies
Slack
Task Planning
Visual Planning

Some bugs are too big to fix quickly. For these, I gather the team for a quick huddle. We collectively decide if we have enough slack to fix the bug and still meet our other commitments. If we do, we create tasks for the bug, put them in our plan, and people volunteer for them as normal. (If you’re using estimates, these tasks don’t get estimates or count toward your capacity.)

If there isn’t enough slack to fix the bug, decide as a team whether it’s important enough to fix before your next release. If it is, create a story for it and schedule it immediately for your next iteration or story slot. If it isn’t, add it to your visual plan in the appropriate release.

Bugs that aren’t important enough to fix should be discarded. If you can’t do that, the bug needs to be fixed. The “fix,” though, can be a matter of documenting a workaround, or making a record that you decided not to fix the bug. An issue tracker might be the right way to do this.

Testers’ Role

Because fluent Delivering teams build quality in, rather than testing defects out, people with testing skills shift left. Instead of focusing their skills on the completed product, they focus on helping the team build a quality product from the beginning.

In my experience, some testers are business-oriented: they’re very interested in getting business requirements right. They work with on-site customers to uncover all the nit-picky details the customers would otherwise miss. They’ll often prompt people to think about edge cases during requirements discussions.

Other testers are more technically-oriented. They’re interested in test automation and nonfunctional requirements. These testers act as technical investigators for the team. They create the testbeds that look at issues such as scalability, reliability, and performance. They review logs to understand how the software system works in production. Through these efforts, they help the team understand the behavior of its software and decide when to devote more effort to operations, security, and nonfunctional stories.

Ally
Blind Spot Discovery

Testers also help the team identify blind spots. Although anybody on the team can use blind spot discovery techniques, people with testing skills tend to be particularly good at it.

’Tude

Bugs are for other people.

I encourage an attitude among my teams—a bit of eliteness, even snobbiness. It goes like this: “Bugs are for other people.”

If you do everything I’ve described, bugs should be a rarity. Your next step is to treat them that way. Rather than shrugging your shoulders when a bug occurs—“Oh yeah, another bug, that’s what happens in software”—be shocked and dismayed. Bugs aren’t something to be tolerated; they’re a sign of underlying problems to be solved.

Allies
Pair Programming
Mob Programming
Team Room
Collective Code Ownership

Ultimately, “no bugs” is about establishing a culture of excellence. When you learn about a bug, fix it right away, then figure out how to prevent that type of bug from happening again.

You won’t be able to get there overnight. All the practices I’ve described take discipline and rigor. They’re not necessarily difficult, but they break down if people are sloppy or don’t care about their work. A culture of “no bugs” helps the team maintain the discipline required, as do pairing or mobbing, a team room, and collective ownership.

You’ll get there eventually. Agile teams can and do achieve nearly zero bugs. You can too.

Questions

How do we prevent security defects and other challenging bugs?

Allies
Done Done
Alignment
Blind Spot Discovery

Threat modeling (see the “Threat Modeling” section) can help you think of security flaws in advance. Your “done done” checklist and coding standards can remind you of issues to address. That said, you can only prevent bugs you think to prevent. Security, concurrency, and other difficult problem domains may introduce defects you never considered. That’s why blind spot discovery is also important.

How should we track our bugs?

You shouldn’t need a bug database or issue tracker for new bugs, assuming your team isn’t generating a lot of bugs. (If you are, focus on solving that problem first.) If a bug is too big to fix right away, turn it into a story, and track its details in the same way you handle other requirements details.

How long should we work on a bug before we turn it into a story?

Ally
Slack

It depends on how much slack you have. Early in an iteration, when there’s still a lot of slack, I might spend half a day on a defect before turning it into a story. Later, when there’s less slack, I might only spend 10 minutes on it.

We have a lot of legacy code. How can we adopt a “no bugs” policy without going mad?

It will take time. Start by going through your bug database and identifying the ones you want to fix in the current release. Schedule at least one to be fixed every week, with a bias toward fixing them sooner rather than later.

Ally
Incident Analysis

Every week or two, randomly choose a recent bug to subject to an incident analysis, or at least an informal one. This will allow you to gradually improve your development system and prevent bugs in the future.

Prerequisites

“No bugs” is about a culture of excellence. It can only come from within the team. Managers, don’t ask your teams to report defect counts, and don’t reward or punish them based on the number of defects they have. You’ll just drive the bugs underground, and that will make quality worse, not better. I’ll discuss this further in the “Incident Accountability” section.

Achieving the “no bugs” ideal depends on a huge number of Agile practices—essentially, every Focusing and Delivering practice in this book. Until your team reaches fluency in those practices, don’t expect dramatic reductions in defects.

Conversely, if you’re using the Focusing and Delivering practices, more than a few new bugs per month may indicate a problem with your approach. You’ll need time to learn the practices and refine your process, of course, but you should see an improvement in your bug rates within a few months. If you don’t, check the “Troubleshooting Guide” sidebar.

Indicators

When your team has a culture of “no bugs”:

  • Your team is confident in the quality of its software.

  • You’re comfortable releasing to production without a manual testing phase.

  • Stakeholders, customers, and users rarely encounter unpleasant surprises.

  • Your team spends its time producing great software instead of fighting fires.

Alternatives and Experiments

One of the revolutionary ideas Agile incorporates is that low-defect software can be cheaper to produce than high-defect software. This is made possible by building quality in. To experiment further, look at the parts of your process that check quality at the end, and think of ways to build that quality in from the beginning.

You can also reduce bugs by investing in more, higher-quality testing to find and fix a greater percentage of them. However, this doesn’t work as well as building quality in from the beginning. It will also slow you down and make releases more difficult.

Some companies invest in separate QA teams in an effort to improve quality. Although occasional independent testing can be useful for discovering blind spots, a dedicated QA team isn’t a good idea. Paradoxically, it tends to reduce quality, because the development team then spends less effort on quality themselves. Elisabeth Hendrickson explores this phenomenon in her excellent article, “Better Testing, Worse Quality?” [Hendrickson2000]

Share your thoughts about this excerpt on the AoAD2 mailing list or Discord server. Or come to the weekly book club!

For more excerpts from the book, see the Second Edition home page.

AoAD2 Chapter: Quality (introduction)

This is an excerpt from The Art of Agile Development, Second Edition. Visit the Second Edition home page for additional excerpts and more!

This excerpt is copyright 2007, 2021 by James Shore and Shane Warden. Although you are welcome to share this link, do not distribute or republish the content without James Shore’s express written permission.

📖 The full text of this section is available below, courtesy of the Art of Agile Development book club! Join us on Fridays from 8-8:45am Pacific for wide-ranging discussions about Agile. Details here.

Quality

For many people, “quality” means “testing,” but Agile teams treat quality differently. Quality isn’t something you test for; it’s something you build in. Not just into your code, but into your entire development system: the way your team approaches its work, the way people think about mistakes, and even the way your organization interacts with your team.

This chapter has three practices to help your team dedicate itself to quality:

  • The “No Bugs” practice builds quality in.

  • The “Blind Spot Discovery” practice helps team members learn what they don’t know.

  • The “Incident Analysis” practice focuses your team on systemic improvements.

Share your thoughts about this excerpt on the AoAD2 mailing list or Discord server. Or come to the weekly book club!

For more excerpts from the book, see the Second Edition home page.

Agile Book Club: Agile Management

Agile teams are self-organizing and self-directed. What, then, is the role of management? What do Agile managers do, and how do they help their teams excel? In this session, Johanna Rothman and Elisabeth Hendrickson help us find the answers.

Johanna Rothman is known as the “Pragmatic Manager.” She helps leaders and teams see simple and reasonable alternatives that might work in their context—often with a bit of humor. She’s the author of many books about management, including the Modern Management Made Easy series.

Elisabeth Hendrickson has led teams of all sizes, founded companies as an entrepreneur, and was VP R&D at Pivotal, leading development of data products and PKS, a Kubernetes distribution. She’s a sought-after expert, especially in the area of software process and quality. Her latest venture is Curious Duck Digital Laboratory, LLC, where she is developing a simulation to give technologists insight into the non-linear nature of software development.

Reading:
📖 Management

🎙 Discussion prompts:

  • The book makes the case that Theory Y management is necessary for Agile to thrive. Is this true? If so, what should people in Theory X organizations do?

  • What does effective Agile management look like in practice?

  • How can an Agile manager evaluate their team’s performance, and what should they do to help them improve?

  • Managers often have to fit into a larger system that has trouble with Agile ideas. What are some tricks for making Agile work in a non-Agile organization?

About the Book Club

The Art of Agile Development Book Club takes place Fridays from 8:00 – 8:45am Pacific. Each session uses an excerpt from the new edition of my book, The Art of Agile Development, as a jumping-off point for a wide-ranging discussion about Agile ideas and practices.

Visit the event page for more information, including an archive of past sessions. For more about the book, visit the Art of Agile Development home page.

AoAD2 Practice: Impediment Removal

This is an excerpt from The Art of Agile Development, Second Edition. Visit the Second Edition home page for additional excerpts and more!

This excerpt is copyright 2007, 2021 by James Shore and Shane Warden. Although you are welcome to share this link, do not distribute or republish the content without James Shore’s express written permission.

Impediment Removal

Audience
Whole Team

by Diana Larsen

We fix the problems that slow us down.

Impediments. Blockers. Obstacles, barriers, hindrances, snags, threatening risks (also known as looming future impediments). All words describing issues that can derail team performance. They may be obvious. “The network is down.” They may be subtle. “We misunderstood the customers’ needs and have to start over.” Or, “We’re stuck!”

Some impediments hide in plain sight. Some emerge from a complex situation. Some are the symptom of a larger issue, and some don’t have a single root cause, but are a many-headed hydra. Some are unstoppable forces, such as bad weather, or carry the weight of culture and tradition behind them. And some, the most precious of all, are in your control and easily resolved.

Regardless of their source, impediments hinder the team and can even bring progress to a full stop. Impediment removal gets the team back up to speed.

Removing impediments is a team responsibility.

Some team members expect people with leadership titles to take on impediment removal, but removing impediments is a team responsibility. Don’t wait for your coach or manager to notice and solve your team’s impediments. Take care of them yourself.

Allies
Stand-Up Meetings
Retrospectives
Task Planning

Similarly, some teams create impediment or risk boards to keep track of everything that’s in their way. I don’t recommend it. Instead, address impediments as soon as you recognize them. Bring them up in your next stand-up, retrospective, or task planning session, and decide how you’ll overcome each one.

...to continue reading, buy the book!

In this Section

  1. Impediment Removal
    1. Identifying Impediments
    2. Circles and Soup
      1. Control: Take direct action
      2. Influence: Persuade or recommend
      3. Soup: Change your response
    3. Questions
    4. Prerequisites
    5. Indicators
    6. Alternatives and Experiments
    7. Further Reading

Discuss the book on the AoAD2 mailing list or Discord server. Or come to the weekly book club!

For more excerpts from the book, see the Second Edition home page.

AoAD2 Practice: Retrospectives

This is an excerpt from The Art of Agile Development, Second Edition. Visit the Second Edition home page for additional excerpts and more!

This excerpt is copyright 2007, 2021 by James Shore and Shane Warden. Although you are welcome to share this link, do not distribute or republish the content without James Shore’s express written permission.

Retrospectives

Audience
Whole Team

We continually improve our work habits.

Your team should constantly update and improve your development process. Retrospectives are a great way to do so.

...to continue reading, buy the book!

In this Section

  1. Retrospectives
    1. Key Idea: Continuous Improvement
    2. Types of Retrospectives
    3. How to Conduct a Heartbeat Retrospective
    4. Step 1: The Prime Directive (5 minutes)
    5. Step 2: Brainstorming (20 minutes)
    6. Step 3: Mute Mapping (15 minutes)
    7. Step 4: Generate Insights (10–30 minutes)
    8. Step 5: Retrospective Objective (10–20 minutes)
    9. Follow Through
    10. Questions
    11. Prerequisites
    12. Indicators
    13. Alternatives and Experiments
    14. Further Reading

Discuss the book on the AoAD2 mailing list or Discord server. Or come to the weekly book club!

For more excerpts from the book, see the Second Edition home page.

AoAD2 Chapter: Improvement (introduction)

This is an excerpt from The Art of Agile Development, Second Edition. Visit the Second Edition home page for additional excerpts and more!

This excerpt is copyright 2007, 2021 by James Shore and Shane Warden. Although you are welcome to share this link, do not distribute or republish the content without James Shore’s express written permission.

📖 The full text of this section is available below, courtesy of the Art of Agile Development book club! Join us on Fridays from 8-8:45am Pacific for wide-ranging discussions about Agile. Details here.

Improvement

At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly.

Manifesto for Agile Software Development

Feedback and adaptation are central to Agile, and that applies to the team’s approach to Agile itself. Although you might start with an off-the-shelf Agile method, every team is expected to customize its method for itself.

As with everything else in Agile, this customization happens through iteration, reflection, and feedback. Emphasize the things that work; improve the things that don’t. These practices will help you do so:

  • The “Retrospectives” practice helps your team continually improve.

  • The “Team Dynamics” practice improves your team’s ability to work together.

  • The “Impediment Removal” practice focuses your team’s improvement efforts where they’ll make the most difference.

Share your thoughts about this excerpt on the AoAD2 mailing list or Discord server. Or come to the weekly book club!

For more excerpts from the book, see the Second Edition home page.

Agile Book Club: Forecasting & Roadmaps

“When will you be done?” It’s a question programmers have come to dread. Software development has so many details, it’s impossible to know exactly what’s left to do, let alone how long it will take. Yet stakeholders have a genuine need to know. Forecasting and roadmaps are ways to provide them with what they need, and the focus of this book club session.

In this session, we’re joined by Todd Little, the author of the noteworthy IEEE Software article, “Schedule Estimation and Uncertainty Surrounding the Cone of Uncertainty,” which is the basis of the date and scope forecasting technique described in The Art of Agile Development. Todd was a founder of the Agile 20XX conference series and a past member of the Board of Directors of the Agile Alliance. He is currently Chairman of Kanban University.

Reading:
📖 Forecasting
📖 Roadmaps

🎙 Discussion prompts:

  • Love ’em or hate ’em, forecasting and roadmaps are a common feature of software development. Tell us a story about when they’ve gone particularly well... or particularly poorly.

  • The book recommends steering your plans to meet predefined release dates rather than making date and scope forecasts. What are some pros and cons of this approach?

  • What have you done when your team’s forecast hasn’t matched your stakeholders’ expectations? How did it turn out?

  • What are your preferred approaches to sharing roadmaps with managers, stakeholders, and customers? How do electronic tracking tools such as Jira fit in?

About the Book Club

The Art of Agile Development Book Club takes place Fridays from 8:00 – 8:45am Pacific. Each session uses an excerpt from the new edition of my book, The Art of Agile Development, as a jumping-off point for a wide-ranging discussion about Agile ideas and practices.

Visit the event page for more information, including an archive of past sessions. For more about the book, visit the Art of Agile Development home page.