Boundary Objects Discussion on the Oddly Influenced Podcast

Brian Marick had me on his Oddly Influenced podcast to talk about boundary objects.

Boundary objects, if you’re not familiar with them, are a tool for creating shared understanding. They’re a concrete model that people with different perspectives use to combine their views. I first heard about them from Brian many years ago, and now I use them as a regular part of my practice.

I love this interview because it’s so different from a typical Agile discussion. Boundary objects are simultaneously academic and obscure, yet immensely pragmatic and useful. We had a fantastic discussion about how boundary objects work in practice. I highly recommend it.

You can listen to the episode here or view the transcript here.

Agile Book Club: Optimizing Outcomes

In its full glory, Agile is a world in which teams twirl and dance in response to changing market conditions. They experiment and learn; develop new markets; outmaneuver the competition. In the Agile Fluency Model, this is called Optimizing fluency. In this, our final book club session, Mary and Tom Poppendieck join us to discuss how to achieve it.

Mary and Tom Poppendieck wrote the classic book Lean Software Development in 2003, outlining the application of Lean principles to software engineering. Subsequent books include Implementing Lean Software Development (2006), Leading Lean Software Development (2009), and The Lean Mindset (2013). Over the past two decades they have extended these ideas at www.leanessays.com.

Reading:
📖 Optimizing Outcomes
📖 Autonomy
📖 Discovery
📖 Into the Future

🎙 Discussion prompts:

  • Optimizing outcomes depends on a truly cross-functional team with a clear purpose and ownership over its business decisions. What does that look like in practice?

  • What does it mean for a team to have financial responsibility, and how can organizations make that work?

  • Feedback and learning are crucial for success, and that requires teams to have a direct connection to their customers and stakeholders. How can teams make the most of that connection?

  • Let’s talk about options. How can teams use options and adaptability to improve their outcomes?

About the Book Club

From October 2021 to August 2022, I hosted a call-in talk show based on the second edition of The Art of Agile Development. The series used the book as a jumping-off point for wide-ranging discussions about Agile ideas and practices, and had a star-studded guest list.

For an archive of past sessions, visit the book club index. For more about the book, visit the Art of Agile Development home page.

Agile Book Club: Incident Analysis

Despite your best efforts, your software will sometimes fail to work as it should. Some failures will be minor; others will be more significant. Either way, once the dust has settled, you need to figure out what happened and how you can improve. This is incident analysis.

Reading:
📖 Incident Analysis

🎙 Discussion prompts:

  • Think of an incident or bug you were involved with. What happened? Share the story.

  • When you look back at that incident/bug, what were some of its causes?

  • What aspects of your development system contributed to the incident or bug? Which aspects made it less harmful?

  • How could your development system be improved to make incidents and bugs less likely in the future?

About the Book Club

From October 2021 to August 2022, I hosted a call-in talk show based on the second edition of The Art of Agile Development. The series used the book as a jumping-off point for wide-ranging discussions about Agile ideas and practices, and had a star-studded guest list.

For an archive of past sessions, visit the book club index. For more about the book, visit the Art of Agile Development home page.

FAST: An Innovative Way to Scale

I’ve given a few talks on FAST (Fluid Scaling Technology) this year. They cover the same material, but at varying levels of depth and with different audience questions.

The recordings are below. Here’s the abstract:

How can multiple teams work together on a single product? The common wisdom is to carefully align teams and responsibilities to create autonomous teams. But this approach invariably runs into cross-team bottlenecks, challenges aligning responsibilities to teams, and difficulties creating cross-functional teams.

Fluid Scaling Technology, or FAST, is an innovative new approach that solves these problems. It uses frequent team self-selection to prevent bottlenecks, share knowledge amongst teams, and ensure the right people are working on the right things. It’s simple and lightweight.

Join James Shore as he shares his experiences with scaling Agile—first with traditional approaches, and more recently with FAST. Learn what works, what doesn’t, and how you can try FAST in your organization.

Agile 2022

I spoke at the Agile 2022 conference on July 21st. This was an in-person session and 75 minutes long. The recording is available to Agile Alliance members. Scroll down towards the bottom, or search for “James Shore” on the page.

A picture of James Shore presenting “FAST: An Innovative Way to Scale” at Agile 2022.

Agile New England

I presented the session to Agile New England on July 14th. This was an online session and 90 minutes long. The recording is here. You’ll have to create an account, but it’s free.

A picture of James Shore presenting “FAST: An Innovative Way to Scale” at Agile New England.

Agile Book Club: Team Dynamics

Team dynamics form the bedrock of Agile teams’ ability to develop and deliver software. They’re the invisible undercurrents that determine your team’s culture. In this session, Diana Larsen and Linda Rising help us explore the dynamics that make and break Agile teams.

Diana Larsen is a team dynamics expert and change guru who’s been part of the Agile community since the beginning. She’s best known for coauthoring Agile Retrospectives: Making Good Teams Great with Esther Derby. She authored the section on team dynamics in The Art of Agile Development.

Linda Rising describes herself as “incredibly old but still agile and interested in almost anything—including bonobos and trees!” She's been working on teams her entire life, starting with pick-up softball games where she pitched and hit fungos to her two younger brothers.

Reading:
📖 Team Dynamics

🎙 Discussion prompts:

  • The Tuckman Model (“Forming,” “Storming,” “Norming,” and “Performing”) is a classic lens into team dynamics. What do you see as the strengths and weaknesses of that model?

  • Let’s talk about trust. What role does trust play in a healthy team, and how can it be created?

  • Agile teams self-organize and share leadership. What does that look like in practice? How can people develop their shared leadership skills?

  • What role do managers play in creating healthy team dynamics?

About the Book Club

From October 2021 to August 2022, I hosted a call-in talk show based on the second edition of The Art of Agile Development. The series used the book as a jumping-off point for wide-ranging discussions about Agile ideas and practices, and had a star-studded guest list.

For an archive of past sessions, visit the book club index. For more about the book, visit the Art of Agile Development home page.

AoAD2 Chapter: Into the Future

This is an excerpt from The Art of Agile Development, Second Edition. Visit the Second Edition home page for additional excerpts and more!

This excerpt is copyright 2007, 2021 by James Shore and Shane Warden. Although you are welcome to share this link, do not distribute or republish the content without James Shore’s express written permission.

Into the Future

Agile teams never stop learning, experimenting, and improving. The practices in this book are only the starting point. Once you understand a practice, make it yours! Experiment with alternatives and seek out new ideas. As you become more fluent, deliberately break the rules and see what happens. You’ll learn why the rules exist...and what their limits are.

What comes after that? That’s for you to decide. Agile is always customized to the needs of the team.

In the Agile Fluency Model, Diana Larsen and I identified a possible fourth zone: Strengthening. If you look carefully, each zone represents a different expansion of the team’s circle of control: Focusing gives the team ownership of its tasks; Delivering gives it ownership of its releases; Optimizing gives it ownership of its product.

Strengthening continues this trend by expanding teams’ ownership over organizational strategy. People don’t just make decisions focused on their teams; they come together to make decisions affecting many teams. One example that’s starting to enter the mainstream is team self-selection. In team self-selection, team members decide for themselves which team they’ll be part of, rather than being assigned by management.

Sound crazy? It’s not. It’s carefully structured, not a free-for-all. (See [Mamoli2015] for details.) I’ve used team self-selection myself and it’s surprisingly effective. The results are better than I’ve seen from traditional manager-driven selection. It leads to teams that are highly productive out of the gate.

The Strengthening zone is about this sort of bottom-up decision making. Governance approaches such as Sociocracy and Holacracy are experimenting in this space, as are companies such as Valve Software, Semco, and W. L. Gore & Associates. Jutta Eckstein and John Buck’s book Company-wide Agility with Beyond Budgeting, Open Space & Sociocracy [Eckstein2020] goes into more detail. For a lighter-weight introduction to the philosophy, see Ricardo Semler’s Maverick. [Semler1995] It’s a fascinating account of the author’s revitalization of his company’s management approach.

That said, the Agile Fluency Model has never been a maturity model. You’re not required to pass through the zones in order, or to achieve fluency in every zone. Although individual practices, such as team self-selection, have their place, I suspect full Strengthening fluency is inappropriate for most companies. But if you want to live on the cutting edge and join the ranks of the innovators who made Agile what it is today, the Strengthening zone is one place to start. Beyond that...who knows? There are additional zones waiting to be discovered.

Ultimately, though, Agile doesn’t matter. Really! What matters is success, for your team members, organization, and stakeholders, in whatever way they define it. Agile practices, principles, and ideas are merely guides along the way. Start by following the practices rigorously. Learn how to apply the principles and key ideas. Break the rules, experiment, see what works, and learn some more. Share your insights and passion, and learn even more.

Over time, with discipline and experience, the practices and principles will become less important. When doing the right thing is a matter of instinct and intuition, finely honed by experience, it’s time to leave rules and principles behind. It won’t matter what you call it. When your intuition leads to great software that serves a valuable purpose, and your wisdom inspires the next generation of teams, you will have mastered the art of Agile development.

Share your thoughts about this excerpt on the AoAD2 mailing list or Discord server. For videos and interviews regarding the book, see the book club archive.

For more excerpts from the book, see the Second Edition home page.

AoAD2 Chapter: Discovery

This is an excerpt from The Art of Agile Development, Second Edition. Visit the Second Edition home page for additional excerpts and more!

This excerpt is copyright 2007, 2021 by James Shore and Shane Warden. Although you are welcome to share this link, do not distribute or republish the content without James Shore’s express written permission.

Discovery

Optimizing teams make their own product decisions. How do they know what to build?

Ally
Whole Team

Partly, they know what to build because they include people with product expertise. Those team members have the background and training to decide what to do.

But the fact is, at least at the beginning of a new product, nobody is 100% sure what to do. Some people pretend to know, but Optimizing teams don’t. Their ideas are, at best, very good guesses about what will lead to success.

The job of the Optimizing team isn’t to know what to build, but to discover what to build.

So the job of the Optimizing team isn’t to know what to build, but to discover what to build. Steve Blank, whose work was the basis for the Lean Startup movement, put it this way:

[T]he task is unambiguous—learn and discover what problems customers have, and whether your product concept solves that problem; understand who will buy it; and use that knowledge to build a sales roadmap so a sales team can sell it to them. And [you] must have the agility to move with sudden and rapid shifts based on what customers have to say and the clout to reconfigure [your team] when customer feedback requires it. [Blank2020a] (app. A)

Steve Blank, The Four Steps to the Epiphany

Steve Blank was talking about startups, but this quote applies equally well to Optimizing teams. Even if you aren’t selling your software! No matter who your customers and users are—even if they’re Keven and Kyla, who sit in the next cubicle over—your job is to figure out how to bring them value. And, just as importantly, how to do so in a way they will actually buy or use.

Validated Learning

I can’t count the number of times I’ve had a good idea, put it in front of real customers or users, and found out that it didn’t work out. Sure, they would tell me they loved the idea when I told them about it. Sometimes, even after they tried a prototype! It was only when I asked people to make a real expenditure—of time, money, or political capital—that I learned my “good idea” wasn’t good enough.

Product ideas are like perpetual motion machines: if you believe hard enough, and there’s enough inertia, they look like they’ll run forever. Put a real load on them, though, and they grind to a halt.

Allies
Blind Spot Discovery
Real Customer Involvement

Validated learning is one of your best tools for testing ideas. I discussed it in the “Validated Learning” section, but to recap, validated learning involves making a hypothesis about your market, building something you can put in front of them, and measuring what happens. Use what you’ve learned to adjust your plans, then repeat. This is often referred to as the Build-Measure-Learn loop.
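
The Build-Measure-Learn loop is a process, not an algorithm, so there’s no canonical implementation. Purely as an illustration, here’s a minimal sketch, in Python, of the bookkeeping behind one trip through the loop. Everything in it is hypothetical: the Experiment record, the learn function, and the metric and threshold are invented for this example, not taken from the book.

    from dataclasses import dataclass

    @dataclass
    class Experiment:
        hypothesis: str           # what we believe about our market
        metric: str               # what we'll measure to test the belief
        success_threshold: float  # the result that would validate it

    def learn(experiment: Experiment, measured_value: float) -> str:
        """The "Learn" step: decide whether to persevere or adapt."""
        if measured_value >= experiment.success_threshold:
            return "validated: persevere with the plan"
        return "invalidated: adapt the plan and run another experiment"

    # "Build" a minimal increment, put it in front of real customers,
    # "Measure" what they actually commit to, then "Learn":
    experiment = Experiment(
        hypothesis="Customers will pre-pay for an international add-on",
        metric="pre-orders per sales demo",
        success_threshold=0.10,
    )
    print(learn(experiment, measured_value=0.04))  # invalidated: adapt

The point of the sketch isn’t the arithmetic; it’s that the hypothesis and the success threshold are written down before the increment ships, so the measurement can actually invalidate the idea.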

To truly validate your learning, you need real customers (or users) and real costs. If you show what you’ve built to people who aren’t part of your target market, you’ll get feedback, but it might not be relevant to your actual situation. And if you don’t ask them to commit something in exchange, you’ll learn more about people’s desire to avoid hurting your feelings than about the actual value of your idea. Everybody will praise your idea for a luxury vacation...until you ask them for their down payment.1

1Then it’s all, “Oh, I don’t have time,” “I couldn’t leave my chihuahua Fluffles all alone,” and “I hate tropical sand. It’s rough and irritating, and it gets everywhere.”

Adaptability

Ally
Adaptive Planning

Every time you go through the Build-Measure-Learn loop, you’ll learn something new. To take advantage of what you learned, you’ll have to change your plans. As a result, Optimizing teams tend to keep their planning horizons short and their plans adaptable. They keep their valuable increments small so they can change direction without waste.

Valuable increments (see the “Valuable Increments” section) aren’t just about features and capabilities. Remember, there are three common categories of value:

  • Direct value. You’ve built something that provides one of the types of value described in the “What Do Organizations Value” sidebar.

  • Learning value. You’ve built something that helps you understand your market and future prospects better.

  • Option value. You’ve built something that allows you to change direction for less cost.

For Optimizing teams, learning and options are just as important as direct value. In the beginning, they can even be more important than direct value, because they allow the team to avoid wasting time building the wrong things. Every Build-Measure-Learn loop is an example of a “learning value” increment.

Options thinking is also common in Optimizing teams. The future is uncertain, and no plans are set in stone, so Optimizing teams ensure they have the ability to adapt. They do so by thinking about future possibilities and building “option value” increments. A prospective analysis, described in the “Prospective Analysis” section, is one way to start identifying those options.

Options are also an important technique for managing risk. If your prospective analysis shows a substantial risk—for example, a competitor providing a less lucrative, but more attractive pricing model—you could build an option that allowed you to change your pricing model with the flip of a switch.
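
To make that concrete, here’s a minimal sketch of what such a switch might look like, assuming a hypothetical subscription product written in Python. The function, environment variable, and prices are all invented for illustration; the real mechanism could be a feature flag, a configuration table, or anything else the team can flip without a release.

    import os

    # Hypothetical "option value" increment: a pricing-model switch built
    # in advance, so the team can respond to a competitor without a rewrite.
    def price_for(seats: int, usage_hours: float) -> float:
        model = os.environ.get("PRICING_MODEL", "per-seat")  # the switch
        if model == "per-seat":
            return seats * 30.00        # today's model
        if model == "usage-based":
            return usage_hours * 0.25   # the pre-built option
        raise ValueError(f"unknown pricing model: {model!r}")

    # Runs under the current model until someone flips PRICING_MODEL:
    print(price_for(seats=40, usage_hours=5_000))

Building the second branch before it’s needed costs a little now, but it can turn a risky pricing migration into a decision the team reverses in minutes.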

Another sort of option involves deadlines. Although Optimizing teams avoid arbitrary deadlines, sometimes value depends on releasing before a certain date. For example, video games need to be delivered in time for the holiday season, tax software needs to be updated yearly, and new regulations can have strict deadlines with harsh compliance penalties.

To meet these deadlines, Optimizing teams will often build a “safety” increment before embarking on a more ambitious idea. The “safety” increment fulfills the demands of the deadline, in a minimal way, leaving the team free to work on its more ambitious ideas without worry. If those ideas don’t pan out, or can’t be completed in time, the team releases the “safety” increment instead.

For example, reviewer Bill Wake shared the (possibly apocryphal) story of a printer company that needed to deliver a red-eye removal feature for a new photo printer. The hardware had a strict release date, so the software team started with a primitive red-eye algorithm, then worked on a more sophisticated approach.

Experiments and Further Reading

There’s much, much more to deciding product direction than I can cover in this book. Opportunities for further reading abound; look in the product management category. Three places to start are Marty Cagan’s Inspired: How to Create Tech Products Customers Love [Cagan2017]; Luke Hohmann’s Innovation Games: Creating Breakthrough Products through Collaborative Play [Hohmann2006]; and David Bland and Alexander Osterwalder’s Testing Business Ideas [Bland2019].

The point to remember is that, in addition to normal product management, Optimizing teams engage with their customers to understand their market and validate their ideas. They exist to learn as much as they do to build, and the flexibility of their plans reflects that focus. The Lean Startup movement calls this customer discovery and customer validation.

For much more detail about these ideas, see The Startup Owner’s Manual. [Blank2020b] It’s an updated version of Steve Blank’s book, The Four Steps to the Epiphany. [Blank2020a] Blank’s ideas, combined with Extreme Programming, formed the basis of Eric Ries’s Lean Startup movement. [Ries2011]

As you can imagine, The Startup Owner’s Manual is focused on startups, so its advice will need customization to your situation, but Optimizing teams have a lot of similarities to startups. A successful Optimizing team isn’t just carrying on with the status quo. If it were, Focusing and Delivering fluency would be sufficient. Instead, it’s seeking ways to lead its market and develop new markets. Lean Startup ideas, including the foundational ideas of customer discovery and customer validation, are a key part of how you can do so.

Share your thoughts about this excerpt on the AoAD2 mailing list or Discord server. For videos and interviews regarding the book, see the book club archive.

For more excerpts from the book, see the Second Edition home page.

AoAD2 Chapter: Autonomy

This is an excerpt from The Art of Agile Development, Second Edition. Visit the Second Edition home page for additional excerpts and more!

This excerpt is copyright 2007, 2021 by James Shore and Shane Warden. Although you are welcome to share this link, do not distribute or republish the content without James Shore’s express written permission.

Autonomy

Optimizing fluency is fairly rare, but it’s not because the Optimizing zone represents a big change in Agile practices. On the contrary: Optimizing is mostly an application of the practices found throughout the rest of this book. Optimizing fluency isn’t rare because it’s hard; it’s rare because it requires a level of team autonomy most organizations aren’t ready to support.

Optimizing requires a level of team autonomy most organizations aren’t ready to support.

Everybody knows Agile teams are supposed to be autonomous, but organizations with Optimizing teams really mean it. For them, autonomy is more than just enabling teams to work independently. They give their teams full responsibility for their finances and product plans, too.

Business Expertise

Ally
Whole Team

Of course, for your team to own its financial and product decisions, the team needs to have the ability to make good decisions. A whole team with both business and development expertise has always been the goal, but many organizations short-change the business side of their teams. They assign a product manager who can participate only a few hours a week, or assign product “owners” who have no real decision-making authority. Some teams get the worst of both worlds: product owners who are spread too thin and have no decision-making authority.

Optimizing teams have real business authority and expertise. It’s not siloed behind a single person, either. Everybody on the team takes an interest in producing value. Some more than others, of course, but there’s no jealous hoarding of responsibility. You’ll get the best results when your entire team sees its job as learning how to better serve customers, users, and stakeholders.

Business Decisions

One of the most striking things about Optimizing teams is their lack of emphasis on user stories. They have stories, of course, as a planning mechanism, but stories aren’t the topic of their conversations with stakeholders. Instead, those conversations are all about business results and value. These teams aren’t trying to deliver a set of stories; that’s a detail. They’re trying to make a meaningful difference to their organization.

Ally
Stakeholder Trust

This is particularly true of their relationship with management. Optimizing teams have the trust of their organization. Executives and managers know they can give the team funding and a mission, then stand back. The team will work out how to achieve the mission on its own. The team will let its executives know how the funding is being spent, what results it's achieving, and what support it needs to be more successful.

Ally
Adaptive Planning

One of the consequences of this approach is that Optimizing teams rarely follow a predetermined plan. In general, their valuable increments are small, their plans highly adaptive, and their planning horizons short. Rather than working a big, static plan, they’re constantly testing ideas and making incremental progress. (At least from the perspective of internal stakeholders. They can still choose to save up work for a big splashy release.)

Ally
Forecasting

As a result, Optimizing teams tend not to have traditional deadlines or roadmaps. When they do set a deadline, it’s a choice they make for themselves. They do so because there’s a compelling business reason, such as coordinating with a marketing effort, not because it satisfies a bureaucratic requirement. If they realize they won’t be able to achieve a deadline, they decide for themselves how and when to change their plans.

Accountability and Oversight

Optimizing teams aren’t without oversight. They may have control over their budget and plans, but that doesn’t mean they get to do whatever they want. They still have to show their work and justify their big-picture decisions. They just don’t have to get advance approval, as long as their decisions relate to the team’s purpose and don’t require additional resources from the organization.

Ally
Purpose

The organization uses the team’s purpose to put guide rails around the team’s work. The team’s purpose sets out the big-picture direction for the team (the vision), their current near-term goal (the mission), and the signposts that lead to success (the indicators). Management provides the general direction, and the team collaborates with them and other stakeholders to work out the details. When the team sees an opportunity to change its purpose to be more valuable, team members talk it over with management.

The team demonstrates its accountability, not by showing the stories it’s delivered, but by focusing on business results: both what it's achieved so far and what it hopes to achieve in the future. These results may be straightforward, such as revenue numbers, or more subtle, such as employee satisfaction scores. Either way, the emphasis is on outcomes, not deliverables and dates.

Allies
Stakeholder Demos
Roadmaps

Optimizing teams aren’t just trying to achieve short-term outcomes, though. They’re also constantly learning how to better serve their users and their market. So they also talk about what they’ve learned, what they want to learn next, and how they plan to do so. All this information is shared through the team’s internal demos, their internal roadmaps, and private conversations with management.

Funding

The team’s funding is another of the organization’s oversight mechanisms. Optimizing teams are typically funded on an ongoing “business as usual” basis (see the “Agile Governance” section). The organization allocates those funds based on the outcomes it expects from the team. The team can also procure one-off funds and resources by going to management with its justification.

Ally
Context

If team members don't think they can achieve their purpose with the funds and other resources they have, they can ask their sponsor for more. If the sponsor doesn’t agree, the team and their sponsor collaborate to find a balance that can be achieved, or the team pivots to a new, more valuable purpose. This discussion typically happens during context chartering.

As the team’s work progresses, the organization’s predictions about value will come true...or not. This is an opportunity to adjust the team’s purpose. If the team is producing more value than expected, the funding can be increased, and the team can double down on its successes. If it’s producing less, the funding can be decreased, or the team can pivot to a more valuable purpose.

Experiments and Further Reading

As I’ve mentioned, autonomy and ownership can be a difficult shift for organizations and managers. The Agile Culture: Leading through Trust and Ownership [Pixton2014] can help managers learn how to make this shift. Another option is Turn the Ship Around! A True Story of Turning Followers Into Leaders. [Marquet2013] It’s also a great read.

In terms of experiments, one of the most interesting is “Beyond Budgeting.” It has an emphasis on disseminating decision making to customer-focused teams, similar to what I’ve described here, but it goes into much more depth on the management side of things. To learn more, see Jeremy Hope and Robin Fraser’s book, Beyond Budgeting. [Hope2003]

The Agile community is full of other interesting ideas and experiments for improving autonomy. Many of these experiments push into the Strengthening zone of fluency. I touch upon them in the “Into the Future” chapter.

Share your thoughts about this excerpt on the AoAD2 mailing list or Discord server. For videos and interviews regarding the book, see the book club archive.

For more excerpts from the book, see the Second Edition home page.

AoAD2 Part IV: Optimizing Outcomes (Introduction)

This is an excerpt from The Art of Agile Development, Second Edition. Visit the Second Edition home page for additional excerpts and more!

This excerpt is copyright 2007, 2021 by James Shore and Shane Warden. Although you are welcome to share this link, do not distribute or republish the content without James Shore’s express written permission.

Optimizing Outcomes

October has rolled around again. Last year, your team achieved Delivering fluency (see Part III). At the time, some team members wanted to push for Optimizing fluency, too, but management was skeptical. You couldn’t get the support you needed.

Since you’ve achieved Delivering fluency, though, your team has been firing on all cylinders. Productivity went way up; defects, way down. Hanna, your product manager, was having trouble keeping up. She delegated more and more responsibilities to the team, which rose to the challenge.

It got noticed. Hanna was singing your praises to the marketing director, and your boss was talking you up to the engineering director. The time was right to push for Optimizing fluency again. This time, it worked. Hanna was assigned to join your team full time. Not only that, she got permission to try “the Agile experiment.”

“The Agile experiment” is what they’re calling the way Hanna works with your team. Instead of having to go through a yearly planning exercise like the rest of Marketing, she got permission to own your team’s financials. She meets with her boss regularly to share statistics such as revenue and customer retention, and she’s constantly trying out new ideas and experiments. (Her colleagues are jealous. They still have to go through six weeks of budget and target-setting hell every year.)

It’s not just Hanna. The whole team is getting in on the action. Although Hanna is first among equals when it comes to product marketing expertise, other members of the team have developed their own areas of expertise. Shayna, in particular, loves visiting customer sites to see how people work.

Shayna’s just asked for the team’s attention. “I just finished a remote session with Magda,” she says. “You all remember Magda, right?” Nods all around. Magda is a developer who works for one of your new customers. Her company’s bigger than your normal customers, so they’ve been pretty demanding.

“Magda’s company has been dealing with an increasingly complex tax situation,” Shayna continues. “They have remote employees in more and more countries all over the world, and dealing with the various taxes and employment law is overwhelming. Magda’s heading up a team to automate some of that work, and she wanted to know how to integrate with our API.”

“But it got me thinking,” Shayna’s voice raises in excitement. “That isn’t too far off from what we do already. What if we sold an add-on module for international employment? It’s a lot of work, but we could start one country at a time. And Bo, you have some experience in this area, right?” Bo nods thoughtfully.

Hanna purses her lips. “It’s a big bet,” she says. “But it could have a huge pay-off. This could crack open the market for more companies like Magda’s. It would definitely widen our moat. None of our direct competitors have anything like that, and the big players charge two arms, a leg, and half your torso in professional services fees. Plus, we’re a lot more user-friendly.” She grins. It has a lot of teeth. “We’d only need to charge an arm and a leg. What do the rest of you think?”

Your team engages in a rapid-fire discussion of the idea. As you come to the consensus that it’s worth pursuing, Hanna nods sharply. “I love it. We’ll need to validate the market and figure out how to break it down into smaller bets. I’ll put a story on next week’s plan to come up with Build-Measure-Learn experiments. We can start on them after we release our current increment. In the meantime, I’ll do some research and run it by the boss. If the experiments work out, we’ll need her to approve more funding and a change to our mission.”

“Thanks, Shayna,” she finishes. “This is why I love being part of this team.”

Welcome to the Optimizing Zone

The Optimizing zone is for teams who want to create more value.

The Optimizing zone is for teams who want to create more value. They take ownership of their product plans and budget so they can experiment, iterate, and learn. This allows them to produce software that leads their market. Specifically, teams who are fluent at Optimizing:1

1These lists are derived from [Shore2018b].

  • Deliver products that meet business objectives and market needs. (Teams fluent in the other zones deliver what they’re asked to deliver, which isn’t necessarily the same.)

  • Include broad-based expertise that promotes optimal cost/value decisions.

  • Understand where their products stand in the market and how they’ll improve their position.

  • Coordinate with leadership to cancel or pivot low-value products early.

  • Learn from market feedback to anticipate customer needs and create new business opportunities.

  • Make business decisions quickly and effectively.

To achieve these benefits, teams need to develop the following skills. Doing so requires the investments described in the “Invest in Agility” chapter.

The team responds to business needs:

  • The team describes its plans and progress in terms of business metric outcomes jointly identified with management.

  • The team collaborates with internal and external stakeholders to determine when and how roadmaps will provide the best return on investment.

The team works as a trusted, autonomous team:

  • The team coordinates with management to understand and refine its role in achieving the organization’s overall business strategy.

  • Team members jointly take responsibility, and accept accountability, for achieving the business outcomes they identify.

  • Management gives the team the resources and authority it needs to autonomously achieve its business outcomes.

  • Management ensures that the team includes dedicated team members who have all the day-to-day skills the team needs to understand the market and achieve its business outcomes.

The team pursues product greatness:

  • The team engages with its customers and market to understand product needs and opportunities.

  • The team creates hypotheses about business opportunities and conducts experiments to test them.

  • The team plans and develops its work in a way that allows it to completely change plans, without waste, given less than a month’s notice.

Achieving Optimizing Fluency

The investments needed for Optimizing fluency challenge the preconceptions and established order of most companies. They require giving up a lot of control and putting a lot of trust in the team. There’s oversight, but it can still be scary.

As a result, you’ll usually need to demonstrate success with Focusing and Delivering fluency for a few years before your company will give you the authority and autonomy needed for Optimizing fluency. Early-stage startups tend to be an exception, but everyone else will have some trust-building to do.

By the time you’re ready for Optimizing, your team is likely to have mastered the rest of the practices in this book. You won’t need a how-to guide any more. You’ll have mastered the art.

So the chapters in this part are short and sweet. They’ll help you get started, and provide clues about what to try next. It’s up to you to take what you’ve learned about Agile development, combine it with these ideas, and create something great of your own.

These chapters will help you get started:

  • The “Autonomy” chapter discusses the nature of autonomous teams.

  • The “Discovery” chapter discusses ways your team can learn.

  • The “Into the Future” chapter wraps up with a look at what comes next.

Share your thoughts about this excerpt on the AoAD2 mailing list or Discord server. For videos and interviews regarding the book, see the book club archive.

For more excerpts from the book, see the Second Edition home page.

Agile Book Club: Blind Spot Discovery

Fluent Agile teams are very good at building quality into their code, but nobody’s perfect. Every team has blind spots. How do you discover your team’s unknown unknowns? Janet Gregory and Abby Bangser join us to explore this question.

Janet Gregory is co-author, with Lisa Crispin, of three books on Agile testing. She consults with Agile teams to help them integrate testing activities into their daily work. She is also co-founder of the Agile Testing Fellowship, offering holistic and Agile testing courses around the world.

Abby Bangser is a Principal Engineer with a passion for bringing product and software engineering practices to internal infrastructure and tooling. Currently, Abby is working with Syntasso to enable platform teams to build secure, scalable, and usable platforms. Outside of work, Abby co-hosts the #CoffeeOps London meetup, is a SLOConf global captain, and hobbles around a tag rugby field as much as she can.

Reading:
📖 Blind Spot Discovery

🎙 Discussion prompts:

  • Blind spot discovery is ultimately about finding important gaps in your team’s understanding. What are some of your favorite techniques for finding blind spots?

  • How is blind spot discovery different from traditional testing activities? Who should be doing it?

  • As the book discusses, some of the most commonly overlooked blind spots involve how organizations decide what to build. How can teams discover these blind spots and influence changes?

  • The possibility of blind spots can, itself, be a blind spot. How would you suggest introducing blind spot discovery to an organization?

About the Book Club

From October 2021 to August 2022, I hosted a call-in talk show based on the second edition of The Art of Agile Development. The series used the book as a jumping-off point for wide-ranging discussions about Agile ideas and practices, and had a star-studded guest list.

For an archive of past sessions, visit the book club index. For more about the book, visit the Art of Agile Development home page.

AoAD2 Practice: Incident Analysis

This is an excerpt from The Art of Agile Development, Second Edition. Visit the Second Edition home page for additional excerpts and more!

This excerpt is copyright 2007, 2021 by James Shore and Shane Warden. Although you are welcome to share this link, do not distribute or republish the content without James Shore’s express written permission.

Incident Analysis

Audience
Whole Team

We learn from failure.

Despite your best efforts, your software will sometimes fail to work as it should. Some failures will be minor, such as a typo on a web page. Others will be more significant, such as code that corrupts customer data, or an outage that prevents customer access.

Some failures are called bugs or defects; others are called incidents. The distinction isn’t particularly important. Either way, once the dust has settled and things are running smoothly again, you need to figure out what happened and how you can improve. This is incident analysis.

The details of how to respond during an incident are out of the scope of this book. For an excellent and practical guide to incident response, see Site Reliability Engineering: How Google Runs Production Systems [Beyer2016], particularly Chapters 12–14.

The Nature of Failure

Failure is a consequence of your entire development system.

It’s tempting to think of failure as a simple sequence of cause and effect—A did this, which led to B, which led to C—but that’s not what really happens.1 In reality, failure is a consequence of the entire development system in which work is performed. (Your development system is every aspect of how you build software, from tools to organizational structure. It’s in contrast to your software system, which is the thing you’re building.) Each failure, no matter how minor, is a clue about the nature and weaknesses of that development system.

1My discussion of the nature of failure is based on [Woods2010] and [Dekker2014].

Failure is the result of many interacting events. Small problems are constantly occurring, but the system has norms that keep them inside a safe boundary. A programmer makes an off-by-one error, but their pairing partner suggests a test to catch it. An on-site customer explains a story poorly, but notices the misunderstanding during customer review. A team member accidentally erases a file, but continuous integration rejects the commit.

When failure occurs, it’s not because of a single cause, but because multiple things go wrong at once. A programmer makes an off-by-one error, and their pairing partner was up late with a newborn and doesn’t notice, and the team is experimenting with less frequent pair swaps, and the canary server alerts were accidentally disabled. Failure happens, not because of problems, but because the development system—people, processes, and business environment—allows problems to combine.

Furthermore, systems exhibit a drift toward failure. Ironically, for teams with a track record of containing failures, the threat isn’t mistakes, but success. Over time, as no failures occur, the team’s norms change. For example, they might make pairing optional so people have more choice in their work styles. Their safe boundaries shrink. Eventually, the failure conditions—which existed all along!—combine in just the right way to exceed these smaller boundaries, and a failure occurs.

It’s hard to see the drift toward failure. Each change is small, and is an improvement in some other dimension, such as speed, cost, convenience, or customer satisfaction. To prevent drift, you have to stay vigilant. Past success doesn’t guarantee future success.

Small failures are a “dress rehearsal” for large failures.

You might expect large failures to be the result of large mistakes, but that isn’t how failure works. There’s no single cause, and no proportionality. Large failures are the result of the same systemic issues as small failures. That’s good news, because it means small failures are a “dress rehearsal” for large failures. You can learn just as much from them as you do from big ones.

Therefore, treat every failure as an opportunity to learn and improve. A typo is still a failure. A problem detected before release is still a failure. No matter how big or small, if your team thinks something is “done,” and it later needs correction, it’s worthy of analysis.

But it goes even deeper. Failures are a consequence of your development system, as I said, but so are successes. You can analyze them, too.

Conducting the Analysis

Ally
Retrospectives

Incident analysis is a type of retrospective. It’s a joint look back at your development system for the purpose of learning and improving. As such, an effective analysis will involve the five stages of a retrospective: [Derby2006]

  1. Set the stage

  2. Gather data

  3. Generate insights

  4. Decide what to do

  5. Close the retrospective

Include your whole team in the analysis, along with anyone else involved in the incident response. Avoid including managers and other observers; you want participants to be able to speak up and admit mistakes openly, and that requires limiting attendance to just the people who need to be there. When there’s a lot of interest in the analysis, you can produce an incident report, as I’ll describe later.

The time needed for the analysis session depends on the number of events leading up to the incident. A complex outage could have dozens of events and take several hours. A simple defect, though, might have only a handful of events and could take 30–60 minutes. You’ll get faster with experience.

In the beginning, and for sensitive incidents, a neutral facilitator should lead the session. The more sensitive the incident, the more experienced the facilitator needs to be.

This practice, as with all practices in this book, is focused on the team level—incidents that your team can analyze mainly on its own. You can also use it to conduct an analysis of your team’s part in a larger incident.

1. Set the stage
Ally
Safety

Because incident analysis involves a critical look at successes and failures, it’s vital for every participant to feel safe to contribute, including having frank discussions about the choices they made. For that reason, start the session by reminding everyone that the goal is to use the incident to better understand the way you create software—the development system of people, processes, expectations, environment, and tools. You’re not here to focus on the failure itself or to place blame, but instead to learn how to make your development system more resilient.

Ask everyone to confirm that they can abide by that goal and assume good faith on the part of everyone involved in the incident. Norm Kerth’s Prime Directive is a good choice:

Regardless of what we discover, we must understand and truly believe that everyone did the best job he or she could, given what was known at the time, his or her skills and abilities, the resources available, and the situation at hand. [Kerth2001] (ch. 1)

In addition, consider establishing the Vegas rule: What’s said in the analysis session, stays in the analysis session. Don’t record the session, and ask participants to agree to not repeat any personal details shared in the session.

If the session includes people outside the team, or if your team is new to working together, you might also want to establish working agreements for the session. (See the “Create Working Agreements” section.)

2. Gather data

Once the stage has been set, your next step is to understand what happened. You’ll do so by creating an annotated, visual timeline of events.

Stay focused on facts, not interpretations.

People will be tempted to interpret the data at this stage, but it’s important to keep everyone focused on “just the facts.” They’ll probably need multiple reminders as the stage progresses. With the benefit of hindsight, it’s easy to fall into the trap of critiquing people’s actions, but that won’t help. A successful analysis focuses on understanding what people actually did, and how your development system contributed to them doing those things, not what they could have done differently.

To create the timeline, start by creating a long horizontal space on your virtual whiteboard. If you’re conducting the session in person, use blue tape on a large wall. Divide the timeline into columns representing different periods in time. The columns don’t need to be uniform; weeks or months are often best for the earlier part of the timeline, while hours or days might be more appropriate for the moments leading up to the incident.

Have participants use simultaneous brainstorming to think of events relevant to the incident. (See the “Work Simultaneously” section.) Events are factual, nonjudgmental statements about something that happened, such as “Deploy script stops all ServiceGamma instances,” “ServiceBeta returns 418 response code,” “ServiceAlpha doesn’t recognize 418 response code and crashes,” “On-call engineer is paged about system downtime,” and “On-call engineer manually restarts ServiceGamma instances.” (You can use people’s names, but only if they’re present and agree.) Be sure to capture events that went well, too, not just those that went poorly.

Software logs, incident response records, and version control history are all likely to be helpful sources of inspiration. Write each event on a separate sticky note and add it to the board. Use the same color sticky for each event.

Afterward, invite everyone to step back and look at the big picture. Which events are missing? Working simultaneously, look at each event and ask, “What came before this? What came after?” Add each additional event as another sticky note. You might find it helpful to show before/after relationships with arrows.

How was the automation used? Configured? Programmed?

Be sure to include events about people, not just software. People’s decisions are an enormous factor in your development system. Find each event that involves automation your team controls or uses, then add preceding events about how people contributed to that event. How was the automation used? Configured? Programmed? Be sure to keep these events neutral in tone and blame-free. Don’t second-guess what people should have done; only write what they actually did.

For example, the event “Deploy script stops all ServiceGamma instances” might be preceded by “Op misspells --target command-line parameter as --tagret” and “Engineer inadvertently changes deploy script to stop all instances when no --target parameter found,” which in turn is preceded by “Team decides to clean up deploy script’s command-line processing.”

Events can have multiple predecessors feeding into the same event. Each predecessor can occur at different points in the timeline. For example, the event “ServiceAlpha doesn’t recognize 418 response code and crashes” could have three predecessors: “ServiceBeta returns 418 response code” (immediately before); “Engineer inadvertently disables ServiceAlpha top-level exception handler” (several months earlier); and “Engineer programs ServiceAlpha to throw exception when unexpected response code received” (a year earlier).

As events are added, encourage participants to share recollections of their opinions and emotions at the time. Don’t ask people to excuse their actions; you’re not here to assign blame. Ask them to explain what it was like to be there, in the moment, when the event occurred. This will help your team understand the social and organizational aspects of your development system—not just what choices were made, but why.

Ask participants to add additional stickies, in another color, for those thoughts. For example, if Jarrett says, “I had concerns about code quality, but I felt like I had to rush to meet our deadline,” he could write two sticky notes: “Jarrett has concerns about code quality” and “Jarrett feels he has to rush to meet deadline.” Don’t speculate about the thoughts of people who aren’t present, but you can record things they said at the time, such as “Layla says she has trouble remembering deploy script options.”

Keep these notes focused on what people felt and thought at the time. Your goal is to understand the system as it really was, not to second-guess people.

Finally, ask participants to highlight important events in the timeline—the ones that seem most relevant to the incident. Double-check whether people have captured all their recollections about those events.

3. Generate insights

Now it’s time to turn facts into insights. In this stage, you’ll mine your timeline for clues about your development system. Before you begin, give people some time to study the board. This can be a good point to call for a break.

The events aren’t the cause of failure; they’re a symptom of your system.

Begin by reminding attendees about the nature of failure. Problems are always occurring, but they don’t usually combine in a way that leads to failure. The events in your timeline aren’t the cause of the failure; they’re a symptom of how your development system functions. It’s that deeper system that you want to analyze.

Look at the events you identified as important during the “gather data” activity. Which of them involved people? To continue the example, you would choose the “Op misspells --target command-line parameter as --tagret” and “Engineer inadvertently changes deploy script to stop all instances when no --target parameter found” events, but not “Deploy script stops all ServiceGamma instances,” because that event happened automatically.

Working simultaneously, assign one or more of the following categories2 to each people-involved event. Write each category on a third color of sticky note and add it to the timeline.

2The event categories were inspired by [Woods2010] and [Dekker2014].

  • Knowledge and mental models: Involves information and decisions within the team involved in the event. For example, believing a service maintained by the team will never return a 418 response.

  • Communication and feedback: Involves information and decisions from outside the team involved in the event. For example, believing a third-party service will never return a 418 response.

  • Attention: Involves the ability to focus on relevant information. For example, ignoring an alert because several other alerts are happening at the same time, or misunderstanding the importance of an alert due to fatigue.

  • Fixation and plan continuation: Persisting with an assessment of the situation in the face of new information. For example, during an outage, continuing to troubleshoot a failing router after logs show that traffic successfully transitioned over to the backup router. Also involves continuing with an established plan; for example, releasing on the planned date despite beta testers saying the software isn’t ready.

  • Conflicting goals: Choosing between multiple goals, some of which may be unstated. For example, deciding to prioritize meeting a deadline over improving code quality.

  • Procedural adaptation: Involves situations in which established procedure doesn’t fit the situation. For example, abandoning a checklist after one of the steps reports an error. A special case is the responsibility-authority double bind, which requires people to make a choice between being punished for violating procedure or following a procedure that doesn’t fit the situation.

  • User experience: Involves interactions with computer interfaces. For example, providing the wrong command-line argument to a program.

  • Write-in: You can create your own category if the event doesn’t fit into the ones I’ve provided.

The categories apply to positive events, too. For example, “Engineer programs backend to provide safe default when ServiceOmega times out” is a “knowledge and mental models” event.

After you’ve categorized the events, take a moment to consider the whole picture again, then break into small groups to discuss each event. What does each one say about your development system? Focus on the system, not the people.

For example, the event, “Engineer inadvertently changes deploy script to stop all instances when no --target parameter found,” sounds like it’s a mistake on the part of the engineer. But the timeline reveals that Jarrett, the engineer in question, felt he had to rush to meet a deadline, even though it reduced code quality. That means it was a “conflicting goals” event, and it’s really about how priorities are decided and communicated. As team members discuss the event, they realize they all feel pressure from sales and marketing to prioritize deadlines over code quality.

Incident analysis always looks at the system, not individuals.

On the other hand, let’s say the timeline analysis revealed Jarrett also misunderstood the behavior of the team’s command-line processing library. That would make it a “knowledge and mental models” event, too, but you still wouldn’t put the blame on Jarrett. Incident analysis always looks at the system, not individuals. Individuals are expected to make mistakes. In this case, a closer look at the event reveals that, although the team used test-driven development and pairing for production code, it didn’t apply that standard to its scripts. The team didn’t have any way to prevent mistakes in its scripts, and it was just a matter of time before one slipped through.

After the breakout groups have had a chance to discuss the events—for speed, you might want to divide the events among the groups, rather than having each group discuss every event—come together to discuss what you’ve learned about the system. Write each conclusion on a fourth color of sticky note and put it on the timeline next to the corresponding event. Don’t make suggestions, yet; just focus on what you’ve learned. For example, “No systematic way to prevent programming mistakes in scripts,” “Engineers feel pressured to sacrifice code quality,” and “Deploy script requires long and error-prone command line.”

4. Decide what to do

You’re ready to decide how to improve your development system. You’ll do so by brainstorming ideas, then choosing a few of your best options.

Start by reviewing the overall timeline again. How could you change your system to be more resilient? Consider all possibilities, without worrying about feasibility. Brainstorm simultaneously onto a table or a new area of your virtual whiteboard. You don’t need to match your ideas to specific events or questions. Some will address multiple things at once. Questions to consider include:3

3Thanks to Sarah Horan Van Treese for suggesting most of these questions.

  • How could we prevent this type of failure?

  • How could we detect this type of failure earlier?

  • How could we fail faster?

  • How could we reduce the impact?

  • How could we respond faster?

  • Where did our safety net fail us?

  • What related flaws should we investigate?

To continue the example, your team might brainstorm ideas such as “stop committing to deadlines,” “update forecast weekly and remove stories that don’t fit deadline,” “apply production coding standards to scripts,” “perform review of existing scripts for additional coding errors,” “simplify deploy script’s command line,” and “perform UX review of command-line options across all of the team’s scripts.” Some of these ideas are better than others, but at this stage, you’re generating ideas, not filtering them.

Once you have a set of options, group them into “control,” “influence,” and “soup” circles, depending on your team’s ability to make them happen, as described in the “Circles and Soup” section. Have a brief discussion about the options’ pros and cons. Then use dot voting, followed by a consent vote (see the “Work Simultaneously” section and the “Seek Consent” section), to decide which options your team will pursue. You can choose more than one.

As you think about what to choose, remember that you shouldn’t fix everything. Sometimes, introducing a change adds more risk or cost than the problem it solves. In addition, although every event is a clue about the behavior of your development system, not every event is bad. For example, one of the example events was, “Engineer programs ServiceAlpha to throw exception when unexpected response code received.” Even though that event directly led to the outage, it made diagnosing the failure faster and easier. Without it, something still would have gone wrong, and it would have taken longer to solve.
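For contrast, here’s a minimal sketch of what that helpful fail-fast event might look like in code. ServiceAlpha is the book’s example service, but the function, expected status codes, and exception type are my own assumptions.

  import requests

  class UnexpectedResponseError(Exception):
      """Raised when a dependency replies with a status we don't know how to handle."""

  def call_service_alpha(url):
      response = requests.get(url, timeout=5)
      if response.status_code not in (200, 404):
          # Fail fast with a descriptive error instead of limping along with
          # bad data; in the example incident, this made diagnosis faster.
          raise UnexpectedResponseError(
              f"ServiceAlpha returned unexpected status {response.status_code}"
          )
      return response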

5. Close the retrospective

Incident analysis can be intense. Close the retrospective by giving people a chance to take a breath and gently shift back to their regular work. That breath can be metaphorical, or you can literally suggest that people stand up and take a deep breath.

Start by deciding what to keep. A screenshot or photo of the annotated timeline and other artifacts is likely to be useful for future reference. Before taking the picture, invite participants to review the timeline for anything they don’t want shared outside the session, and remove those stickies.

Next, decide who will follow through on your decisions and how. If your team will be producing a report, decide who will participate in writing it.

Finally, wrap up by expressing appreciations to one another for your hard work.4 Explain the exercise and provide an example: “(Name), I appreciate you for (reason).” Sit down and wait. Others will speak up as well. There’s no requirement to speak, but leave plenty of time at the end—a minute or so of silence—because people can take a little while to speak up.

4The “appreciations” activity is based on [Derby2006] (ch. 8).

Some people find the “appreciations” activity uncomfortable. An alternative is for each participant to take turns saying a few words about how they feel now that the analysis is over. It’s okay to pass.

Afterward, thank everybody for their participation. Remind them of the Vegas rule (don’t share personal details without permission), and end.

Organizational Learning

Organizations will often require a report about the incident analysis’s conclusions. It’s usually called a postmortem, although I prefer the more neutral incident report.

In theory, part of the purpose of the incident report is to allow other teams to use what you’ve learned to improve their own development systems. Unfortunately, people tend to dismiss lessons learned by other teams. This is called distancing through differencing [Woods2010] (ch. 14). “Those ideas don’t apply to us, because we’re an internally facing team, not externally facing.” Or, “We have microservices, not a monolith.” Or, “We work remotely, not in person.” It’s easy to latch on to superficial differences as a reason to avoid change.

Preventing this distancing is a matter of organizational culture, which puts it out of the scope of this book. Briefly, though, people have the most appetite for learning and change after a major failure. Beyond that, I’ve had the most success with making the lessons personal. Show how the lessons affect things your audience cares about.

This is easier in conversation than with a written document. In practice, I suspect—but don’t know for sure!—that the most effective way to get people to read and apply the lessons from an incident report is to tell a compelling but concise story. Make the stakes clear from the outset. Describe what happened and allow the mystery to unfold. Describe what you learned about your system and explain how it affects other teams, too. Describe the potential stakes for other teams and summarize what they can do to protect themselves.

Incident Accountability

Another reason organizations want incident reports is to “hold people accountable.” This tends to be misguided at best.

That’s not to say teams shouldn’t be accountable for their work. They should be! And by performing an incident analysis and working on improving their development system, including working with the broader organization to make changes, they are showing accountability.

Searching for someone to blame makes big incidents worse.

Searching for a “single, wringable neck,” in the misguided parlance of Scrum, just encourages deflection and finger-pointing. It may lower the number of reported incidents, but that’s just because people hide problems. The big ones get worse.

“As the incident rate decreases, the fatality rate increases,” reports The Field Guide to Understanding ‘Human Error’, speaking about construction and aviation. “[T]his supports the importance...of learning from near misses. Suppressing such learning opportunities, at whatever level, and by whatever means, is not just a bad idea. It is dangerous.” [Dekker2014] (ch. 7)

If your organization understands this dynamic, and genuinely wants the team to show how it’s being accountable, you can share what the incident analysis revealed about your development system. (In other words, the final stickies from the “Generate Insights” activity.) You can also share what you decided to do to improve the resiliency of your development system.

Often, your organization will have an existing report template that you’ll have to conform to. Do your best to avoid presenting a simplistic cause-and-effect view of the situation, and be careful to show how the system, not individuals, allowed problems to turn into failures.

Questions

What if we don’t have time to do a full analysis of every bug and incident?

Incident analysis doesn’t have to be a formal retrospective. You can use the basic structure to explore possibilities informally, with just a few people, or even in the privacy of your own thoughts, in just a few minutes. The core point to remember is that events are symptoms of your underlying development system. They’re clues to teach you how your system works. Start with the facts, discuss how they change your understanding of your development system, and only then think of what to change.

Prerequisites

Ally
Safety

Successful incident analysis depends on psychological safety. Unless participants feel safe to share their perspective on what happened, warts and all, you’ll have trouble achieving a deep understanding of your development system.

The broader organization’s approach to incidents has a large impact on participants’ safety. Even companies that pay lip service to “blameless postmortems” have trouble moving from a simplistic cause-effect view of the world to a systemic view. They tend to think of “blameless” as “not saying who’s to blame,” but to be truly blameless, they need to understand that no one is to blame. Failures and successes are a consequence of a complex system, not specific individuals’ actions.

You can conduct a successful incident analysis in organizations that don’t understand this, but you’ll need to be extra careful to establish ground rules about psychological safety, and ensure people who have a blame-oriented worldview don’t attend. You’ll also need to exercise care to make sure the incident report, if there is one, is written with a systemic view, not a cause-effect view.

Indicators

When you conduct incident analyses well:

  • Incidents are acknowledged, and even incidents with no visible impact are analyzed.

  • Team members see the analysis as an opportunity to learn and improve, and even look forward to it.

  • Your system’s resiliency improves over time, resulting in fewer escaped defects and production outages.

  • No one is blamed, judged, or punished for the incident.

Alternatives and Experiments

Many organizations approach incident analysis through the lens of a standard report template. This tends to result in shallow “quick fixes” rather than a systemic view, because people focus on what they want to report rather than studying the whole incident. The format I’ve described will help people expand their perspective before coming to conclusions. Conducting it as a retrospective will also ensure that everybody’s voice is heard and the whole team buys into the conclusions.

Many of the ideas in this practice are inspired by books from the field of Human Factors and Systems Safety. Those books are concerned with life-and-death decisions, often made under intense time pressure, in fields such as aviation. Software development has different constraints, and some of those transplanted ideas may not apply perfectly.

In particular, the event categories I’ve provided are likely to have room for improvement. I suspect the “knowledge and mental models” category could be split into several finer-grained categories. Don’t add categories arbitrarily, though. Check out the further reading section and ground your ideas in the underlying theory first.

The retrospective format I’ve provided has the most room for experimentation. It’s easy to fixate on solutions or simplistic cause-effect thinking during an incident analysis, and the format is designed to avoid that mistake. But it’s just a retrospective. It can be changed. After you’ve conducted several analyses with it, see what you can improve by experimenting with new activities. For example, can you conduct parts of the “Gather Information” stage asynchronously? Are there better ways to analyze the timeline during the “Generate Insights” stage? Can you provide more structure to “Decide What to Do”?

Finally, incident analysis isn’t limited to analyzing incidents. You can also analyze successes. As long as you’re learning about your development system, you’ll achieve the same benefits. Try conducting an analysis of a time when the team succeeded under pressure. Find the events that could have led to failure, and the events that prevented failure from occurring. Discover what that teaches you about your system’s resiliency, and think about how you can amplify that sort of resiliency in the future.

Further Reading

The Field Guide to Understanding ‘Human Error’ [Dekker2014] is a surprisingly easy read that does a great job of introducing the theory underlying much of this practice.

Behind Human Error [Woods2010] is a much denser read, but it covers more ground than The Field Guide. If you’re looking for more detail, this is your next step.

The previous two books are based on Human Factors and Systems Safety research. The website learningfromincidents.io is dedicated to bringing those ideas to software development. At the time of this writing, it’s fairly thin, but its heart is in the right place. I’m including it in the hopes that it will have more material by the time you read this.

Share your thoughts about this excerpt on the AoAD2 mailing list or Discord server. For videos and interviews regarding the book, see the book club archive.

For more excerpts from the book, see the Second Edition home page.

Agile Book Club: No Bugs

For many people, “quality” means “testing,” but Agile teams treat quality differently. Quality isn’t something you test for; it’s something you build in. Not just into your code, but into your entire development system: the way your team approaches its work, the way people think about mistakes, and even the way your organization interacts with your team. In this session, Arlo Belshee and Llewellyn Falco join us to discuss how to build quality in.

Arlo Belshee is a 20-year legacy code & DevOps veteran with a passion for zero bugs. A firm believer in mending code, Arlo now runs his company, Dig Deep Roots, where he teaches technical practices that unwind legacy code safely, one codebase at a time.

Llewellyn Falco is an Agile technical coach. He’s known for strong-style pairing, the open source “ApprovalTests” testing tool, and co-authoring the Mob Programming Guidebook. Llewellyn spends most of his time programming in C# and Java and specializes in improving legacy code.

Reading:
📖 Quality (introduction)
📖 No Bugs

🎙 Discussion prompts:

  • The book conveniently describes four kinds of errors to prevent when building quality in. First up is programmer errors. What are your go-to techniques for preventing coding mistakes?

  • Next is design errors. How do you stop your design and architecture from becoming a source of errors?

  • Requirement errors are also common. What do you do to ensure the team builds the right thing?

  • Finally, systemic errors. How do you find your team’s blind spots and prevent them from recurring?

About the Book Club

From October 2021 to August 2022, I hosted a call-in talk show based on the second edition of The Art of Agile Development. The series used the book as a jumping-off point for wide-ranging discussions about Agile ideas and practices, and had a star-studded guest list.

For an archive of past sessions, visit the book club index. For more about the book, visit the Art of Agile Development home page.

AoAD2 Practice: Team Dynamics

This is an excerpt from The Art of Agile Development, Second Edition. Visit the Second Edition home page for additional excerpts and more!

This excerpt is copyright 2007, 2021 by James Shore and Shane Warden. Although you are welcome to share this link, do not distribute or republish the content without James Shore’s express written permission.

Team Dynamics

Audience
Whole Team

by Diana Larsen

We steadily improve our ability to work together.

Your team’s ability to work together forms the bedrock of its ability to develop and deliver software. You need collaboration skills, the ability to share leadership roles, and an understanding of how teams evolve over time. Together, these skills determine your team dynamics.

Team dynamics are the invisible undercurrents that determine your team’s culture. They’re the way people interact and cooperate. Healthy team dynamics lead to a culture of achievement and well-being. Unhealthy team dynamics lead to a culture of disappointment and dysfunction.

Anyone on the team can have a role in influencing these dynamics. Use the ideas in this practice to suggest ways to improve team members' capability to work together.

...to continue reading, buy the book!

In this Section

  1. Team Dynamics
    1. What Makes a Team?
    2. Team Development
      1. Forming: The new kid in class
      2. Storming: Group adolescence
      3. Norming: We’re #1
      4. Performing: Team synergy
      5. Adjourning: Separating and moving on
    3. Communication, Collaboration, and Interaction
      1. Start with a strong base of trust
      2. Support your growing trust with three-fold commitment
      3. Right-size conflicts with feedback
      4. Spark creativity and innovation
      5. Sustain high performance
    4. Shared Leadership
    5. Toxic Behavior
    6. Questions
    7. Prerequisites
    8. Indicators
    9. Alternatives and Experiments
    10. Further Reading

Discuss the book on the AoAD2 mailing list or Discord server. For videos and interviews regarding the book, see the book club archive.

For more excerpts from the book, see the Second Edition home page.

Agile Book Club: Retrospectives

Feedback and adaptation are central to Agile, and that applies to the team’s approach to Agile itself. Although you might start with an off-the-shelf Agile method, every team is expected to customize its method for itself. In this session, Aino Corry joins us to look at how retrospectives help teams reflect and improve.

Aino Corry is a teacher, a technical conference editor, and a retrospectives facilitator. She holds a master’s degree and a Ph.D. in computer science. She has 12 years of experience with Patterns in Software Development, and 20 years of experience with agile processes in academia and industry. She also teaches how to teach Computer Science to teachers, and thus lives up to the name of her company: Metadeveloper. In her spare time, she runs and sings (but not at the same time). Aino is the author of the book Retrospective Antipatterns.

Reading:
📖 Improvement (introduction)
📖 Retrospectives
📖 Impediment Removal

🎙 Discussion prompts:

  • Retrospectives are a powerful tool, but only if used correctly. What are some ways you’ve seen them go wrong?

  • Retrospectives can get boring after a while. What can be done to keep them interesting?

  • Teams often struggle with following through on retrospective ideas. How can teams do a better job at closing the retrospective feedback loop?

  • Sometimes, the team’s biggest impediments are outside its direct control. What should teams do about those impediments?

About the Book Club

From October 2021 to August 2022, I hosted a call-in talk show based on the second edition of The Art of Agile Development. The series used the book as a jumping-off point for wide-ranging discussions about Agile ideas and practices, and had a star-studded guest list.

For an archive of past sessions, visit the book club index. For more about the book, visit the Art of Agile Development home page.

AoAD2 Practice: Blind Spot Discovery

This is an excerpt from The Art of Agile Development, Second Edition. Visit the Second Edition home page for additional excerpts and more!

This excerpt is copyright 2007, 2021 by James Shore and Shane Warden. Although you are welcome to share this link, do not distribute or republish the content without James Shore’s express written permission.

Blind Spot Discovery

Audience
Testers, Whole Team

We discover the gaps in our thinking.

Fluent Delivering teams are very good at building quality into their code, as you saw in the previous practice. But nobody’s perfect, and teams have blind spots. Blind spot discovery is a way of finding those gaps.

To find blind spots, look at the assumptions your team makes, and consider the pressures and constraints team members are under. Imagine what risks the team might be facing and what team members might falsely believe to be true. Make a hypothesis about the blind spots that could occur as a result and investigate to see if your guess is right. Testers tend to be particularly good at this.

When you find a blind spot, don’t just fix the problem you found. Fix the gap. Think about how your approach to development allowed the bug to occur, then change your approach to prevent that category of bugs from happening again, as described in the “Prevent Systemic Errors” section.

...to continue reading, buy the book!

In this Section

  1. Blind Spot Discovery
    1. Validated Learning
    2. Exploratory Testing
    3. Chaos Engineering
    4. Penetration Testing and Vulnerability Assessments
    5. Questions
    6. Prerequisites
    7. Indicators
    8. Alternatives and Experiments

Discuss the book on the AoAD2 mailing list or Discord server. For videos and interviews regarding the book, see the book club archive.

For more excerpts from the book, see the Second Edition home page.