The Art of Agile Development: Root-Cause Analysis


in 99 words

When mistakes occur, blame your process, not people. Root-cause analysis helps. What allowed the mistake to happen? What will prevent it in the future? Assume people will continue to make mistakes and build fault-tolerance into your improvements.

One approach: ask "why" five times. Use it for every problem you encounter, from the trivial to the significant. You can apply some solutions yourself. Some will require team discussion, and others need coordination with the larger organization.

When mistakes become rare, avoid over-applying root-cause analysis. Balance the risk of error against the cost of more process overhead.

as haiku

a slug eats dessert...
making lattice from lettuce,
she thins the surplus

Commentary

In the Privacy of Your Own Thoughts

Full Text

The following text is excerpted from The Art of Agile Development by James Shore and Shane Warden, published by O'Reilly. Copyright © 2008 the authors. All rights reserved.

Root-Cause Analysis

Audience
Whole Team

We prevent mistakes by fixing our process.

When I hear about a serious mistake on my project, my natural reaction is to get angry or frustrated. I want to blame someone for screwing up.

Unfortunately, this response ignores the reality of Murphy's Law. If something can go wrong, it will. People are, well, people. Everybody makes mistakes. I certainly do. Aggressively laying blame might cause people to hide their mistakes, or to try to pin them on others, but this dysfunctional behavior won't actually prevent mistakes.

Instead of getting angry, I try to remember Norm Kerth's Prime Directive: everybody is doing the best job they can given their abilities and knowledge (see Retrospectives later in this chapter for the full text of the Prime Directive). Rather than blaming people, I blame the process. What is it about the way we work that allowed this mistake to happen? How can we change the way we work so that it's harder for something to go wrong?

This is root-cause analysis.

How to Find the Root Cause

A classic approach to root-cause analysis is to ask "why" five times. Here's a real-world example.

Problem: When we start working on a new task, we spend a lot of time getting the code into a working state.

Why? Because the build is often broken in source control.

Why? Because people check in code without running their tests.

It's easy to stop here and say, "Aha! We found the problem. People need to run their tests before checking in." That is a correct answer, as running tests before check-in is part of continuous integration. But it's also already part of the process. People know they should run the tests, but they aren't doing it. Dig deeper.

Why don't they run tests before checking in? Because sometimes the tests take longer to run than people have available.

Why do the tests take so long? Because they spend a lot of time in database setup and teardown.

Why? Because our design makes it difficult to test business logic without touching the database.

Asking "why" five times revealed a much more interesting answer than "people aren't running tests." It helped to move away from blaming team members and toward an underlying, fixable problem. The solution is clear, if not easy: the design needs improvement.
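To make "the design needs improvement" concrete, here is a minimal sketch of one way such a design might be changed so that business logic can be tested without touching the database. Everything in it is an assumption made for illustration: the `Order` type, the `CustomerStore` interface, the loyalty-discount rule, and the in-memory test double are all hypothetical and do not come from the project described above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass
class Order:
    # Hypothetical domain object, invented for this sketch.
    subtotal: float
    customer_id: int


class CustomerStore(Protocol):
    """Hypothetical seam between business logic and persistence."""

    def loyalty_years(self, customer_id: int) -> int: ...


def discounted_total(order: Order, store: CustomerStore) -> float:
    """Illustrative business rule: 10% off after five years as a customer."""
    years = store.loyalty_years(order.customer_id)
    rate = 0.10 if years >= 5 else 0.0
    return round(order.subtotal * (1 - rate), 2)


class InMemoryCustomerStore:
    """Test double backed by a dict, so no database setup or teardown."""

    def __init__(self, years_by_customer: dict[int, int]) -> None:
        self._years = years_by_customer

    def loyalty_years(self, customer_id: int) -> int:
        return self._years[customer_id]


if __name__ == "__main__":
    store = InMemoryCustomerStore({1: 6, 2: 1})
    print(discounted_total(Order(subtotal=100.0, customer_id=1), store))
    print(discounted_total(Order(subtotal=100.0, customer_id=2), store))
```

Because the rule depends only on the narrow `CustomerStore` interface, a test can pass in the in-memory version and run in milliseconds, while production code would supply a database-backed implementation of the same interface. Faster tests, in turn, address the "tests take longer to run than people have available" link in the chain of whys.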

How to Fix the Root Cause

Root-cause analysis is a technique that you can use for every problem you encounter, from the trivial to the significant. You can ask yourself "why" at any time. You can even fix some problems just by improving your own work habits.

Ally
Retrospectives

More often, however, fixing root causes requires other people to cooperate. If your team has control over the root cause, gather the team members, share your thoughts, and ask for their help in solving the problem. A retrospective might be a good time for this.

If the root cause is outside the team's control entirely, then solving the problem may be difficult or impossible. For example, if your problem is "not enough pairing" and you identify the root cause as "we need more comfortable desks," your team may need the help of Facilities to fix it.

In this case, solving the problem is a matter of coordinating with the larger organization. Your project manager should be able to help. In the meantime, consider alternate solutions that are within your control.

When Not to Fix the Root Cause

When you first start applying root-cause analysis, you'll find many more problems than you can address simultaneously. Work on a few at a time. I like to chip away at the biggest problem while simultaneously picking off low-hanging fruit.

Over time, work will go more smoothly. Mistakes will become less severe and less frequent. Eventually—it can take months or years—mistakes will be notably rare.

A mistake-proof process is neither achievable nor desirable.

At this point, you may face the temptation to over-apply root-cause analysis. Beware of thinking that you can prevent all possible mistakes. Fixing a root cause may add overhead to the process. Before changing the process, ask yourself whether the problem is common enough to warrant the overhead.

Questions

Who should participate in root-cause analysis?

I usually conduct root-cause analysis in the privacy of my own thoughts, then share my conclusions and reasoning with others. Involve whoever is necessary to fix the root cause.

When should we conduct root-cause analysis?

You can use root-cause analysis any time you notice a problem: when you notice a mistake, as you're navigating, and in retrospectives. It need only take a few seconds. Keep your brain turned on and use root-cause analysis all of the time.

We know what our problems are. Why do we need to bother with root-cause analysis?

If you already understand the underlying causes of your problems, and you're making progress on fixing them, then you have already conducted root-cause analysis. However, it's easy to get stuck on a particular solution. Asking "why" five times may give you new insight.

How do we avoid blaming individuals?

If your root cause points to an individual, ask "why" again. Why did that person do what she did? How was it possible for her to make that mistake? Keep digging until you learn how to prevent that mistake in the future.

Keep in mind that lectures and punitive approaches are usually ineffective. It's better to make it difficult for people to make mistakes than to expect them always to do the right thing.

Results

When root-cause analysis is an instinctive reaction, your team values fixing problems rather than placing blame. Your first reaction to a problem is to ask how it could have possibly happened. Rather than feeling threatened by problems and trying to hide them, you raise them publicly and work to solve them.

Contraindications

The primary danger of root-cause analysis is that, ultimately, every problem has a cause outside of your control.

Don't use this as an excuse not to take action. If a root cause is beyond your control, work with someone (such as your project manager) who has experience coordinating with other groups. In the meantime, solve the intermediate problems. Focus on what is in your control.

Although few organizations actively discourage root-cause analysis, you may find that it is socially unacceptable. If your efforts are "disruptive" or a "waste of time," you may be better off avoiding root-cause analysis.

Alternatives

You can always perform root-cause analysis in the privacy of your thoughts. You'll probably find that a lot of causes are beyond your control. Try to channel your frustration and energy to fixing processes that you can influence.
