AoAD2 Chapter: Development (introduction)


This is a pre-release excerpt of The Art of Agile Development, Second Edition, to be published by O’Reilly in 2021. Visit the Second Edition home page for information about the open development process, additional excerpts, and more.

Your feedback is appreciated! To share your thoughts, join the AoAD2 open review mailing list.

This excerpt is copyright 2007, 2020, 2021 by James Shore and Shane Warden. Although you are welcome to share this link, do not distribute or republish the content without James Shore’s express written permission.

Development

It’s startling how rarely software development processes actually talk about the nuts and bolts of development. The way your team develops matters. It’s what your team spends the most hours doing.

This chapter includes practices to speed up your development and make it more reliable:

  • “Zero Friction” on p.XX removes the delays that slow down development.

  • “Test-Driven Development” on p.XX ensures your software does exactly what programmers intend it to do.

  • “Refactoring” on p.XX allows programmers to continuously improve the design of their code.

  • “Spike Solutions” on p.XX enable programmers to learn through small, isolated experiments.

It introduces two key ideas:

  • “Key Idea: Optimize for Maintenance” on p.XX: Maintenance costs are more important than the costs of writing new code.

  • “Key Idea: Fast Feedback” on p.XX: The more quickly you can get feedback, the more quickly you can adjust course and correct mistakes.

Share your feedback about this excerpt on the AoAD2 mailing list! Sign up here.

For more excerpts from the book, or to get a copy of the Early Release, see the Second Edition home page.

AoAD2 Practice: Spike Solutions


Spike Solutions

Audience
Programmers

We perform small, isolated experiments when we need more information.

You’ve probably noticed by now that Agile teams value concrete data over speculation. Whenever you’re faced with a question, don’t speculate about the answer—conduct an experiment! Figure out how you can use real data to make progress.

That’s what spike solutions are for, too. A spike solution, or spike, is a technical investigation. It’s a small experiment, in code, to research the answer to a problem. When you have the answer, the spike is discarded, or checked in as documentation.

Spike solutions use code because nothing is more concrete. You can read as many books, tutorials, or online answers as you like, but to truly understand a solution, write working code. It’s important to work from a practical point of view, not just a theoretical one. The best way to do so depends on what you want to learn.

Quick Questions

For questions about your language, libraries, or tools, write a line or two of code. If your programming language has a REPL (an interactive programming prompt), that’s often the quickest way to get your answer. For example, if you wanted to know if JavaScript could use comparison operators on strings, you could open a web browser console:

> "a" < "b"
true
> "b" < "a"
false
> "a" === "a"
true

Alternatively, you can write a short test. You can put it right next to your real tests, then delete it afterwards. For example, if you wanted to know if Java throws an exception on arithmetic overflow, a throwaway test would answer the question:

@Test
public void deleteme() {
  int a = Integer.MAX_VALUE + 1;  // test will fail if exception thrown
  System.out.println("No exception: a = " + a);
}

// Result of test run: "No exception: a = -2147483648"

Third-Party Dependencies

To learn how to use a third-party dependency, such as a library, framework, or service, create a small, standalone program that demonstrates how the dependency works. Don’t bother writing production-grade code—just focus on demonstrating the core idea. Run from the command line, hardcode values, and ignore user input, unless absolutely necessary. Provide just barely enough design and abstraction to keep yourself from getting lost.

For complex dependencies, such as frameworks, I’ll often start with their tutorial. However, those tutorials tend to emphasize getting up and running quickly, not helping you understand the framework. They often have a lot of magic tooling that makes the framework harder to understand, not easier. So once you get the tutorial working, make it your own. Strip out the magic and call APIs manually. Simplify unneeded complexity. Think about your use cases and demonstrate them in the spike.

When you’re done, you can check the spike into your code repository to act as a reference while you build the real implementation. (I use a /spikes directory.) Once you’ve built out the production implementation, you can either delete the spike or keep it for future reference, depending on how useful it is.

Design Experiments

If you have an idea for a design improvement, but you’re not sure how it will work out, you can spike the design. This is one of my most common types of spikes when working in mature codebases. I’ll have an idea for a big design change, but I’m not sure if it will be worth it.

To spike a design, create a temporary, throwaway branch in your repository. In that temporary branch, you can experiment without having to worry about safe refactorings or passing tests. You don’t even need the code to work properly. The purpose of the spike is just to experiment with your design idea and see how it works in practice.

If your design idea doesn’t work out, delete the branch. If it does work out, you’ll still delete the branch, but you can keep it around while you refactor so you can refer back to it as needed.

Making Time for Spikes

Small, “quick question” spikes are usually performed on the spur of the moment. You see a need to clarify a small technical issue, you write and delete a quick spike, you move on.

Allies
Stories
Task Planning
Slack

Dependency and design spikes can happen in several ways. Sometimes, they’re planned intentionally, either with a spike story or a task. At other times, you won’t realize a story needs a spike until you’re in the middle of working on it. When that happens, you can either add a task to your planning board, or just work on the spike as part of your current task. Either way, your slack absorbs the cost.

Questions

Should we pair or mob on spikes?

It’s up to you. Because spikes aren’t production code, even teams with strict pair programming rules don’t require writing spikes in pairs.

One very effective way to pair on a spike is to have one person research the technology while another codes. That’s typically how mob programmers approach spikes. Another option is for people to work independently on separate approaches, each doing their own research and coding, then coming together to review progress and share ideas.

Should we really throw away our spikes?

Unless you think someone will refer to it later, toss it. Remember, the purpose of a spike solution is to give you the information and experience needed to solve a problem, not to produce the code that solves it. The real production code usually ends up being a better reference than the spike.

When should we create a spike?

Whenever it helps. Perform a spike whenever the constraints of writing production-grade code get in the way of figuring out a solution.

What if the spike reveals that the problem is more difficult than we thought?

That’s good; now you have information you needed to know. Perhaps your on-site customers will reconsider the value of the story you’re working on, or perhaps you need to think of another way to accomplish your goal.

Prerequisites

Avoid the temptation to create useful or generic programs out of your spikes. Focus your work on answering a specific technical question, and stop working on the spike as soon as it answers that question. Similarly, there’s no need to create a spike when you already understand a technology well.

Ally
Test-Driven Development

Don’t use spikes as an excuse to avoid disciplined test-driven development and refactoring. Never copy spike code into production code. Even if it’s exactly what you need, rewrite it using test-driven development so that it meets your production code standards.

Indicators

When you clarify technical questions with well-directed, isolated experiments:

  • Rather than speculating about how your program will work, you conduct an experiment that tells you.

  • The complexities of your production code don’t interfere with your experiments.

Alternatives and Experiments

Spike solutions are a learning technique based on performing small, concrete experiments. Some people perform these experiments in their production code, which increases the scope of possible error. If something doesn’t work as expected, is it because your understanding of the technology is wrong? Or is it due to an unseen interaction with the production code? Standalone spikes eliminate this uncertainty.

An alternative to spike solutions is to research problems by performing web searches, reading theory, and finding code snippets online. This can be good enough for small problems, but for bigger problems, the best way to really understand how the technology works is to get your hands dirty with some code. Go ahead and start with code you find online, if you need to, but then simplify and adapt the example. Why does it work? What happens when you change default parameters? Use the spike to clarify your understanding.


AoAD2 Practice: Refactoring


Refactoring

Audience
Programmers

We revise and improve the design of existing code.

Code rots. That’s what everybody says: entropy is inevitable, and chaos eventually turns your beautifully imagined, well-designed code into a big mess of spaghetti.

I used to think that, too, before I learned to refactor. Now I have a ten-year-old production codebase that’s better today than it was when I first created it. I’d hate to go back: every year, it’s so much better than it was the year before.

Refactoring isn’t rewriting.

Refactoring makes this possible. It’s the process of changing the design of your code without changing its behavior. What it does stays the same, but how it does it changes. Despite popular misuse of the term, refactoring isn’t rewriting. Nor is it any arbitrary change. Refactoring is a careful, step-by-step approach to incrementally improving the design of your code.

Refactorings are also reversible: there’s no one right answer, so sometimes you’ll refactor in one direction, and sometimes you’ll refactor in the other. Just as you can change the expression “x²-1” to “(x+1)(x-1)” and back, you can change the design of your code—and once you can do that, you can keep entropy at bay.
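For example, “Extract Function” and “Inline Function” undo each other. This illustrative snippet (hypothetical code, not from the book’s codebase) shows the same behavior in both designs:

```javascript
const TAX_RATE = 1.1;  // hypothetical example values

// One design: the subtotal calculation extracted into its own function.
function priceWithTax(order) {
  return subtotal(order) * TAX_RATE;
}

function subtotal(order) {
  return order.quantity * order.unitPrice;
}

// Applying "Inline Function" to subtotal() yields this equivalent design.
// Neither is the one right answer; refactoring moves you between them safely.
function priceWithTaxInlined(order) {
  return order.quantity * order.unitPrice * TAX_RATE;
}
```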

How to Refactor

Allies
Test-Driven Development
Reflective Design
Slack

Technically, you can refactor at any time, but unless your IDE has guaranteed-safe refactorings, it’s best to do it when you have a good suite of tests that are all passing. Typically, you’ll refactor in three ways: first, during the “Refactor” step in the test-driven development loop; second, when using reflective design while implementing a task; and third, when using slack and reflective design to improve code quality.

When you refactor, you’ll proceed in a series of very small transformations. (Confusingly, each transformation is also called a refactoring.) Each refactoring is like making a turn on a Rubik’s cube. To achieve anything significant, you have to string together several individual refactorings, just as you have to string together several turns to solve the cube.

The fact that refactoring is a sequence of small transformations is sometimes lost on people new to refactoring. You don’t just change the design of your code: to refactor well, you need to make that change in a series of controlled steps. Each step should only take a few moments, and your tests should pass after each one.

There are a wide variety of individual refactorings. The definitive guide is Martin Fowler’s eponymous book, Refactoring: Improving the Design of Existing Code. [Fowler 2018] It contains an in-depth catalog of refactorings, and is well worth studying. I learned more about good code and design from reading that book than from any other source.

That said, you don’t need to memorize all the individual refactorings. Instead, try to learn the mindset behind them. The automated refactorings in your IDE will help you get started, but there are many more options available to you. The trick is to break down the change you want to make into small steps that only change the design of your code, not its behavior.

Refactoring in Action

Ally
Slack

To illustrate this point, I’ll continue the example started in “A TDD Example” on p.XX. This is a small example, for space reasons, but it still illustrates how a bigger change can be broken down into individual refactorings. Each refactoring is just a matter of seconds.

(To follow along with this example, clone the git repository at https://github.com/jamesshore/livestream, check out the 2020-05-05-end tag, and modify the src/rot-13.js file. See README.md for instructions about how to run the build.)

At the end of the TDD example, we had a JavaScript module that performed ROT-13 encoding:

export function transform(input) {
  if (input === undefined || typeof input !== "string") {
    throw new Error("Expected string parameter");
  }

  let result = "";
  for (let i = 0; i < input.length; i++) {
    let charCode = input.charCodeAt(i);
    result += transformLetter(charCode);
  }
  return result;
}

function transformLetter(charCode) {
  if (isBetween(charCode, "a", "m") || isBetween(charCode, "A", "M")) {
    charCode += 13;
  } else if (isBetween(charCode, "n", "z") || isBetween(charCode, "N", "Z")) {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

function isBetween(charCode, firstLetter, lastLetter) {
  return charCode >= codeFor(firstLetter) && charCode <= codeFor(lastLetter);
}

function codeFor(letter) {
  return letter.charCodeAt(0);
}

The code worked, and was decent quality, but it was overly verbose. It used character codes for determining ranges, but JavaScript allows you to compare letters directly. We can simplify the code by removing codeFor() and having isBetween() do a direct comparison, like this:

function isBetween(letter, firstLetter, lastLetter) {
  return letter >= firstLetter && letter <= lastLetter;
}

Although that change could be made all at once, making big changes in a real-world application tends to introduce bugs and can get you into a state that’s hard to get out of. (Been there, done that. In a public demonstration of refactoring. Youch.) As with TDD, the better you understand how to refactor, the smaller the steps you’re able to take, and the faster you go. So I’ll demonstrate the refactoring step by safe step.

To start with, isBetween() takes charCode, not letter. I needed to modify its caller, transformLetter(), to pass in a letter. But transformLetter() didn’t have a letter either. Even transform() didn’t have a letter. So that was the first thing to introduce:

export function transform(input) {
  if (input === undefined || typeof input !== "string") {
    throw new Error("Expected string parameter");
  }

  let result = "";
  for (let i = 0; i < input.length; i++) {
    let letter = input[i];
    let charCode = input.charCodeAt(i);
    result += transformLetter(charCode);
  }
  return result;
}

function transformLetter(charCode) ...

This was a do-nothing statement: I introduced a variable, but nothing used it, so I expected the tests to pass. I ran them, and they did.

Although the letter variable wasn’t used, introducing it gave me the ability to pass letter into transformLetter. That was my next step.

Ally
Zero Friction

Notice how small these steps were. From experience, I knew that manually refactoring function signatures often goes wrong, so I wanted to take it slow. Such small steps require a zero-friction build, which I had.

export function transform(input) {
  if (input === undefined || typeof input !== "string") {
    throw new Error("Expected string parameter");
  }

  let result = "";
  for (let i = 0; i < input.length; i++) {
    let letter = input[i];
    let charCode = input.charCodeAt(i);
    result += transformLetter(letter, charCode);
  }
  return result;
}

function transformLetter(letter, charCode) {
  if (isBetween(charCode, "a", "m") || isBetween(charCode, "A", "M")) {
    charCode += 13;
  } else if (isBetween(charCode, "n", "z") || isBetween(charCode, "N", "Z")) {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

The tests passed again. Now that I had letter in transformLetter(), I could pass it through to isBetween():

function transformLetter(letter, charCode) {
  if (isBetween(letter, charCode, "a", "m") ||
      isBetween(letter, charCode, "A", "M")) {
    charCode += 13;
  } else if (isBetween(letter, charCode, "n", "z") ||
             isBetween(letter, charCode, "N", "Z")) {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

function isBetween(letter, charCode, firstLetter, lastLetter) {
  return charCode >= codeFor(firstLetter) && charCode <= codeFor(lastLetter);
}

(Tests passed.) And now that isBetween() had letter, I could finally modify isBetween to use it:

function isBetween(letter, charCode, firstLetter, lastLetter) {
  return letter >= firstLetter && letter <= lastLetter;
}

(Tests passed.) The codeFor() function was no longer in use, so I deleted it.

Ally
Slack

(Tests passed.) I had accomplished what I originally set out to do, but now that I saw what the code looked like, I could see more opportunities to simplify. This is common when refactoring: cleaning up the code will make more cleanups visible. Deciding whether to pursue those additional cleanups is a question of judgment and how much slack you have.

This is what the code looked like:

export function transform(input) {
  if (input === undefined || typeof input !== "string") {
    throw new Error("Expected string parameter");
  }

  let result = "";
  for (let i = 0; i < input.length; i++) {
    let letter = input[i];
    let charCode = input.charCodeAt(i);
    result += transformLetter(letter, charCode);
  }
  return result;
}

function transformLetter(letter, charCode) {
  if (isBetween(letter, charCode, "a", "m") ||
      isBetween(letter, charCode, "A", "M")) {
    charCode += 13;
  } else if (isBetween(letter, charCode, "n", "z") ||
             isBetween(letter, charCode, "N", "Z")) {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

function isBetween(letter, charCode, firstLetter, lastLetter) {
  return letter >= firstLetter && letter <= lastLetter;
}

In this case, I had plenty of slack, so I decided to keep refactoring. The isBetween() function didn’t seem like it was adding any value, so I inlined it. I was able to do this in a single, bigger step because I used my editor’s automatic “Inline Function” refactoring.

function transformLetter(letter, charCode) {
  if (letter >= "a" && letter <= "m" || letter >= "A" && letter <= "M")  {
    charCode += 13;
  } else if (letter >= "n" && letter <= "z" || letter >= "N" && letter <= "Z") {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

(Tests passed.) Passing in charCode seemed redundant, so I copied the charCode logic from transform into transformLetter():

function transformLetter(letter, charCode) {
  charCode = letter.charCodeAt(0);
  if (letter >= "a" && letter <= "m" || letter >= "A" && letter <= "M") {
    charCode += 13;
  } else if (letter >= "n" && letter <= "z" || letter >= "N" && letter <= "Z") {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

(Tests passed.) And then removed the unneeded charCode parameter:

export function transform(input) {
  if (input === undefined || typeof input !== "string") {
    throw new Error("Expected string parameter");
  }

  let result = "";
  for (let i = 0; i < input.length; i++) {
    let letter = input[i];
    result += transformLetter(letter);
  }
  return result;
}

function transformLetter(letter) {
  let charCode = letter.charCodeAt(0);
  if (letter >= "a" && letter <= "m" || letter >= "A" && letter <= "M") {
    charCode += 13;
  } else if (letter >= "n" && letter <= "z" || letter >= "N" && letter <= "Z") {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

(Tests passed.) That was a nice simplification, but I saw an opportunity to make it even better. Rather than manually looping over the string, I realized I could use a regular expression to call transformLetter() instead:

export function transform(input) {
  if (input === undefined || typeof input !== "string") {
    throw new Error("Expected string parameter");
  }

  return input.replace(/[A-Za-z]/g, transformLetter);
}

function transformLetter(letter) {
  let charCode = letter.charCodeAt(0);
  if (letter >= "a" && letter <= "m" || letter >= "A" && letter <= "M") {
    charCode += 13;
  } else if (letter >= "n" && letter <= "z" || letter >= "N" && letter <= "Z") {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

(Tests passed.) I thought that was as good as it could get, at first. But the [A-Za-z] in the regex bothered me. It wasn’t really doing anything; matching every character would have worked just as well. Then it hit me: with the regex ensuring that only letters were being passed to transformLetter(), I could simplify the if statements. I wasn’t 100% sure about this, so I started slow:

function transformLetter(letter) {
  let charCode = letter.charCodeAt(0);
  if (letter <= "m") {
    charCode += 13;
  } else if (letter >= "n" && letter <= "z" || letter >= "N" && letter <= "Z") {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

The tests failed! I had forgotten that, in ASCII, upper-case “Z” comes before lower-case “a”. I needed to normalize the letter first:

function transformLetter(letter) {
  let charCode = letter.charCodeAt(0);
  if (letter.toUpperCase() <= "M") {
    charCode += 13;
  } else if (letter >= "n" && letter <= "z" || letter >= "N" && letter <= "Z") {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

That fixed it. Now I felt safe removing the second half of the if statement:

function transformLetter(letter) {
  let charCode = letter.charCodeAt(0);
  if (letter.toUpperCase() <= "M") {
    charCode += 13;
  } else {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

(Tests passed.) The code was good, but the mutable charCode variable was bothering me. I prefer a more functional style. Rather than modifying the charCode variable directly, I decided to try storing the rotation amount instead.

First I introduced the new variable:

function transformLetter(letter) {
  let charCode = letter.charCodeAt(0);
  let rotation;
  if (letter.toUpperCase() <= "M") {
    charCode += 13;
    rotation = 13;
  } else {
    charCode -= 13;
    rotation = -13;
  }
  return String.fromCharCode(charCode);
}

(Tests passed.) Then I used it in place of charCode:

function transformLetter(letter) {
  let charCode = letter.charCodeAt(0);
  let rotation;
  if (letter.toUpperCase() <= "M") {
    rotation = 13;
  } else {
    rotation = -13;
  }
  return String.fromCharCode(charCode + rotation);
}

(Tests passed.) And inlined charCode using my editor’s automated refactoring:

function transformLetter(letter) {
  let rotation;
  if (letter.toUpperCase() <= "M") {
    rotation = 13;
  } else {
    rotation = -13;
  }
  return String.fromCharCode(letter.charCodeAt(0) + rotation);
}

(Tests passed.) Finally, I converted the if statement to a constant expression. In my editor, this was two automated refactorings: an automated conversion of if to ?:, and an automated joining of declaration and assignment. Then I manually changed let to const. The tests passed after each step, and the completed code looked like this:

export function transform(input) {
  if (input === undefined || typeof input !== "string") {
    throw new Error("Expected string parameter");
  }

  return input.replace(/[A-Za-z]/g, transformLetter);
}

function transformLetter(letter) {
  const rotation = letter.toUpperCase() <= "M" ? 13 : -13;
  return String.fromCharCode(letter.charCodeAt(0) + rotation);
}

This is a nice improvement over the original code. I could have made it more compact, but that would have sacrificed readability, so I was happy with it as it was. Some people might argue that the ternary expression was a step too far already.

And that’s what it looks like to refactor, step by safe step. This example is simple enough that you could convert it all in one or two big steps, but if you learn how to take small steps on small problems like this, you’ll be able to do so on large problems, too, and that lets you tackle much harder refactoring problems.

To see an example of incremental refactoring applied to a larger problem, see Emily Bache’s superb walkthrough of the Gilded Rose kata. [Bache 2018]

Breaking a big design change into a sequence of small refactorings enables you to make dramatic design changes without risk. You can even make big changes incrementally, fixing part of the design one day and another part of it another day. This is a necessary part of using your slack to make big changes, and the key to successful Agile design.

Questions

How often should we refactor?

Allies
Test-Driven Development
Slack

Constantly. Perform little refactorings as you use TDD and bigger refactorings as part of your slack. Every week, your design should be better than it was the week before.

Isn’t refactoring rework? Shouldn’t we design our code correctly from the beginning?

If it were possible to design your code perfectly from the beginning, then refactoring would be rework. However, as everybody who’s worked with large systems knows, mistakes always creep in. Even if they didn’t, the needs of your software change over time, and your design has to be updated to match. Refactoring gives you the ability to constantly improve.

What about our database? That’s what really needs improvement.

You can refactor databases, too. Just as with normal refactorings, the trick is to proceed in small, behavior-preserving steps. Refactoring Databases: Evolutionary Database Design [Ambler and Sadalage 2006] describes how. However, data migration can take a long time, which requires special deployment considerations, as described in “Continuous Deployment” on p.XX.

How can we make large design changes without conflicting with other team members?

Ally
Continuous Integration

Communicate regularly and use continuous integration. Before taking on a refactoring that will touch a bunch of code, integrate your existing code and let people know what you’re about to do. Sometimes other people can reduce the chance of integration conflicts by mirroring any big rename refactorings you’re planning on doing.

I can’t refactor without breaking a lot of tests! What am I doing wrong?

Your tests should check the behavior of your code, not the implementation, and refactoring should change implementation, but not behavior. So if you’re doing everything correctly, the tests shouldn’t break when you refactor.

If you’re having trouble with tests breaking when you refactor, it could be due to inappropriate use of test doubles (such as mock objects). Look at ways to improve your test design. One option is to use sociable tests instead of isolated tests, as “Write Sociable Tests” on p.XX discusses. If that doesn’t help, ask a mentor for guidance.

Prerequisites

Allies
Test-Driven Development
Zero Friction
Collective Code Ownership
Continuous Integration

Refactoring requires good tests and a zero-friction build. Without tests, refactoring is risky, because you can’t easily tell whether your changes have accidentally broken something. (Some IDEs provide a few guaranteed-safe refactorings, but other refactorings still require tests.) Without a zero-friction build, feedback is too slow to allow small steps. It’s still technically possible to refactor, but it’s slow and painful.

Refactoring also requires collective code ownership. Any significant design changes will require that you touch many parts of the code. Collective code ownership gives you the permission you need to do so. Similarly, refactoring requires continuous integration. Without it, each integration will be a nightmare of conflicting changes.

It’s possible—although not common—to spend too much time refactoring. You don’t need to refactor code that’s unrelated to your current work. Similarly, balance your need to finish stories with the need to have good code. As long as the code is better than it was when you started, you’re doing enough. In particular, if you think the code could be better, but you’re not sure how to improve it, it’s okay to leave it for someone else to improve later. That’s one of the great things about collective ownership: someone will improve it later.

Indicators

When you use refactoring as an everyday part of your toolkit:

  • The code constantly improves.

  • You make significant design changes safely and confidently.

  • Every week, the code is at least slightly better than it was the week before.

Alternatives and Experiments

There are no real alternatives to refactoring. No matter how carefully you design your code, it will eventually get out of sync with the needs of your application. Without refactoring, that disconnect will overwhelm you, leaving you to choose between rewriting the software, at great expense and risk, or abandoning it entirely.

However, there are always opportunities to learn how to refactor better. That typically involves figuring out how to take smaller, safer, more reliable steps. Keep practicing. I’ve been at it for twenty years and I’m still learning new tricks.

Further Reading

Refactoring: Improving the Design of Existing Code [Fowler 2018] is the definitive reference for refactoring. It’s also a great read. Buy it.

Refactoring to Patterns [Kerievsky 2004b] takes Fowler’s work one step further, showing how refactorings can string together to achieve significant design changes. It’s a good way to learn more about how to use individual refactorings to achieve big results.

Refactoring Databases: Evolutionary Database Design [Ambler and Sadalage 2006] shows how refactoring can apply to database schemas.

Share your feedback about this excerpt on the AoAD2 mailing list! Sign up here.

For more excerpts from the book, or to get a copy of the Early Release, see the Second Edition home page.

AoAD2 Practice: Test-Driven Development

Book cover for “The Art of Agile Development, Second Edition.”

Second Edition cover

This is a pre-release excerpt of The Art of Agile Development, Second Edition, to be published by O’Reilly in 2021. Visit the Second Edition home page for information about the open development process, additional excerpts, and more.

Your feedback is appreciated! To share your thoughts, join the AoAD2 open review mailing list.

This excerpt is copyright 2007, 2020, 2021 by James Shore and Shane Warden. Although you are welcome to share this link, do not distribute or republish the content without James Shore’s express written permission.

Test-Driven Development

Audience
Programmers

We produce high-quality code in small, verifiable steps.

“What programming languages really need is a ‘DWIM’ instruction,” the joke goes. “Do what I mean, not what I say.”

Programming is demanding. It requires perfection, consistently, for months and years of effort. At best, mistakes lead to code that won’t compile. At worst, they lead to bugs that lie in wait and pounce at the moment that does the most damage.

People aren’t so good at perfection. No wonder, then, that software is buggy.

Wouldn’t it be wonderful if there were a tool that alerted you to programming mistakes moments after you made them—a tool so powerful, it virtually eliminated the need for debugging?

There is such a tool, or rather, a technique. It’s test-driven development, and it actually delivers these results.

Test-driven development, or TDD, is a rapid cycle of testing, coding, and refactoring. When adding a feature, a programmer may perform dozens of these cycles, implementing and refining the software in tiny steps until there is nothing left to add and nothing left to take away. Done well, TDD eliminates an entire class of programming errors. It doesn’t prevent all bugs, but it does ensure that the code does exactly what the programmer meant it to do.

When used properly, TDD also helps you improve your design, documents the behavior of your code, enables refactoring, and guards against future mistakes. Better yet, it’s fun. You’re always in control and you get this constant reinforcement that you’re on the right track.

TDD isn’t perfect, of course. TDD is difficult to add to legacy codebases. It takes extra effort to apply to code that involves the outside world, such as user interfaces, networking, and databases. It can take a few months of steady use to overcome the initial learning curve.

Try it anyway. Although TDD benefits from other Agile practices, it doesn’t require them. You can use it with almost any code.

Why TDD Works

Back in the days of punch cards, programmers laboriously hand-checked their code to make sure it would compile. A compile error could lead to failed batch jobs and intense debugging sessions to look for the misplaced character.

Getting code to compile isn’t such a big deal anymore. Most IDEs check your syntax as you type, and some even compile every time you save. The feedback loop is so fast that errors are easy to find and fix. (See “Key Idea: Fast Feedback” on p.XX.) If something doesn’t compile, there isn’t much code to check.

Test-driven development applies the same principle to programmers’ intention. Just as modern environments provide feedback on the syntax of your code, TDD cranks up the feedback on the semantics of your code. Every few minutes—as often as every 20 or 30 seconds—TDD verifies that the code does what you think it should do. If something goes wrong, there are only a few lines of code to check. Mistakes become obvious.

TDD is a series of validated hypotheses.

TDD accomplishes this trick through a series of validated hypotheses. You work in very small steps, and at every step, you make a mental prediction about what’s going to happen next. First you write a bit of test code and predict it will fail in a particular way. Then you write a bit of production code and predict the test will now pass. Then you make a small refactoring and predict the tests will pass again. If a prediction is ever wrong, you stop and figure it out—or just back up and try again.

As you go, the tests and production code mesh together to check each other’s correctness, and your successful predictions confirm that you’re in control of your work. The result is code that does exactly what you thought it should. You can still forget something, or misunderstand what needs to be done. But you can have confidence that the code does what you intended.

When you’re done, the tests remain. They’re committed with the rest of the code, and they act as living documentation of how you intended the code to behave. More importantly, your team runs the tests with (nearly) every build, ensuring that the code continues to work as originally intended. If someone accidentally changes the code’s behavior—for example, with a misguided refactoring—the tests fail, signaling the mistake.

How to Use TDD

You’ll need a programmer’s testing framework to use TDD. For historical reasons, they’re called “unit testing frameworks,” although they’re useful for all sorts of tests. Every popular language has one, or even multiple—just do a web search for “<language> unit test framework.” Popular examples include JUnit for Java, xUnit.net for .NET, Mocha for JavaScript, and GoogleTest for C++.

TDD doesn’t prevent mistakes; it reveals them.

TDD follows the “red, green, refactor” cycle illustrated in figure “The TDD Cycle”. Other than time spent thinking, each step should be incredibly small, providing you with feedback within a minute or two. Counterintuitively, the better someone is at TDD, the smaller the steps they take, and the faster they go. This is because TDD doesn’t prevent mistakes; it reveals them. Small steps mean fast feedback, and fast feedback means mistakes are easier and faster to fix.

A chart showing four steps: “Think,” followed by “Red bar,” followed by “Green bar,” followed by “Refactor.” There’s a loop from “Refactor” back to “Green bar,” and another loop from “Refactor” back to “Think.”

Figure 1. The TDD cycle

Step 1: Think

TDD is “test-driven” because you start with a test, and then only write enough code to make the test pass. The saying is, “Don’t write any production code unless you have a failing test.”

Your first step, therefore, is to engage in a rather odd thought process. Imagine what behavior you want your code to have, then think of the very first piece of that to implement. It should be small. Very small. Less than five lines of code small.

Next, think of a test—also just a few lines of code—that will fail until exactly that code is present. Think of something that tests the code’s behavior, not its implementation. As long as the interface doesn’t change, you should be able to change the implementation at any time, without having to change the test.

Allies
Pair Programming
Mob Programming
Spike Solutions

This is the hardest part of TDD, because it requires thinking two steps ahead: first, what you want to do; second, what test will require you to do it. Pairing and mobbing help. While the driver works on making the current test pass, the navigator thinks ahead, figuring out which increment and test should come next.

Sometimes, you won’t understand your code well enough to test-drive it. Thinking two steps ahead will be too difficult. When that happens, use a spike solution to figure out how to approach the problem, then rebuild it using TDD.

Step 2: Red bar

Once you know your next step, write the test. Write just enough test code for the current increment of behavior—hopefully fewer than five lines of code. If it takes more, that’s okay; just try for a smaller increment next time.

Write the test in terms of the code’s public interface, not how you plan to implement its internals. Respect encapsulation. The first time you test a class, module, method, or function, that means your test will use names that don’t exist yet. This is intentional: it forces you to design your interface from the perspective of a user of that interface, not as its implementer.

Ally
Zero Friction

After the test is coded, predict what will happen. Typically, the test should fail, resulting in a red progress bar in most test runners. Don’t just predict that it will fail, though; predict how it will fail. Remember, TDD is a series of validated hypotheses, and this is your first hypothesis.

Then use your watch script or IDE to run the tests. You should get feedback within a few seconds. Compare the result to your prediction. Did they match?

If the test doesn’t fail, or if it fails in a different way than you expected, you’re no longer in control of your code. Perhaps your test is broken, or it doesn’t test what you thought it did. Troubleshoot the problem. You should always be able to predict what’s going to happen.

Your goal is to always know what the code is doing and why.

It’s just as important to troubleshoot unexpected successes as it is to troubleshoot unexpected failures. Your goal isn’t merely to have tests that pass; it’s to remain in control of your code—to always know what the code is doing and why.

Step 3: Green bar

Next, write just enough production code to get the test to pass. Again, you should usually need fewer than five lines of code. Don’t worry about design purity or conceptual elegance; just do what you need to do to make the test pass. This is okay because you’ll be refactoring in a moment.

Make another prediction and run the tests. This is your second hypothesis.

The tests should pass, resulting in a green progress bar. If the test fails, get back to known-good code as quickly as you can. Often, the mistake will be obvious. After all, you’ve only written a few new lines.

If the mistake isn’t obvious, consider undoing your change and trying again. Sometimes it’s best to delete or comment out the new test and start over with a smaller increment. Remaining in control is key.

It’s always tempting to beat your head against the problem rather than backing up and trying again. Despite a few decades of experience, I do it too. And yet, that same hard-won experience has taught me that trying again with a smaller increment is almost always faster and easier.

That doesn’t stop me from beating my head against walls—it always feels like the solution is just around the corner—but I have finally learned to set a timer so the damage is contained. If you can’t bring yourself to undo right away, set a five or ten-minute timer, and promise yourself that you’ll back up and try again, with a smaller increment, when the timer goes off.

Step 4: Refactor
Ally
Refactoring

Once your tests are passing again, you can now refactor without worrying about breaking anything. Review the code you have so far and look for possible improvements. Ask your navigator if they have any suggestions.

Incrementally refactor to make each improvement. Use very small refactorings—less than a minute or two each, certainly not longer than five minutes—and run the tests after each one. They should always pass. As before, if the test doesn’t pass and the mistake isn’t immediately obvious, undo the refactoring and get back to known-good code.

Ally
Simple Design

Refactor as much as you like. Make your design as good as you can, but limit your changes to code related to your current task. Keep the design focused on the software’s current needs, not what might happen in the future.

While you refactor, don’t add any functionality. Refactoring isn’t supposed to change behavior. New behavior requires a failing test.

Step 5: Repeat

When you’re ready to add new behavior, start the cycle over again.

If things are going smoothly, with every hypothesis matching reality, you can “upshift” and take bigger steps. (But generally not more than five lines of code at a time.) If you’re running into problems, “downshift” and take smaller steps.

The key to TDD is small increments and fast feedback.

The key to success with TDD is small increments and fast feedback. Every minute or two, you should get a confirmation that you’re on the right track and your changes did what you expected them to do. Typically, you’ll run through several cycles very quickly, then spend more time thinking and refactoring for a few cycles, then speed up again.

Eat the Onion From the Inside Out

The hardest part about TDD is figuring out how to take small steps. Luckily, coding challenges are like ogres and onions: they have layers. The trick with TDD is to start with the sweet, juicy core and then work your way out from there. You can use any strategy you like, but this is the approach I typically use:

  1. Core interface. Start by defining the core interface that you want to call, and write a test that calls that interface in the simplest possible way. Use this as an opportunity to see how the interface works in practice. Is it comfortable? Does it make sense? To make the test pass, you can just hard-code the answer.

  2. Calculations and branches. Your hard-coded answer isn’t enough. What calculations and logic are at the core of your new code? Start adding them, one branch and calculation at a time. Focus on the happy path: how the code will be used when everything’s working properly.

  3. Loops and generalization. Your code will often involve loops or alternative ways of being used. Once the core logic has been implemented, add support for those alternatives, one at a time. You’ll often need to refactor the logic you’ve built into a more generic form to keep the code clean.

  4. Special cases and error handling. After you’ve handled all the happy-path cases, think about everything that can go wrong. Do you call any code that could throw an exception? Do you make any assumptions that need to be validated? Write tests for each one.

  5. Runtime assertions. As you work, you might identify situations that can only arise as the result of a programming error, such as an array index that’s out of bounds, or a variable that should never be null. Add run-time assertions for these cases so they fail fast. (See “Fail Fast” on p.XX.) Add these assertions as soon as you see the opportunity, not just at the end. They don’t need to be tested, since they’re just an added safety net.

James Grenning uses the mnemonic “ZOMBIES” to express the same idea: Test Zero, then One, then Many. While you test, pay attention to Boundaries, Interfaces, and Exceptions, all while keeping the code Simple. [Grenning 2016]
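The runtime assertions in step 5 are ordinary guard clauses. Here’s a sketch using a hypothetical lookup function (the names are mine, for illustration). The guards can only fail due to a programming error, so they throw immediately rather than letting a bad value propagate:

```javascript
// Hypothetical example of step 5's runtime assertions.
function letterAt(letters, index) {
  // Fail fast: an out-of-bounds index is a bug in the caller.
  if (index < 0 || index >= letters.length) {
    throw new Error(`index out of bounds: ${index}`);
  }
  const letter = letters[index];
  // Fail fast: a null entry should be impossible by construction.
  if (letter == null) throw new Error("letter should never be null");
  return letter;
}
```

As the text says, assertions like these don’t need their own tests; they’re a safety net that turns a mysterious downstream failure into an immediate, obvious one.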

A TDD Example

TDD is best understood by watching somebody do it. I have several video series online demonstrating real-world TDD. At the time of this writing, my free “TDD Lunch & Learn” series is the most recent. It has 21 episodes covering everything from TDD basics all the way up to thorny problems such as networking and timeouts. [Shore 2020]

I’ll describe the first of these examples here. It uses TDD to create a ROT-13 encoding function. (ROT-13 is a simple Caesar cipher where “abc” becomes “nop”, and vice versa.) It’s a very simple problem, but it’s a good example of how even small problems can be broken down into very small steps.

In this example, notice the techniques I use to work in small increments. They may even seem ridiculously small, but this makes finding mistakes easy, and that helps me go faster. As I’ve said, the more experience you have with TDD, the smaller the steps you’re able to take, and the faster that allows you to go.

Start with the core interface

Think. First, I needed to decide how to start. As usual, the core interface is a good starting point. What did I want it to look like?

This example was written in JavaScript—specifically, Node.js—so I had the choice between creating a class or just exporting a function from a module. There didn’t seem to be much value in making a full-blown class, so I decided to just make a rot13 module that exported a transform function.

Red bar. Now that I knew what I wanted to do, I was able to write a test that exercised that interface in the simplest possible way:

it("runs tests", function() {
  assert.equal(rot13.transform(""), "");
});

Before running the test, I made a hypothesis. Specifically, I predicted that the test would fail because rot13 didn’t exist, and that’s what happened.

Green bar. To make the test pass, I created the interface and hardcoded just enough to satisfy the test.

export function transform() {
  return "";
}

Hardcoding the return value is kind of a party trick, and I’ll often write a bit of real code during this first step, but in this case, there wasn’t anything else the code needed to do.

Good test names give you an overview of how the code is intended to work.

Refactor. Check for opportunities to refactor every time through the loop. In this case, I renamed the test from “runs tests,” which was left over from my initial setup, to “does nothing when input is empty.” That’s obviously more helpful for future readers. Good tests document how the code is intended to work, and good test names allow the reader to get a high-level understanding by skimming through the names. Note how the name talks about what the production code does, not what the test does.

it("does nothing when input is empty", function() {
  assert.equal(rot13.transform(""), "");
});
Calculations and branches

Think. Now I needed to code the core logic of the ROT-13 transform. Eventually, I knew I wanted to loop through the string and convert one character at a time, but that was too big of a step. I needed to think of something smaller.

A smaller step is obviously to “convert one character,” but even that was too big. Remember, the smaller the steps, the faster you’re able to go. I needed to think of just the first part of that. Ultimately, I decided to just transform one lowercase letter forward thirteen letters. Uppercase letters and looping backwards after “z” would wait for later.

Red bar. With such a small step, the test was easy to write:

it("transforms lower-case letters", function() {
  assert.equal(rot13.transform("a"), "n");
});

My hypothesis was that the test would fail, expecting "n" but getting "", and that’s what happened.

Green bar. Making the test pass was just as easy:

export function transform(input) {
  if (input === "") return "";

  let charCode = input.charCodeAt(0);
  charCode += 13;
  return String.fromCharCode(charCode);
}

Even though this was a small step, it forced me to work out the critical question of converting letters to character codes and back, something that I had to look up. Taking a small step allowed me to solve this problem in isolation, which made it easier to tell when I got it right.

Refactor. I didn’t see any opportunities to refactor, so it was time to go around the loop again.

Repeat. I continued in this way, step by small step, until the core letter transformation algorithm was complete.

  1. Lower-case letter forward: a → n (as I just showed)

  2. Lower-case letter backward: n → a

  3. First character before “a” doesn’t rotate: ` → `

  4. First character after “z” doesn’t rotate: { → {

  5. Upper-case letters forward: A → N

  6. Upper-case letters backward: N → A

  7. More boundary cases: @ → @ and [ → [

After each step, I considered the code and refactored when appropriate. Here are the resulting tests. The numbers correspond to each step. Note how some steps resulted in new tests, and others just enhanced an existing test.

it("does nothing when input is empty", function() {
  assert.equal(rot13.transform(""), "");
});

it("transforms lower-case letters", function() {
  assert.equal(rot13.transform("a"), "n"); ①
  assert.equal(rot13.transform("n"), "a"); ②
});

it("transforms upper-case letters", function() {
  assert.equal(rot13.transform("A"), "N");  ⑤
  assert.equal(rot13.transform("N"), "A");  ⑥
});

it("doesn't transform symbols", function() {
  assert.equal(rot13.transform("`"), "`"); ③
  assert.equal(rot13.transform("{"), "{"); ④
  assert.equal(rot13.transform("@"), "@");  ⑦
  assert.equal(rot13.transform("["), "[");  ⑦
});

Here’s the production code. It’s harder to match each step to the code because there was so much refactoring (see the video for details), but you can see how TDD is an iterative process that gradually causes the code to grow:

export function transform(input) {
  if (input === "") return "";

  let charCode = input.charCodeAt(0);                                    ①
  if (isBetween(charCode, "a", "m") || isBetween(charCode, "A", "M")) {    ③④⑤
    charCode += 13;                                                      ①
  } else if (isBetween(charCode, "n", "z") || isBetween(charCode, "N", "Z")) { ② ④ ⑥
    charCode -= 13;                                                       ②
  }
  return String.fromCharCode(charCode);                                  ①
}

function isBetween(charCode, firstLetter, lastLetter) {                      ④
  return charCode >= codeFor(firstLetter) && charCode <= codeFor(lastLetter);④
}                                                                            ④

function codeFor(letter) {                                                 ③
  return letter.charCodeAt(0);                                             ③
}                                                                          ③

The last step (more boundary cases) didn’t result in new production code, but I included it just to make sure I hadn’t made any mistakes.

Loops and generalization

Think. So far, the code only handled strings with one letter. Now it was time to generalize it to support full strings.

Refactor. I realized that this would be easier if I factored out the core logic, so I jumped back to the “Refactoring” step to do so.

export function transform(input) {
  if (input === "") return "";

  let charCode = input.charCodeAt(0);
  return transformLetter(charCode);
}

function transformLetter(charCode) {
  if (isBetween(charCode, "a", "m") || isBetween(charCode, "A", "M")) {
    charCode += 13;
  } else if (isBetween(charCode, "n", "z") || isBetween(charCode, "N", "Z")) {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

function isBetween...
function codeFor...

Refactoring to make the next step easier is a technique I use all the time. Sometimes, during the “Red bar” step, I realize that I should have refactored first. When that happens, I comment out the test temporarily so I can refactor while my tests are passing. This makes it faster and easier for me to detect refactoring errors.

Red bar. Now I was ready to generalize the code. I updated one of my tests to prove a loop was needed:

it("transforms lower-case letters", function() {
  assert.equal(rot13.transform("abc"), "nop");
  assert.equal(rot13.transform("n"), "a");
});

I expected it to fail, expecting "nop" and getting "n", because it was only looking at the first letter, and that’s exactly what happened.

Green bar. I modified the production code to add the loop:

export function transform(input) {
  let result = "";
  for (let i = 0; i < input.length; i++) {
    let charCode = input.charCodeAt(i);
    result += transformLetter(charCode);
  }
  return result;
}

function transformLetter...
function isBetween...
function codeFor...
Ally
Zero Friction

Refactor. I decided to flesh out the tests so they’d work better as documentation. This wasn’t strictly necessary, but I thought it would make the ROT-13 logic more obvious. I changed one assertion at a time, of course. The feedback was so fast and frictionless, executing automatically every time I saved, that there was no reason not to.

In this case, everything worked as expected, but if something had failed, changing one assertion at a time would have made debugging it just a little bit easier. Those benefits add up.

it("does nothing when input is empty", function() {
  assert.equal(rot13.transform(""), "");
});

it("transforms lower-case letters", function() {
  assert.equal(
    rot13.transform("abcdefghijklmnopqrstuvwxyz"), "nopqrstuvwxyzabcdefghijklm" ①
  );
});

it("transforms upper-case letters", function() {
  assert.equal(
    rot13.transform("ABCDEFGHIJKLMNOPQRSTUVWXYZ"), "NOPQRSTUVWXYZABCDEFGHIJKLM" ②
  );
});

it("doesn't transform symbols", function() {
  assert.equal(rot13.transform("`{@["), "`{@[");                                ③
});
Special cases, error handling, and runtime assertions

Finally, I wanted to look at everything that could go wrong. I started with the runtime assertions. How could the code be used incorrectly? Usually, I don’t test my runtime assertions, but I did so this time for the purpose of demonstration:

it("fails fast when no parameter provided", function() {         ①
  assert.throws(                                                 ①
    () => rot13.transform(),                                     ①
    "Expected string parameter"                                  ①
  );                                                             ①
});                                                              ①

it("fails fast when wrong parameter type provided", function() { ②
  assert.throws(                                                 ②
    () => rot13.transform(123),                                  ②
    "Expected string parameter"                                  ②
  );                                                             ②
});                                                              ②

Of course, I followed the TDD loop and added the tests one at a time. Implementing them meant adding a guard clause, which I also implemented incrementally:

export function transform(input) {
  if (input === undefined ①  || typeof input !== "string" ②  ) {
    throw new Error("Expected string parameter");                 ①
  }                                                               ①
  ...

Good tests also act as documentation, so my last step is always to review the tests and think about how well they communicate to future readers. Typically, I’ll start with the general, “happy path” case, then go into specifics and special cases. Sometimes I’ll add a few tests just to clarify behavior, even if I don’t have to change the production code. That was the case with this code. These are the tests I ended up with:

it("does nothing when input is empty", ...);
it("transforms lower-case letters", ...);
it("transforms upper-case letters", ...);
it("doesn't transform symbols", ...);
it("doesn't transform numbers", ...);
it("doesn't transform non-English letters", ...);
it("doesn't break when given emojis", ...);
it("fails fast when no parameter provided", ...);
it("fails fast when wrong parameter type provided", ...);

And the final production code:

export function transform(input) {
  if (input === undefined || typeof input !== "string") {
    throw new Error("Expected string parameter");
  }

  let result = "";
  for (let i = 0; i < input.length; i++) {
    let charCode = input.charCodeAt(i);
    result += transformLetter(charCode);
  }
  return result;
}

function transformLetter(charCode) {
  if (isBetween(charCode, "a", "m") || isBetween(charCode, "A", "M")) {
    charCode += 13;
  } else if (isBetween(charCode, "n", "z") || isBetween(charCode, "N", "Z")) {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

function isBetween(charCode, firstLetter, lastLetter) {
  return charCode >= codeFor(firstLetter) && charCode <= codeFor(lastLetter);
}

function codeFor(letter) {
  return letter.charCodeAt(0);
}

At this point, the code did everything it needed to. Readers familiar with JavaScript, however, will notice that the code can be further refactored and improved. I continue the example in “Refactoring in Action” on p.XX.

Fast and Reliable Tests

Teams that embrace TDD accumulate thousands of tests. The more tests you have, the more important speed and reliability become.

Your tests must be fast, and they must produce the same answer every time.

In TDD, you run the tests as often as one or two times every minute. They must be fast, and they must produce the same answer every time. If they aren’t, you won’t be able to get feedback within 1–5 seconds, and that fast feedback is crucial for the TDD loop to work effectively. You’ll stop running the tests as frequently, which means you won’t catch errors as quickly, which will slow you down.

Ally
Zero Friction

You can work around the problem by programming your watch script to only run a subset of tests, but eventually, slow and flaky tests will start causing problems during integration, too. Instead of a deploy providing feedback within five minutes, it will take tens of minutes, or even hours. To add insult to injury, the tests will fail randomly, requiring you to start the long process all over again, adding friction and causing people to ignore genuine failures.

It is possible to write fast and reliable tests. It takes practice and good design, but once you know how, writing fast, reliable tests is faster and easier than writing slow, flaky tests. Here’s how:

Rely on narrow unit tests

Broad tests1 are written to cover large parts of the software: for example, they might launch a web browser, navigate to a URL, click buttons and enter data, then check that the browser shows the expected result.

1The terms “broad,” “narrow,” “sociable,” and “solitary” come from Jay Fields. XXX find original reference

Although broad tests can be easy to write and seem like a good way to get test coverage, they’re a trap. Broad tests are slow and unreliable. You need your build to run hundreds or thousands of tests per second, and to do so with perfect reliability. The way to achieve that is with narrow tests.

A narrow test is focused on a small amount of code: usually a method or function, or several, in a particular class or module. Sometimes, a narrow test will focus on a small cross-cutting behavior that involves several modules.

The best types of narrow tests are called unit tests in the Agile community, although there’s some disagreement over the exact definition, as “Other Unit Test Definitions” on p.XX discusses. I use Michael Feathers’ definition: a unit test is a narrow test that runs entirely in memory, without involving the outside world. In other words, it can’t touch a file system, communicate across a network, or talk to a database, and you don’t need to do special things to your environment (such as editing a configuration file) to run it. [Feathers 2004]

The vast majority of your tests should be unit tests. Because they run entirely in memory, they’re fast and reliable. The size of your unit test code should be proportional to the size of your production code. The ratios vary, but it will often be close to 1:1.

Creating narrow unit tests requires good design. If you have trouble writing unit tests, it could be a sign of problems in your design. Look for ways to decouple your code so that each class or module can be tested independently. Consider asking a mentor for help, and see “Simple Design” on p.XX for ideas.

Test outside interactions with narrow integration tests

Unit tests only test code that’s in memory, but your software doesn’t operate entirely in memory. It also has to talk to the outside world. To test code that does so, use narrow integration tests, also known as focused integration tests.

Narrow integration tests are just like unit tests, with the exception that they involve the outside world. Create them using TDD in the same way that you create a unit test.

Because they involve the outside world, narrow integration tests are much slower than narrow unit tests. Where unit tests can run at a rate of hundreds or thousands per second, narrow integration tests typically run at a rate of dozens per second.

Design your code to minimize the number of narrow integration tests you need. For example, if your code depends on a third-party service, don’t call the service directly from the code that needs it. Instead, create an infrastructure wrapper: an adapter that encapsulates the service and its network calls. Use the infrastructure wrapper in the rest of your code. “Third-Party Components” on p.XX has more about adapters and the “Application Infrastructure” episode of [Shore 2020] has an example.

You should end up with a relatively small number of narrow integration tests, proportional to the number of external systems your code interacts with.
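As a sketch of what such an adapter might look like (the service and all names here are hypothetical):

```javascript
// Hypothetical infrastructure wrapper: the only code allowed to talk to
// the third-party weather service. The rest of the codebase depends on
// this adapter, never on the service directly.
class WeatherService {
  constructor(httpClient) {
    this._httpClient = httpClient; // the sole point of network access
  }

  async currentTemperature(city) {
    const body = await this._httpClient.get(
      `/weather?city=${encodeURIComponent(city)}`
    );
    return WeatherService.parseTemperature(body);
  }

  // Parsing is pure logic, extracted so unit tests can cover it. Only
  // the network call itself needs a narrow integration test.
  static parseTemperature(responseBody) {
    return JSON.parse(responseBody).main.temp;
  }
}
```

With this structure, only a handful of narrow integration tests need to exercise the real network; everything else stays fast and in-memory.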

Control global state

Any tests that deal with global state need careful thought. That includes global variables, such as static (class) variables and singletons; external data stores and systems, such as file systems, databases, and services; and machine-specific state and functions, such as the system clock, locale, time zone, and random number generator.

Tests are often written to assume that global state will be set in a certain way. Most of the time, it will be. But once in a while, it isn’t, often due to a race condition, and the test fails for no apparent reason. When you run it again, the test passes. The result is a flaky test: a test that works most of the time, but occasionally fails randomly.

Flaky tests are insidious. Because re-running the test “fixes” the problem, people learn that the right way to deal with flaky tests is to just run them again. Once you’ve accumulated hundreds of flaky tests, your test suite requires multiple runs before it succeeds. By that time, fixing the problem takes a lot of work.

When you encounter a flaky test, fix it the same day.

When you encounter a flaky test, fix it the same day. Flaky tests are the result of poor design. The sooner you fix them, the fewer problems you’ll have in the future.

The design flaw at the root of flaky tests is allowing global state to pollute your code. Some global state, such as static variables and singletons, can be removed through careful design. Other sorts of global state, such as the system clock and external data, can’t be avoided, but it can be carefully controlled. Use an infrastructure wrapper to abstract it away from the rest of your codebase, and test-drive it with narrow integration tests.

For example, if your code needs to interact with the system clock, whether to time out a request or to get the current date, create a wrapper for the system clock and use that, rather than the actual system clock, in the rest of your code. The “No More Flaky Clock Tests” episode of [Shore 2020] has an example.
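A sketch of what such a wrapper might look like, loosely modeled on the patterns in [Shore 2018] (the names are hypothetical):

```javascript
// Hypothetical clock wrapper. Production code uses the real system
// clock; tests use a deterministic "null" clock, so time-dependent
// tests never flake.
class Clock {
  static create() {
    return new Clock(() => Date.now()); // real system clock
  }

  static createNull(fixedMillis) {
    return new Clock(() => fixedMillis); // predictable fake for tests
  }

  constructor(nowFn) {
    this._now = nowFn;
  }

  millisecondsSince(startMillis) {
    return this._now() - startMillis;
  }
}

// The rest of the codebase takes a Clock instead of calling Date.now().
function hasTimedOut(clock, startMillis, timeoutMillis) {
  return clock.millisecondsSince(startMillis) > timeoutMillis;
}
```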

Write sociable tests

Tests can be solitary or sociable. A solitary test is programmed so that all dependencies of the code under test are replaced with special test code called a “test double,” also known as a “mock.” (Technically, a “mock” is a specific type of test double, but the terms are often used interchangeably.)

Solitary tests allow you to test that your code under test calls its dependencies, but they don’t allow you to test that the dependencies work the way your code expects them to. The test doesn’t actually run the dependencies; it runs the test double instead. So if you ever make a change to a dependency that breaks the expectations of any code that uses it, your tests will continue to pass, and you’ll have accidentally introduced a bug.

To prevent this problem, people who write solitary tests also write broad tests to make sure that everything works together correctly. This is duplicated effort, and those broad tests are often slow and flaky.

A better approach, in my opinion—although opinions are divided on this point—is to use sociable tests rather than solitary tests. A sociable test runs the code under test without replacing its dependencies. The code uses its actual dependencies when it runs, which means that the tests fail if the dependencies don’t work the way the code under test expects. Figure “Solitary and Sociable Tests” illustrates the difference.

A figure in two parts. Part A is labeled “Solitary tests.” It shows a series of relationships: “A” relies on “B,” which relies on “C.” Each of A, B, and C have a test, and each has a mock that the test uses. Circles show that A, B, and C are each tested, but X’s show that the relationship between A and B, and between B and C, is not tested. Part B of the figure is labeled “Sociable tests.” It shows the same tests and relationships as part A, but it doesn’t have any mocks. The figure uses circles to show that the test of A also tests A’s relationship with B, and the test of B also tests B’s relationship with C. As a result, there are no gaps that aren’t tested.

Figure 2. Solitary and sociable tests

The best unit tests—again, in my opinion—are narrow, sociable tests. They’re narrow in that the test is only testing the class or module under test. They’re sociable in that the code under test still calls its real dependencies. The result is fast tests that provide full confidence that your code works as expected, without requiring the overhead and waste of additional broad tests.

This does raise the question: how do you prevent sociable tests from talking to the outside world? A big part of the answer is to design your code to separate infrastructure and logic, as I’ll explain in a moment. The other part is the judicious use of test doubles in any code that uses infrastructure wrappers. My “Testing Without Mocks” article [Shore 2018] catalogs design patterns for doing so, and [Shore 2020] has extensive examples.
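To sketch the difference in code (with hypothetical names): the following test is narrow, because it only checks formatInvoiceLine’s behavior, and sociable, because formatInvoiceLine calls its real dependency rather than a test double.

```javascript
// Dependency: real code, not a mock.
function formatCurrency(cents) {
  return `$${(cents / 100).toFixed(2)}`;
}

// Code under test. It calls its actual dependency when it runs.
function formatInvoiceLine(description, cents) {
  return `${description}: ${formatCurrency(cents)}`;
}

// Sociable test: if formatCurrency ever stops working the way
// formatInvoiceLine expects, this test fails, with no broad test needed.
function testFormatsInvoiceLine() {
  const line = formatInvoiceLine("Widgets", 1999);
  if (line !== "Widgets: $19.99") throw new Error(`unexpected: "${line}"`);
}

testFormatsInvoiceLine();
```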

Separate infrastructure and logic

Code that is pure logic, with no dependencies on anything that involves the outside world, is by far the easiest code to test. So, to make your tests faster and more reliable, separate your logic from your infrastructure. As it turns out, this is a good way to keep your design clean, too.

There are a variety of ways to keep infrastructure and logic separate. Alistair Cockburn’s “Hexagonal Architecture” [Cockburn 2008], Gary Bernhardt’s “Functional Core, Imperative Shell” [Bernhardt 2012], and my “A-Frame Architecture” [Shore 2018] are all similar ways of tackling the problem.
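To illustrate the general idea, here’s a minimal sketch in the spirit of these architectures (the example is mine, not taken from any of the cited sources):

```javascript
// Logic: a pure function with no infrastructure dependencies. It can be
// unit-tested exhaustively, at thousands of tests per second.
function greetingFor(name, hourOfDay) {
  const salutation = hourOfDay < 12 ? "Good morning" : "Good afternoon";
  return `${salutation}, ${name}!`;
}

// Shell: a thin layer that wires infrastructure (the clock and console)
// to the logic. It's kept so thin that it needs very few tests.
function runGreeting(name) {
  const hour = new Date().getHours(); // infrastructure
  console.log(greetingFor(name, hour)); // infrastructure
}
```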

Use broad tests only as a safety net
If you use TDD correctly, broad tests shouldn’t be needed.

If you use TDD, narrow unit tests, narrow integration tests, and sociable tests correctly, your code should be thoroughly covered. Broad tests shouldn’t be needed.

For safety, though, it’s okay to have a small number of smoke tests. Smoke tests are broad tests that confirm that your software doesn’t go up in flames when you run it. They’re not comprehensive—they only test your most common scenarios. Your narrow tests are for comprehensive testing.

Broad tests tend to be very slow, often requiring seconds per test, and are difficult to make reliable. You should only need a handful of them.

Ally
Root-Cause Analysis
No Bugs

If you didn’t build your software with TDD from the beginning, or if you’re not confident in your ability to use TDD correctly, it’s okay to have more broad tests for safety. But do treat them only as a safety net. If they ever catch an error that your narrow tests don’t, that’s a sign of a problem with your testing strategy. Figure out what went wrong, add the missing test, and change your testing approach to prevent further gaps.

Eventually, you’ll have confidence in your test suite and can reduce the number of broad tests to a minimum.

Adding Tests to Existing Code

Sometimes you have to add tests to existing code. Either the code won’t have any tests at all, or it will have broad, flaky tests that need to be replaced.

There’s a chicken-and-egg problem with adding tests to code. Good tests—that is, narrow tests—need to poke into your code to set up dependencies and validate state. Unless your code was written with testability in mind—and non-TDD’d code almost never is—you won’t be able to write good tests.

So you need to refactor. The problem is, in a complex codebase, refactoring is dangerous. Side effects lurk behind every function. Twists of logic wait to trip you up. In short, if you refactor, you’re likely to break something without realizing it.

So you need tests. But to test, you need to refactor. But to refactor, you need tests. Etc., etc., argh.

To break the chicken-and-egg dilemma, you need confidence that your refactorings are safe: that they cannot change the behavior of the code. Luckily, modern IDEs have automated refactorings, and, depending on your language and IDE, they might be guaranteed to be safe. According to Arlo Belshee, the core six safe refactorings you need are Rename, Inline, Extract Method/Function, Introduce Local Variable, Introduce Parameter, and Introduce Field. His article, “The Core 6 Refactorings” [Belshee 2016], is well worth reading.

If you don’t have guaranteed-safe refactorings, you can also use characterization tests, also known as pinning tests or approval tests. Characterization tests are temporary, broad tests that are designed to exhaustively test every behavior of the code you’re changing. Llewellyn Falco’s “Approvals” testing framework, available on GitHub at https://github.com/approvals, is a powerful tool for creating these tests. Emily Bache’s video demonstration of the “Gilded Rose” kata [Bache 2018] is an excellent example of how to use approval tests to refactor unfamiliar code.
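In miniature, a characterization test looks something like this. (The legacy code and its values are hypothetical; note that the expected output is captured by running the code itself, not designed in advance.)

```javascript
// Imagine this is tangled legacy code whose exact rules nobody
// remembers. We don't judge its behavior; we pin it down.
function legacyShippingCost(weightKg, isRush) {
  let cost = weightKg * 2;
  if (isRush) cost = cost * 1.5 + 4;
  if (cost < 5) cost = 5;
  return Math.round(cost * 100) / 100;
}

// Characterization test: record current behavior across representative
// inputs. Any refactoring that changes this snapshot is a mistake.
function characterizeShippingCost() {
  return [1, 2.5, 10]
    .flatMap((kg) =>
      [false, true].map(
        (rush) => `${kg},${rush} => ${legacyShippingCost(kg, rush)}`
      )
    )
    .join("\n");
}
```

Approval-testing tools automate the capture-and-compare step; once the code is refactored and covered by narrow tests, the snapshot is deleted.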

Once you have the ability to refactor safely, you can change the code to make it cleaner. Work in very small steps, focusing on Belshee’s core six refactorings, and running your tests after each step. Simplify and refine the code until one part of it is testable, then add narrow tests to that part. You may need to write solitary tests rather than sociable tests, to begin with.

Continue refining, improving, and testing until all the code you’re working on is covered by high-quality narrow tests. Once it is, you can delete the characterization tests and any other broad tests of that code.

Questions

Isn’t TDD wasteful?

I go faster with TDD than without it. With enough practice, I think you will too.

TDD is faster because programming doesn’t just involve typing at the keyboard. It also involves debugging, manually running the code, checking that a change worked, and so forth. Michael “GeePaw” Hill calls this activity GAK, for “Geek At Keyboard.” With TDD, you spend much less time GAKking around and more time doing fun programming work. You also spend less time studying code, because the tests act as documentation and inform you when you make mistakes. Even though tests take time to write, the net result is that you have more time for development, not less. GeePaw Hill’s video, “TDD & The Lump of Coding Fallacy” [Hill 2018], is an excellent and entertaining explanation of this phenomenon.

TDD does take time to learn and apply, especially to new UI technologies and existing code. It’s worth it, but it can take a few months before it’s a net positive.

What do I need to test when using TDD?

The saying is, “Test everything that can possibly break.” To determine if something could possibly break, I think, “Do I have confidence that I’m doing this correctly, and that nobody in the future will inadvertently break this code?”

I’ve learned through painful experience that I can break nearly everything, so I test nearly everything. The only exception is code without any logic, such as simple getters and setters, or a function that only calls another function.

You don’t need to test third-party code unless you have some reason to distrust it. But it is a good idea to wrap third-party code in an adapter that you control, and test that the adapter works the way you want it to. “Third-Party Components” on p.XX has more about these adapters.

How do I test private methods?

Start by testing public methods. As you refactor, some of that code will move into private methods, but it will still be covered by existing tests.

If your code is so complex that you need to test a private method directly, this is a good indication that you should refactor. You can move the private function into a separate module or method object, where it will be public, and test it directly.
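For example (with hypothetical names), a complex private calculation can be extracted into its own module, where it’s public and directly testable:

```javascript
// Before: this logic was a private helper buried inside a large
// OrderService class, reachable only through its public methods.
// After extraction into its own module, it's public and gets its own
// focused tests, while OrderService simply calls it.
function priceWithDiscount(subtotalCents, discountPercent) {
  const discounted = subtotalCents * (1 - discountPercent / 100);
  return Math.round(discounted);
}
```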

How can I use TDD when developing a user interface?

TDD is particularly difficult with user interfaces because most UI frameworks weren’t designed with testability in mind. Many people compromise by writing a very thin, untested translation layer that only forwards UI calls to a presentation layer. They keep all their UI logic in the presentation layer and use TDD on that layer as normal.

There are tools that allow you to test a UI directly, by making HTTP calls (for web-based software) or by pressing buttons and simulating window events (for client-side software). Although they’re usually used for broad tests, my preference is to use them to write narrow integration tests of my UI translation layer.

Should we refactor our test code?

Absolutely. Tests have to be maintained, too. I’ve seen otherwise-fine codebases go off the rails because of brittle and fragile test suites.

That said, tests are a form of documentation and should generally read like a step-by-step recipe. Loops and logic should be moved into helper functions that make the underlying intent of the test easier to understand. Between tests, though, it’s okay to have some duplication if it makes the intent of each test clearer. Unlike production code, tests are read much more often than they’re modified.

Arlo Belshee uses the acronym “WET,” for “Write Explicit Tests,” as a guiding principle for test design. It’s in contrast with the DRY (Don’t Repeat Yourself) principle used for production code. His article on test design, “WET: When DRY Doesn’t Apply,” is superb. [Belshee 2016a]

Prerequisites

Although TDD is a very valuable tool, it does have a two- or three-month learning curve. It’s easy to apply to toy problems such as the ROT-13 example, but translating that experience to larger systems takes time. Legacy code, proper test isolation, and narrow integration tests are particularly difficult to master. On the other hand, the sooner you start using TDD, the sooner you’ll figure it out, so don’t let these challenges stop you.

Because TDD has a learning curve, be careful about adopting it without permission. Your organization could see the initial slowdown and reject TDD without proper consideration. Similarly, be cautious about being the only one to use TDD on your team. It’s best if everyone agrees to use it together, otherwise you’re likely to end up with other members of the team inadvertently breaking your tests and creating test-unfriendly code.

Once you do adopt TDD, don’t continue to ask permission to write tests. They’re a normal part of development. When sizing stories, include the time required for testing in your size considerations.

Ally
Zero Friction

Fast feedback is crucial for TDD to be successful. Make sure you can get feedback within 1-5 seconds, at least for the subset of tests you’re currently working on.

Finally, don’t let your tests become a straitjacket. If you can’t refactor your code without breaking a lot of tests, something is wrong. Often, it’s a result of overzealous use of test doubles. Ask a mentor for help.

Indicators

When you use TDD well:

  • You spend little time debugging.

  • You continue to make programming mistakes, but you find them in a matter of minutes and can fix them easily.

  • You have total confidence that the whole codebase does what programmers intended it to do.

  • You aggressively refactor at every opportunity, confident in the knowledge that the tests will catch any mistakes.

Alternatives and Experiments

TDD is at the heart of the Delivering practices. Without it, Delivering fluency will be difficult or even impossible to achieve.

A common misinterpretation of TDD, as “Test-Driven Debaclement” on p.XX illustrates, is to design your code first, write all the tests, and then write the production code. This approach is frustrating and slow, and it doesn’t allow you to learn as you go.

Another approach is to write tests after writing the production code. This is very difficult to do well: the code has to be designed for testability, and it’s hard to do so unless you write the tests first. It’s also tedious, with a constant temptation to wrap up and move on. In practice, I’ve yet to see after-the-fact tests come close to the detail and quality of tests created with TDD.

Even if these approaches do work for you, TDD isn’t just about testing. It’s really about using very small, continuously-validated hypotheses to confirm that you’re on the right track and produce high-quality code. With the exception of Kent Beck’s TCR, which I’ll discuss in a moment, I’m not aware of any alternatives to TDD that allow you to do so while also providing the documentation and safety of a good test suite.

Under the TDD banner, though, there are many, many experiments that you can conduct. TDD is one of those “moment to learn, lifetime to master” skills. One of the biggest opportunities for experimentation is the choice between the “classicist” and “mockist” approaches. In this book, I’ve shown how to apply the classicist approach. The mockist approach, spearheaded by Steve Freeman and Nat Pryce, is also worth investigating. Their book, Growing Object-Oriented Software, Guided by Tests, is well worth reading. [Freeman and Pryce 2010]

More recently, Kent Beck has been experimenting with an idea he calls TCR: test && commit || revert. It refers to a small script that automatically commits your code if the tests pass, and reverts it if the tests fail. Although TCR sacrifices the “red bar” step of TDD, which a lot of people like, it forces you to take very small steps. This gives you the same series of validated hypotheses that TDD does, and arguably makes them even smaller and more frequent. That’s one of the hardest and most important things to learn about TDD. TCR is worth trying as an exercise, if nothing else.

Further Reading

This book only scratches the surface of TDD. For more detail about the approach I recommend here, see my “Testing Without Mocks” article [Shore 2018] and accompanying “TDD Lunch and Learn” video series [Shore 2020].

Test-Driven Development: By Example [Beck 2002] is an excellent introduction to TDD by the person who invented it. If you liked the ROT-13 example, you’ll like the extended examples in this book. The TDD patterns in Part III are particularly good.

Working Effectively with Legacy Code [Feathers 2004] is a must-have for anybody working with legacy code.



AoAD2 Practice: Zero Friction


Zero Friction

Audience
Programmers, Operations

When we’re ready to code, nothing gets in our way.

Imagine you’ve just started working with a new team. One of your new teammates, Pedro, walks you over to a development workstation.

“Since you’re new, we’ll start by deploying a small change,” he says, sitting down next to you. “This machine is brand-new, so we’ll have to set it up from scratch. First, clone the repo.” He tells you the command. “Now, run the build script.”

Commands start scrolling up the screen. Pedro explains. “We use a tool for reproducible builds. It uses a configuration file in the repo to make sure we all have the same tooling installed. Right now, it’s detected that you don’t have anything installed, so it’s installing the IDE, development tools, and images needed to develop and run the system locally.”

“This will take a while,” he continues. “After the first run, though, it’s instantaneous. It only updates again when we commit changes to the config. Come on, I’ll show you around the office.”

When you come back, the build is done. “Okay, let me show you the app,” Pedro says. “Type rundev to start it up.” Once again, information starts scrolling by. “This is all running locally,” Pedro explains proudly. “We used to have a shared test environment, and we were constantly stepping on each others’ toes. Now that’s all in the past. It even knows which services to restart depending on which files you change.”

Pedro walks you through the application. “Now let’s make a change,” he says. “Open up the IDE and run the watch script with the quick command. It will run the build when files change. The quick command tells it to only build and test the files that have changed.”

You follow his instructions and the script starts up, then immediately reports BUILD OK in green. “Nothing’s changed since we last ran the build,” Pedro explains, “so the script didn’t do anything. Now, let’s make a small change.” He directs you to a test file and has you add a test. When you save the changes, the watch script runs again and reports a test failure. It takes less than a second.

“We’ve put a lot of work into our build and test speed,” Pedro tells you. He’s clearly proud of it. “It wasn’t easy, but it’s totally worth it. We get feedback on most changes in a second or two. It’s done wonders for our ability to iterate and be productive. I’m not lying when I say this is the best development environment I’ve ever been in.”

“Now let’s finish up this change and deploy.” He shows you the production change needed to make the new test pass. Once again, when you save, the watch script runs the tests in about a second. This time, it reports success.

“Okay, we’re ready to deploy,” he says. “This is going into production, but don’t worry. The deploy script will run the full test suite, and we also have a canary server that checks to see if anything goes wrong. Type deploy to kick things off.”

You run the script and watch it go through its paces. A few minutes later, it says INTEGRATION OK, then starts deploying the code. “That’s it!” Pedro beams. “Once the integration succeeds, you can assume the deploy will too. If something goes wrong with the canary server, it will roll back the deploy and we’ll get a page. Welcome to the team!”

It’s been less than an hour, and you’ve already deployed to production. This is zero-friction development: when you’re ready to code, nothing gets in your way.

One-Second Feedback

When you make a change, get feedback in less than a second, or five at most.

Development speed is the most important area for eliminating friction. When you make a change, you need to get feedback about that change in less than a second, or five seconds at the very most.

This type of fast feedback is a game changer. You’re able to experiment and iterate so easily. Rather than making big changes, you can work in very small steps. Each change can be a line or two of code, which means that you always know where your mistakes are. Debugging becomes a thing of the past.

If feedback takes less than a second, it’s functionally instantaneous. You’ll make a change, see the feedback, and keep working. If it takes between one and five seconds, it won’t feel instantaneous, but it’s still acceptable. If it takes between five and ten seconds, it will feel slow. You’ll start being tempted to batch up changes. And if it’s more than ten seconds, you won’t be able to take small steps, and that will slow you down.

Ally
Test-Driven Development

To achieve one-second feedback, set up a watch script that automatically checks your code when you make a change. Inside the script, use a compiler or linter to tell you when you make syntax errors, and tests to tell you when you make semantic errors.

Alternatively, you can configure your IDE to check syntax and run tests, rather than writing a script. This can be an easy way to get started, although you’ll have to migrate to a script eventually. If you do start with an IDE-based approach, make sure its configuration can be committed to your repository and used by everyone on the team. You need the ability to share improvements easily.

When you save your changes, the script (or IDE) should give you immediate, unambiguous feedback. If everything worked, it should say OK. If anything failed, it should say FAILED, and provide information to help you troubleshoot the error. Most people make their tools display a green bar for success and a red bar for failure. I also program mine to play a sound—one for compile/lint failure, another for test failure, and a third for success—but that’s entirely optional.

As your codebase gets larger, one-second feedback will become harder to achieve. The first culprit is usually test speed. Instead of writing broad tests that check the whole system, write narrow tests that focus on the behavior of a small amount of code. Stub out slow and brittle parts of the system, such as file system, network, and database access. “Fast and Reliable Tests” on p.XX describes how.

As your system continues to grow, build speeds (compiling or linting) will become a problem. The solution will depend on your language. A web search for “speed up <language> build” will get you started. Typically, it will involve incremental builds: caching parts of the build so that only code that has changed gets rebuilt. The larger your system gets, the more creative you’ll have to be.

Ally
Continuous Integration

Eventually, you’ll probably need to set up two builds: one for fast feedback, and one for production deployment. Although it’s preferable for your local build to be the same as your production build, fast feedback is more important. Your deploy script can run your tests against the production build. As long as you have a good test suite and practice continuous integration, you’ll learn about discrepancies between the two builds before they’ve had a chance to get out of control.

Although good tests run at a rate of hundreds or thousands per second, you’ll eventually have too many tests to run them all in less than a second. When you do, you’ll need to revise your script to only run a subset of the tests. The easiest way is to group your tests into clusters, and run specific clusters based on the files that have changed.

Eventually, you may want to do a more sophisticated dependency analysis that detects exactly which tests to run for any given change. Some test runners can do this for you. It’s also not as hard to implement as you might think. The trick is to focus on what your team needs rather than making a generic solution that handles all possible edge cases.
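A simple version of the cluster approach might look like this sketch (the paths and cluster mappings are hypothetical):

```javascript
// Map source prefixes to the test clusters that cover them. Shared
// code is riskier, so changes there run a wider set of clusters.
const TEST_CLUSTERS = {
  "src/billing/": ["test/billing/"],
  "src/shipping/": ["test/shipping/"],
  "src/util/": ["test/billing/", "test/shipping/", "test/util/"],
};

// Given the files the watch script saw change, decide which test
// clusters to run.
function clustersToRun(changedFiles) {
  const clusters = new Set();
  for (const file of changedFiles) {
    for (const [prefix, tests] of Object.entries(TEST_CLUSTERS)) {
      if (file.startsWith(prefix)) tests.forEach((t) => clusters.add(t));
    }
  }
  return [...clusters].sort();
}
```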

Know Your Editor

Don’t let your code editor get in the way of your thoughts. This is particularly important when pairing or mobbing; when you’re navigating, there are few things more frustrating than watching a driver struggle with the editor.

Take the time to get to know your editor really, really well. Learn the keyboard shortcuts. If the editor provides automated refactorings, learn how to use them. (If it doesn’t, look for a better editor.) Learn their keyboard shortcuts, too. Take advantage of auto-formatting, and commit the formatting configuration file to your repository so your whole team is in sync. Learn how to use code completion, automatic fixes, function and method lookup, and reference navigation. And learn the keyboard shortcuts.

For an example of how much of a difference editor proficiency can make, see Emily Bache’s virtuoso performance in her Gilded Rose kata videos, particularly part 2. [Bache 2018]

Reproducible Builds

It worked on my machine!

Overheard

What happens when you check out an arbitrary commit from your repository? Say, from a year ago. (Go on, try it!) Does it still run? Do the tests still pass? Or does it require some esoteric combination of tooling and external services that have long since passed from memory into oblivion?

You should be able to pull any commit and expect it to work the same for every developer.

A reproducible build is a build that continues to work and pass its tests no matter which development machine you use to build it, and no matter how old the code you’re building is. You should be able to pull any commit and expect it to work the same way for every developer. Generally speaking, this requires two things:

1. Dependency Management

Dependencies are the libraries and tools your code requires to run. This includes your compiler or interpreter, run-time environment, packages downloaded from your language’s package repository, code created by other teams in your organization, and so forth. For your build to be reproducible, everybody needs to have the exact same dependencies.

In your build, check the version of every dependency, including tools such as your compiler. If a dependency is missing or using the wrong version, the build should either exit with an error or (preferably) install the correct version. Tools to do so include Nix, Bazel, and Docker. Check that you’re using the right version of your dependency management tool, too.

An easy way to ensure your software has the correct dependencies is to check them into your repository. This is called vendoring. It works best when your dependencies come in the form of source code rather than binaries. You can mix the two approaches: for example, a team with a Node.js codebase vendored its node_modules directory, but didn’t vendor the Node executable. Instead, they programmed the build to fail if the wrong version of Node was running.
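For instance, the version check for the Node executable might be as simple as this sketch (the pinned version number is hypothetical):

```javascript
// Pinned in the repository; hypothetical value.
const EXPECTED_NODE_MAJOR = 18;

// The build calls this with process.version (e.g., "v18.17.1") and
// aborts on FAILED, so every developer builds with the same tooling.
function checkNodeVersion(actualVersion) {
  const major = Number(actualVersion.replace(/^v/, "").split(".")[0]);
  if (major !== EXPECTED_NODE_MAJOR) {
    return `FAILED: expected Node ${EXPECTED_NODE_MAJOR}.x, found ${actualVersion}`;
  }
  return "OK";
}
```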

2. Local Builds

Dependency management will ensure that your code runs the same way on every machine, but it won’t ensure that your tests pass. For your tests to pass, they need to run entirely locally, without communicating over the network. Otherwise, you’re likely to get inconsistent results when two people run the tests at the same time, and you won’t be able to build old versions. The services and data they depend on will have changed, and tests that used to pass will fail.

The same is true when you run the code manually. To get consistent results and to be able to run old versions, everything the code depends on needs to be installed locally.

In some cases, it’s too difficult or expensive to run every dependency locally. You still need to be able to run your tests locally, though, for both reproducibility and speed. To do so, write your tests to use fake versions of any service you can’t run locally. The “Spy Server” pattern in [Shore 2018] describes how. For videos demonstrating the technique, see episodes 17 and 18 of [Shore 2020].

This raises the question: if you don’t test your software against its real dependencies, how do you know that it works? Because external services can change or fail at any time, the real answer is “monitoring.” (See the “Paranoic Telemetry” pattern of [Shore 2018].) But you can also run additional tests as a safety net in your deployment script.

Five-Minute Deploy

Your deploy script should report success or failure within five minutes—ten at most.

Create scripts for everything your team repeats. One of the most common examples is deployment. Your deploy script should create a production-grade build, run the full test suite against that build, integrate your code, deploy the code to production, and report success or failure within five minutes—ten at most.

A five-minute deploy is important because you need to remain available to fix any problems. Five minutes is enough for a stretch break and a new cup of coffee. Ten minutes is tolerable, but gets tedious. More than ten minutes and people start working on other tasks. Then, when a deploy fails, the code is left in limbo until somebody gets back to it.

Ally
Continuous Deployment

The deploy doesn’t need to literally complete within five minutes, although that’s preferable. Instead, it needs to report success or failure within that time. After it reports success, failures should be exceedingly rare. Typically, that means creating a production-grade build and running the main test suite before reporting. Additional pre-deployment actions, such as deploying to a canary server or running additional tests, can take longer. But they should only fail rarely, and when they do fail, the team needs to be alerted in some way.

For most teams, the thing standing between them and a five-minute deploy is the speed of their test suite. Focus on building narrow tests rather than broad end-to-end tests. Unreliable tests—tests that fail randomly—are another common problem that slows down deployment. “Fast and Reliable Tests” on p.XX explains how to fix both problems.

Control Complexity

An oft-overlooked source of friction for development teams is the complexity of their development environment. In their rush to get work done quickly, teams pull in popular tools, libraries, and frameworks to solve common development problems.

There’s nothing wrong with these tools, in isolation. But any long-lived software development effort is going to have specialized needs, and that’s where the quick and easy approach starts to break down. All those tools, libraries, and frameworks add up to an enormous cognitive burden, especially when you have to start diving into their internals to make them work together nicely. That ends up causing a lot of friction.

It’s more important to optimize maintenance costs than initial development, as “Key Idea: Optimize for Maintenance” on p.XX explains. Be thoughtful about the third-party dependencies you use. When you choose one, don’t just think about the problem it’s solving; think about the maintenance burden the dependency will add, and how well it will play with your existing systems. A simple tool or library your scripts can call is a great choice. A complex black box that wants to own the world probably isn’t.

Ally
Simple Design

In most cases, it’s best to wrap the third-party tool or library in code you control. The job of your code is to hide the underlying complexity and present a simple interface customized for your needs. The “Simple Design” practice explains further.
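To illustrate the idea, here’s a hypothetical sketch of such a wrapper. The `Notifier` class, `User` class, and the injected send function are all invented for the example; the point is that the rest of the codebase only sees the simple interface you control:

```python
class User:
    def __init__(self, email):
        self.email = email

# The only notification interface the rest of the codebase sees. The
# third-party library's configuration quirks are hidden here, in one place.
class Notifier:
    def __init__(self, send_fn):
        self._send = send_fn        # injected, so tests can substitute a fake

    def notify(self, user, message):
        self._send(to=user.email, subject="Notification", body=message)

# In tests, inject a recording fake instead of the real library:
sent = []
notifier = Notifier(lambda **kwargs: sent.append(kwargs))
notifier.notify(User("dev@example.com"), "build failed")
assert sent[0]["to"] == "dev@example.com"
```

Replacing the underlying library later means rewriting `Notifier`, not every call site.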

Automate Incrementally

Improve your automation continuously and incrementally, starting with your very first story. In a brand-new codebase, that means that your first development task is to set up your scripts.

Automate every repeated activity. To begin with, this means writing four scripts:

  • build: compile and/or lint, run tests, and report success or failure

  • watch: automatically run build when files change

  • deploy: run build in a production-like environment, integrate your code, and deploy

  • rundev: run the software locally for manual review and testing

You’re free to use whichever names you prefer, of course.

Keep your automation simple. For that first story, you don’t need sophisticated incremental builds or dependency graph analysis. Before you write any code, start by writing a build script that simply says BUILD OK. Nothing else! It’s like a “hello world” for your build.
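Sketched in Python (any real programming language will do), that “hello world” build might look like this:

```python
#!/usr/bin/env python3
# "Hello world" build script: it proves the automation wiring works before
# there's anything real to build. It grows incrementally from here.

def build():
    # Later: compile or lint, run tests, and return nonzero on failure.
    print("BUILD OK")
    return 0

# A real script would end with: sys.exit(build())
# so that watch and deploy can read the exit code.
```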

Next, write a watch script that runs the build when files in your source tree are added, removed, or changed. Make sure it handles changes to the watch and build scripts themselves. Have it report how long the build takes, too. When that time exceeds five seconds, you’ll know it’s time to optimize.

The best way to detect file changes depends on your scripting language, but somebody’s probably written a library you can use. Try searching the web for “<language> watch for file changes.”
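In the meantime, a bare-bones polling sketch shows the idea. This is an illustration only; a real watch script would likely use one of those libraries rather than polling:

```python
import os
import time

def snapshot(root):
    """Map every file under root (including the scripts themselves,
    if they live there) to its last-modified time."""
    mtimes = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtimes[path] = os.stat(path).st_mtime
    return mtimes

def watch(root, run_build, poll_seconds=0.5, iterations=None):
    """Run run_build whenever files under root are added, removed, or
    changed. iterations limits the loop for testing; None runs forever."""
    seen = snapshot(root)
    count = 0
    while iterations is None or count < iterations:
        time.sleep(poll_seconds)
        current = snapshot(root)
        if current != seen:
            seen = current
            start = time.monotonic()
            run_build()
            # Report the build time so you know when to optimize.
            print(f"build took {time.monotonic() - start:.1f} seconds")
        count += 1
```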

You may be tempted to use your IDE instead of a watch script. That’s okay, to start with, but you’ll still need to automate your build for the deploy script, so you could end up maintaining two separate builds. Beware of lock-in, too: eventually, the IDE won’t be able to provide one-second feedback. When that happens, rather than fighting the IDE, switch to a proper script-based approach. It’s more flexible.

Speaking of scripting languages, use a real programming language for your scripting. Your scripts can call out to tools, and some of those tools might have their own proprietary configuration languages, but orchestrate them all with real code that you control. As your automation becomes more sophisticated, you’ll appreciate the power a real programming language provides.

Treat your scripts with the same respect as real production code, too. You don’t have to write tests for them—if you want to, it’s a good idea, but scripts can be very hard to test—but do pay attention to making your scripts well-written, well-factored, and easy to understand. You’ll thank yourself later.

Once you have a bare-bones watch script, create a similarly bare-bones deploy script. At first, it just needs to run build in a pristine environment and integrate your code. There are many tools that will do this for you, typically under the name “continuous integration server” or “build server.” Be sure to get one that integrates after the build succeeds, not before.

When deploy is working, you’re ready to flesh out build. Write a do-nothing entry point for your application. Maybe it just says “Hello world.” Make build compile or lint it, then add dependency management for the compiler or linter. It can just check the version against a constant, to start with, or you can install a dependency management tool. Alternatively, you can vendor your dependencies.

Next, add a unit testing tool and a failing test. Be sure to add dependency management for the testing tool too. Make the build run the test, fail appropriately, and exit with an error code. Next, check that watch and deploy both handle failures correctly, then make the test pass.

Now you can add the rundev script. Make rundev compile (if needed) and run your do-nothing application, then make it recompile and rerun when the source files change. Refactor so that build, watch, and rundev don’t have duplicated file-watching or compilation code.

Ally
Continuous Deployment

Finally, flesh out deploy with a simple deployment step. Start by deploying to a staging server. The right way to do so depends on your system architecture, but you only have one production file, so you don’t need to do anything complicated. Just deploy that one file to one server. It can be as simple as using scp or rsync. Anything more complicated—crash handling, monitoring, provisioning—needs a story. (For example, “Site keeps working after crash.”) As your system grows, your automation will grow with it.
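That first deployment step can be sketched in a few lines. The hostname and path here are invented placeholders, and the command runner is injectable so the script can be tested without a real server:

```python
import subprocess

def deploy(artifact, run=subprocess.run,
           host="staging.example.com", dest="/srv/app/"):
    """Copy one production file to one staging server via scp."""
    run(["scp", artifact, f"{host}:{dest}"], check=True)
    print(f"deployed {artifact} to {host}")
```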

If you don’t deploy to a server, but instead distribute installation packages, make deploy build a simple distribution package. Start with a bare-bones package, such as a .zip file, that just contains your one production file. Fancier and more user-friendly installation can be scheduled with user stories.

You should be able to pull any commit and expect it to work the same for every developer.

From this point forward, update your automation with every story. When you add dependencies, don’t install them manually (unless you vendor them); add them to your dependency manager’s configuration and let it install them. That way, you know it will work for other people too. When a story first involves a database, update build, rundev, and deploy to automatically install, configure, and deploy it. Same for stories that involve additional services, servers, and so forth.

When written out in this way, automation sounds like a lot of work. But when you build your automation incrementally, you start simple and grow your automation along with the rest of your code. Each improvement is only a day or two of work, at most, and most of your time is focused on your production code.

Automating Legacy Code

You may not have the luxury of growing your automation alongside your code. Often, you’ll add automation to an existing codebase instead.

Start by creating empty build, rundev, and deploy scripts. Don’t automate anything yet; just find the documentation for each of these tasks and copy it into the corresponding script. For example, the rundev script might say “1. Run `esoteric_command` 2. Load `https://obscure_web_page`,” and so forth. Have the script wait for a keypress after each step.
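A not-yet-automated script of this kind can be as simple as the sketch below. The steps are placeholders copied from the documentation; over time, each `pause` gives way to real automation:

```python
def run_checklist(steps, pause=input):
    """Print each documented manual step and wait for a keypress.
    pause is injectable so the script can be tested without a keyboard."""
    for number, step in enumerate(steps, start=1):
        print(f"{number}. {step}")
        pause("Press Enter when done... ")

# Placeholder steps, copied verbatim from the old documentation:
rundev_steps = [
    "Run `esoteric_command`",
    "Load `https://obscure_web_page`",
]
```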

Ally
Slack

Such simple automation shouldn’t take long, so you can create each script as part of your slack. When you do, the script becomes your new, version-controlled source of truth. Either remove the old documentation or change it to describe how to run the script.

Next, use your slack to gradually automate each step. Start with the low-hanging fruit and automate the easiest steps first, then focus on the steps that introduce the most friction. For a while, your scripts will have a mix of automation and step-by-step instructions. Keep going until the scripts are fully automated, then start looking for opportunities to further improve and simplify.

When build is fully automated, you’ll probably find that it’s too slow for one-second feedback (or even five-second feedback). Eventually, you’ll want to have a sophisticated incremental approach, but you can start by identifying small chunks of your codebase. Provide build targets that allow you to build and test each one in isolation. The more finely you chop up the chunks, the easier it will be to get below the five-second target.

Once a commonly-used build target is below ten seconds, it’s fast enough to be worth creating a watch script. Continue optimizing, using your slack to improve a bit at a time, until you get all the targets below five seconds. At some point, modify the build to automatically choose targets based on what’s changed.

Next, improve your deployment speed and reliability. This will probably require improving the tests, so it will take a while. As before, use your slack to improve a piece at a time. When a test fails randomly, make it deterministic. When you’re slowed down by a broad test, replace it with narrow tests. “Adding Tests to Existing Code” on p.XX explains what to do.

The code will never be perfect, but eventually, the parts you work with most frequently will be polished smooth. Continue using your slack to make improvements whenever you encounter friction.

Questions

How do we find time to automate?

Ally
The Planning Game
Done Done

The same way you find time for coding and testing: it’s simply part of the work to be done. During the planning game, when you size each story, include any automation changes the story needs.

Use your slack to make improvements when you encounter friction.

Similarly, use your slack to make improvements when you encounter friction. But remember that slack is for extra improvement. If a story requires automation changes, building the automation—and leaving the scripts you touched at least a bit better than you found them—is part of developing the story, not part of your slack. The story’s not done until the automation is too.

Who’s responsible for writing and maintaining the scripts?

Ally
Collective Code Ownership

They’re collectively owned by the whole team. In practice, team members with programming and operations skills take responsibility for them.

We have another team that’s responsible for build and deployment automation. What should we do?

Treat their automation in the same way you treat any third-party dependency. Encapsulate their tools behind scripts you control. That will give you the ability to customize as needed.

When does database migration happen?

It’s part of your deployment, but it may happen after the deployment is complete. See “Continuous Deployment” on p.XX for details.1

1XXX update with specific reference when CD practice done.

Prerequisites

Every team can work on getting one-second feedback. Some languages make fast feedback more difficult, but you can usually get meaningful feedback about the specific part of the system you’re currently working on, even if that means running a small subset of your tests. Fast feedback is so valuable, it’s worth taking the time to figure it out.

Your ability to run the software locally may depend on your organization’s priorities. In a multi-team environment, it’s easy to accidentally create a system that can’t be run locally. If that’s the case for you, you can still program your tests to run locally, but running the whole system manually might be out of your control.

Ally
Continuous Integration

Your operations team or organization may not want you to use continuous deployment. If so, you can create an integrate script instead of a deploy script. It’s the same thing, but without the deployment part: it runs build in a pristine environment, then integrates the code.

In some cases, your company may not allow you to install a continuous integration server. You don’t need a tool for continuous integration, though; just a spare development machine. See “Continuous Integration” on p.XX for details.2

2XXX Provide more specific reference after Continuous Integration practice is written.

Indicators

When your team has zero-friction development:

  • You spend your time developing, not struggling with tools, checklists, and dependency documentation.

  • You’re able to work in very small steps, which allows you to catch errors earlier and spend less time debugging.

  • Setting up a new development workstation is a simple matter of cloning the repository and running a script.

  • You’re able to integrate and deploy multiple times per day.

Alternatives and Experiments

Zero-friction development is an ideal that every team should strive for. The best way to do it depends on your situation, so feel free to experiment.

Some teams rely on their IDE, rather than scripting, to provide the automation they need. Others use large “kitchen-sink” tools with complicated configuration languages. I find that these approaches tend to break down as the needs of the team grow. They can be a convenient way to get started, but when you outgrow them, switching tends to be painful and difficult to do incrementally. Use caution when evaluating complicated tools that promise to solve all your automation needs.

Share your feedback about this excerpt on the AoAD2 mailing list! Sign up here.

For more excerpts from the book, or to get a copy of the Early Release, see the Second Edition home page.

AoAD2 Chapter: Collaboration (Introduction)

Book cover for “The Art of Agile Development, Second Edition.”

Second Edition cover

This is a pre-release excerpt of The Art of Agile Development, Second Edition, to be published by O’Reilly in 2021. Visit the Second Edition home page for information about the open development process, additional excerpts, and more.

Your feedback is appreciated! To share your thoughts, join the AoAD2 open review mailing list.

This excerpt is copyright 2007, 2020, 2021 by James Shore and Shane Warden. Although you are welcome to share this link, do not distribute or republish the content without James Shore’s express written permission.

Collaboration

In addition to the teamwork expected of any Agile team (see chapter “Teamwork”), Delivering teams also have high standards of technical excellence and collaboration. They’re expected to work together, as a team, to keep internal quality high and deliver their most important business priority.

These practices will help your team collaborate:

  • “Collective Code Ownership” on p.XX allows team members to improve each other's code.

  • “Pair Programming” on p.XX cross-pollinates ideas and helps the team maintain awareness of how everything fits together.

  • “Mob Programming” on p.XX gets the whole team working together.

  • “Ubiquitous Language” on p.XX helps team members understand each other.

Share your feedback about this excerpt on the AoAD2 mailing list! Sign up here.

For more excerpts from the book, or to get a copy of the Early Release, see the Second Edition home page.

AoAD2 Practice: Ubiquitous Language

Book cover for “The Art of Agile Development, Second Edition.”

Second Edition cover

This is a pre-release excerpt of The Art of Agile Development, Second Edition, to be published by O’Reilly in 2021. Visit the Second Edition home page for information about the open development process, additional excerpts, and more.

Your feedback is appreciated! To share your thoughts, join the AoAD2 open review mailing list.

This excerpt is copyright 2007, 2020, 2021 by James Shore and Shane Warden. Although you are welcome to share this link, do not distribute or republish the content without James Shore’s express written permission.

Ubiquitous Language

Audience
Programmers

Our whole team understands each other.

Try describing the business logic in your current system to a domain expert. Are you able to explain how the system works in terms they understand? Can you avoid programming jargon, such as the names of design patterns, frameworks, or coding styles? Is your domain expert able to identify potential problems in your business logic?

If not, you need a ubiquitous language. It’s a way of unifying the terms your team uses in conversation and code so that everybody can collaborate effectively.

The Domain Expertise Conundrum

One of the challenges of professional software development is that programmers usually aren’t experts in the software’s problem domain. For example, I’ve helped write software that controls factory robots; directs complex financial transactions; analyzes data from scientific instruments; and performs actuarial calculations. When I started working with those teams, I knew nothing about those things.

It’s a conundrum. The people who understand the problem domain—the domain experts—are rarely qualified to write software. The people who are qualified to write software—the programmers—don’t always understand the problem domain.

The challenge is communicating clearly and accurately.

Overcoming this challenge is, fundamentally, an issue of communication. Domain experts communicate their expertise to programmers, who in turn encode that knowledge in software. The challenge is communicating that information clearly and accurately.

Speak the Same Language

Programmers should speak the language of their domain experts, not the other way around. In turn, domain experts should tell programmers when the language they’re using is incorrect or confusing.

Imagine you’re creating a piece of software for typesetting musical scores. The publishing house you’re working for provides an XML description of the music, and you need to render it properly. This is a difficult task, filled with seemingly minor stylistic choices that are vitally important to your customers.

In this situation, you could focus on XML elements, parents, children, and attributes. You could talk about device contexts, bitmaps, and glyphs. If you did, your conversation might sound something like this:

Programmer: “We were wondering how we should render this clef element. For example, if the element’s first child is “G” and the second child is “2,” but the octave-change element is “-1,” which glyph should we use? Is it a treble clef?”

Domain expert (thinking, “I have no idea what they’re talking about. But if I admit it, they’ll respond with something even more confusing. I’d better fake it.”) “Um... sure, G, that’s treble. Good work.”

Instead, focus on domain terms rather than technical terms.

Programmer: “We were wondering how we should print this “G” clef. It’s on the second line of the staff but one octave lower. Is that a treble clef?”

Domain expert (thinking, “An easy one. Good.”) “That’s often used for tenor parts in choral music. It’s a treble clef, yes, but because it’s an octave lower we use two symbols rather than one. Here, I’ll show you an example.”

The domain expert’s answer is different in the second example because they understand the question. The conversation in the first example would have led to a bug.

How to Create a Ubiquitous Language

Ally
Customer Examples

Ubiquitous language doesn’t come automatically. You have to work at it. When you talk to domain experts, listen for the terms they use. Ask questions about their domain, sketch diagrams that model what you hear, and ask for feedback. When you get into tricky details, ask for examples.

For example, imagine you’re having your first conversation with a domain expert about the music typesetting software:

Programmer: I took piano lessons as a kid, so I know the basics of reading music. But it’s been a while. Can you walk me through it from the beginning?

Domain expert: We typeset music for ensembles and orchestras here, so it’s not exactly the same as a piano score, but your background will help. To start with the basics, every score is divided into staves, each staff is divided into measures, and notes go into the measures.

Programmer: So the score is the fundamental thing we’re typesetting?

Domain expert: That’s right.

Programmer: Got it. (Draws a box and labels it “score.”) And then each score has staves. (Adds a box labelled “staff” and draws a line connecting it to “score.”) And each staff has measures. (Adds another box labelled “measure” and connects it to “staff.”) How many staffs can the score have?

Domain expert: It depends on the arrangement. Four, for a string quartet. A dozen or more for an orchestra.

Programmer: But at least one?

Domain expert: Well, I guess so. It wouldn’t make sense for a score to have zero staves. Each instrument gets a staff, or multiple, in the case of instruments with a lot of range, like pianos and organs.

Programmer: Okay, I’m starting to get lost. Do you have an example I can look at?

Domain expert: Sure. (Pulls out example.1) Here at the top, you can see the choir. There’s a staff for each part, which you can think of as being the same as an instrument: soprano, alto, tenor, and bass. And then a grand staff for the harp, a grand staff and a regular staff for the organ, and so forth.

1See http://stevensametz.com/wordpress/wp-content/pdfs/sample/thumbs/Amo%201%20Munus%20-%20SATB,%20harp,%20organ,%20pc,%20orchestra%20Score-800-0.jpg for an example of orchestral sheet music.

Programmer: (Revising sketch on whiteboard.) So we start with the score, and the score has multiple instruments, and each instrument has one or more staffs, and the staff can either be a regular staff or a grand staff. And it looks like the instruments can be grouped together too.

Domain expert: Right, I should have mentioned that. The instruments can be grouped into sections. You know, string section, horn section?

Programmer: (Revising sketch again.) Got it. Score has sections, sections have instruments, and then the rest.

Domain expert: (Looks at diagram.) This is a start, but there’s still a lot missing. We need a clef, key, and time signature...

The result of this conversation is more than just a whiteboard sketch. It can also form the basis for a domain model in your code. Not every program needs a domain model, but if your team’s software involves a complicated domain, a domain model is a powerful way to develop using your ubiquitous language.

You’re not going to literally program in the domain experts’ language, of course. You’ll still use a programming language. But you’ll create your modules, functions, classes, and methods so that they model the way your domain experts think. By reflecting in code how users think and speak about their work, you refine your knowledge, expose gaps that would otherwise result in bugs, and create a malleable system that is responsive to the changes your users will want.

To continue the example, a program to typeset a musical score based on XML input could be designed around XML concepts. A better approach, though, might be to design it around domain concepts, as shown in figure “XML and Domain-Centric Design”.

Two class diagrams. The one on the left is labelled “XML-centric design (simplified),” and it shows the relationships between an “Entity” and an “Attribute” class. The one on the right is labelled “Domain-centric design (simplified),” and it shows the relationships between domain-oriented classes, such as “Score,” “Measure,” “Staff,” and “Note.”

Figure 1. XML and Domain-Centric Design
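The domain-centric side of the figure might begin life as a sketch like this. The class names mirror the terms from the conversation with the domain expert; the details are invented for illustration:

```python
class Staff:
    def __init__(self):
        self.measures = []      # each staff is divided into measures

class GrandStaff(Staff):
    """Two linked staves, as used by harps, pianos, and organs."""

class Instrument:
    def __init__(self, name, staves):
        # A domain rule from the conversation: every instrument
        # gets at least one staff.
        assert len(staves) >= 1
        self.name = name
        self.staves = staves

class Section:
    """A group of instruments: string section, horn section, choir..."""
    def __init__(self, name, instruments):
        self.name = name
        self.instruments = instruments

class Score:
    def __init__(self, sections):
        self.sections = sections

    def staves(self):
        return [staff
                for section in self.sections
                for instrument in section.instruments
                for staff in instrument.staves]

choir = Section("Choir", [Instrument(part, [Staff()])
                          for part in ("Soprano", "Alto", "Tenor", "Bass")])
harp = Section("Harp", [Instrument("Harp", [GrandStaff()])])
score = Score([choir, harp])
assert len(score.staves()) == 5
```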

Code doesn’t leave room for ambiguity. This need for rigorous formalization results in more conversations and clarifies obscure details. I often see situations in which programmers run into a sticky design problem, ask their domain expert a question, and this in turn causes the domain experts to question some of their assumptions.

Your ubiquitous language, therefore, is a living language. It’s only as good as its ability to reflect reality. As you clarify points with your domain experts, encode what you’ve learned in your domain model. As the domain model reveals ambiguities, bring them back to your domain experts for clarification.

Ally
Refactoring

As you go, be sure that your design and the language you and your domain experts share remain in sync. Refactor the code when your understanding of the domain changes. If you don’t, you’ll end up with a mismatch between your design and reality, which will lead to ugly kludges and bugs.

Questions

Should we avoid the use of technical terms altogether? Our business domain doesn’t mention anything about GUI widgets or a database.

It’s okay to use technical language in areas that are unrelated to the domain. For example, it’s probably best to call a database connection a “connection” and a UI button a “button.” However, you should typically encapsulate these technical details behind a domain-centric face.

How do we document our ubiquitous language?

Ideally, you encode your ubiquitous language in the actual design of your software using a domain model. If that’s not appropriate, you can document your model on a whiteboard (possibly a virtual whiteboard), shared document, or wiki page. Be careful, though: this sort of documentation requires a lot of attention to keep up to date.

Ally
Simple Design

The advantage of using code for documentation is that code can’t help but reflect what your software really does. With care, you can design your code to be self-documenting.

Different stakeholders use different terms for the same things. How can we reconcile this?

Your ubiquitous language doesn’t need to be literally ubiquitous. The important thing is to unify the language that your programmers, domain experts, and code use. Use the same terms as the domain experts that you work with directly. If you work with multiple domain experts, and they don’t agree—which happens more often than you might expect—ask them to work together to decide which approach you should use.

We program in English, but it’s not our first language, and our domain experts don’t use English. Should we translate their terms to English for consistency with the rest of our code?

It’s up to you. Words don’t always translate directly, so using your domain expert’s literal language is likely to result in fewer errors, especially if domain experts are able to overhear and contribute to programmers’ conversations. On the other hand, consistency might make it easier for others to work with your code in the future.

If you do decide to translate your domain experts’ terms to English (or another language), create a translation dictionary for the words you use, especially for words that don’t translate perfectly.
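Such a dictionary doesn’t need to be elaborate. A sketch, with invented German accounting terms as the example:

```python
# Translation glossary for domain terms that don't map cleanly to English.
# domain expert's term: (name used in code, note on the imperfect translation)
GLOSSARY = {
    "Beleg": ("receipt", "broader than 'receipt': any supporting document"),
    "Buchung": ("booking_entry", "an accounting entry, not a reservation"),
}

def code_name(domain_term):
    return GLOSSARY[domain_term][0]

assert code_name("Beleg") == "receipt"
```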

Prerequisites

Ally
Whole Team

If you don’t have any domain experts as part of your team, you may have trouble understanding the domain deeply enough to create a ubiquitous language. Attempting to do so is even more important in this situation, though. When you do have the opportunity to speak with a domain expert, the ubiquitous language will help you to discover misunderstandings more quickly.

On the other hand, some problems are so technical that they don’t involve non-programmer domain knowledge at all. Compilers and web servers are examples of this category. If you’re building this sort of software, the language of programming is the language of the domain.

Some teams have no experience creating domain models and domain-centric designs. If this is true of your team, proceed with caution. Domain-centric designs require a shift in thinking that can be difficult. See the “Further Reading” section to get started, and consider hiring a coach to help you learn.

Indicators

When you have a ubiquitous language that works:

  • You reduce miscommunication between customers and programmers.

  • You produce code that’s easier to understand, discuss, and modify.

  • When sharing a physical team room, domain experts overhear domain and implementation discussions. They join in to resolve questions and expose hidden assumptions.

Alternatives and Experiments

It’s always a good idea to speak the language of your domain experts, but domain-centric design isn’t always the best choice. Sometimes a technology-centric design is simpler and easier. This is most often the case when your domain rules aren’t very complicated. Be careful, though: domain rules are often more complicated than they first appear, and technology-centric designs tend to have defects and high maintenance costs when that’s true. See [Fowler 2002] for further discussion of this trade-off.

Further Reading

Domain-Driven Design: Tackling Complexity in the Heart of Software [Evans 2003] is the definitive guide to creating domain-centric designs. Chapter two, “Communication and the Use of Language,” was the inspiration for this practice.

Patterns of Enterprise Application Architecture [Fowler 2002] has a good discussion of the trade-offs between domain models and other architectural approaches.

XXX Consider Object Design (Wirfs-Brock and McKean)

Share your feedback about this excerpt on the AoAD2 mailing list! Sign up here.

For more excerpts from the book, or to get a copy of the Early Release, see the Second Edition home page.

AoAD2 Practice: Mob Programming

Book cover for “The Art of Agile Development, Second Edition.”

Second Edition cover

This is a pre-release excerpt of The Art of Agile Development, Second Edition, to be published by O’Reilly in 2021. Visit the Second Edition home page for information about the open development process, additional excerpts, and more.

Your feedback is appreciated! To share your thoughts, join the AoAD2 open review mailing list.

This excerpt is copyright 2007, 2020, 2021 by James Shore and Shane Warden. Although you are welcome to share this link, do not distribute or republish the content without James Shore’s express written permission.

Mob Programming

Audience
Whole Team

We bring the insights of the whole team to bear.

In the early days of Extreme Programming, when pair programming first became popular, people used to mock it. “If pairing is good, why not triple!” they laughed. “Or just put the whole team in front of one computer!”

They were trying to put down XP, but the Agile way is to experiment, learn, and improve. Rather than assume something won’t work, we try an experiment. Some experiments work; some don’t. Either way, we share what we learn.

That’s what happened with mob programming. Woody Zuill had a group teaching technique he used for coding dojos. His team at Hunter Industries was in a bind. They decided to try Woody’s group technique on real world work, and put the whole team in front of one computer.

It worked, and worked well. Woody and the team shared what they learned. And now mob programming is used around the world.

In some parts of the world, the term “mob programming” has unpleasant connotations, so people call it “ensemble programming” instead. Woody’s original name for it was “Whole Team programming.” But, he says, “I have always said, I don’t care what it’s called. Learning to work well as a team is worthwhile and I invite people to call it what they will.”1

1Quoted from a conversation with Woody Zuill on Twitter: https://twitter.com/WoodyZuill/status/1365473397665193984

How to Mob

Ally
Pair Programming

Mob programming is a variant of pair programming. Like pairing, it has a driver, who codes, and navigators, who provide direction. Unlike pairing, the whole team is present. While one person drives, the rest of the team navigates.

To be clear, #MobProgramming is merely a tiny evolutionary step beyond pair programming. There are no rules except the general guideline of “Let’s figure out how to turn up our ability to collaborate well”.2

2Another excerpt of the Twitter conversation with Woody Zuill: https://twitter.com/WoodyZuill/status/1365475181213347848

Woody Zuill

All the brilliant minds, in the same place, at the same time, working on the same thing.

You’re welcome to try any approach to mobbing that you like. Experiment and find what works for you. The central idea, as Woody says, is “All the brilliant minds, in the same place, at the same time, working on the same thing.”

Ally
Whole Team

To get started, try Woody Zuill’s approach. It starts with the whole team: everybody is present and ready to participate. Some people, such as on-site customers, may not be focused on the programming specifically, but they’re available to answer questions and they’re working on the same stories the programmers are.

On top of that base, layer on Llewellyn Falco’s strong-style pairing: all ideas must pass through somebody else’s fingers. [Falco 2014] When it’s your turn to drive, your job is to act as a very clever input device. How clever, exactly, depends on your familiarity with the code and editor. In some cases, a navigator might say, “now handle error cases,” and the driver will test-drive four tests and the accompanying production code without further prompting. In other cases, a navigator might say, “now extract the method,” and the driver will have to ask what to type. Customize the level of detail to each driver’s experience with the code and tools.

Finally, add a timer. Seven minutes is a good starting point. When the timer goes off, the driver stops. Another person takes over and work continues right where the previous driver left off. Rotate through everybody who’s interested in programming.
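The rotation itself is simple enough to sketch in a few lines of code. In this hypothetical Python sketch, the function name, the member list, and the seven-minute default are all invented for illustration; they aren’t part of any particular mobbing tool:

```python
def rotation_schedule(members, turns, minutes=7):
    """Yield (start_minute, driver) pairs, cycling through the mob.

    `members` lists everybody who wants to drive; `minutes` is the
    timer length. When the timer goes off, the next person takes over
    exactly where the previous driver left off.
    """
    for turn in range(turns):
        yield (turn * minutes, members[turn % len(members)])

# A five-person mob with a seven-minute timer. The sixth turn wraps
# around to the first driver again.
mob = ["Aisha", "Ben", "Chetna", "Dmitri", "Eun"]
for start, driver in rotation_schedule(mob, turns=6):
    print(f"{start:3d} min: {driver} drives")
```

A longer timer for a remote team is just a different `minutes` argument.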

Why Mobbing Works

Mob programming is “easy mode” for collaboration.

Mob programming works because it’s “easy mode” for collaboration.

So much of Agile centers around communication and collaboration. It’s the secret sauce that makes Agile more effective than other approaches. And mobbing makes a lot of the Agile collaboration practices irrelevant. They’re simply not needed when you mob.

Stand-up meetings? Gone. Collective code ownership? Automatic. Team room? A no-brainer. Task planning? Still useful, but kind of unnecessary.

All the brilliant minds, in the same place, at the same time, working on the same thing. That’s the Agile ideal. Mobbing makes it easy.

When I first heard about mobbing, I pooh-poohed it. “I get the same benefits from having a cross-functional team, a team room, pairing, frequent pair switching, and good collaboration,” I said. And I was right. Mobbing doesn’t get you anything you don’t already get on a good team. But it’s so easy. Getting people to pair and collaborate well is hard. Mobbing? It’s practically automatic.

The Mobbing Station

If you have a physical team room, it’s pretty easy to set up a place for mobbing. You need a projector or big-screen TV (or several), tables for people to sit at, and a development workstation. Make sure everybody can sit comfortably, has access to laptops and whiteboards (for looking stuff up and discussing ideas), and has enough room to switch drivers easily. Some teams provide a mobbing station as well as pairing stations so people can switch back and forth as desired.

If your team is remote, set up a videoconference and have the driver share their screen. When it’s time to switch drivers, the previous driver pushes their code to a temporary branch and the next driver pulls it. A script such as the one found at https://mob.sh/ can help with this process. You might find that you need to set a longer timer—perhaps ten minutes instead of seven—to reduce the amount of switching needed.

Making Mobbing Work

Mobbing is fun and easy, but it can still be tiring to work with the whole team day in and day out. Here are some things to consider:

Team dynamics
Allies
Alignment
Safety
Team Dynamics

Pay attention to the interactions between team members and make sure everybody’s voice is heard. Establish working agreements and make it safe for people to express disagreement and concerns.3 If there’s someone who tends to dominate, remind them to let others speak; if there’s someone who has trouble speaking up, ask for their opinion.

3XXX update after Safety and Team Dynamics written.

When you first start mobbing, it’s worth spending a few minutes at the end of each day for a very short retrospective. Focus on what worked well and how to do more of it. Woody Zuill calls this “turn up the good.”

Energized work
Ally
Energized Work

Mobbing isn’t supposed to wear you out, but it can be overwhelming to be constantly surrounded by the whole team. Take care of yourself. You don’t need to be “on” at every moment.

One of the advantages of mobbing is that it’s not dependent on any one person. If you need a coffee break, or just want to clear your head, step away. Similarly, if you need to check your email or make a phone call, you can do that. The mob will continue on without you.

You don’t have to align your work schedules, either. People can drop in and out as needed.

Research
Ally
Spike Solutions

All changes to the production code go through the driver, but you can still use your computer when you aren’t driving. If you need to look up an API call, or have a side discussion about a design idea at the whiteboard, or create a spike solution, you can do that.

Strict navigator role

When you start mobbing, your team might have so many people shouting ideas that the driver has trouble understanding what to do. In this case, rather than having the whole team act as navigators, you can appoint one person to be the navigator. This role rotates just like the driver role does. (I like to have the driver become the next navigator.) Their job is to condense the ideas of the mob into specific directions for the driver. The driver only has to listen to the navigator, not the whole mob.

Non-programmers

Everybody in the mob can be a driver, even people who don’t know how to program. This can be an exciting opportunity for non-programmers to develop new skills. They may not become experts, but they’ll learn enough to contribute, and learning to drive could improve their ability to collaborate with programmers.

Remember to guide your driver at the level that they’re capable of following. For non-programmers, this may require providing direction at the level of specific keyboard shortcuts, menu items, and mouse clicks, at first.

But nobody is required to be a driver. Some people on the team may find that their time is better spent helping the mob in other ways. A tester and a domain expert might have a side conversation about customer examples related to the current story. A product manager may step out to conduct an interview with an important stakeholder. An interaction designer may work on user personas.

As with anything else, experiment with varying people’s level of involvement to find what works best for your team. But start by trying more involvement, rather than less. People often underestimate the power of working as a team. That conversation about customer examples, or stakeholder interview, or user persona work could be something that the mob learns from doing together.

Mini-mobs and part-time mobs

You don’t have to choose between pairing and mobbing. (Although I do recommend doing one or the other for all code you have to maintain.) You can mob part time and pair the rest of the time. Or you can form a “mini-mob” of three or four people while the rest of the team pairs.

Allies
Task Planning
Stand-Up Meetings

If you don’t mob full-time, be sure to keep other team coordination mechanisms, such as the task board and stand-up meetings, at least to start. The mobbing sessions may allow you to keep in sync without them, but make sure that’s true before removing them.

Questions

Is mobbing really more effective than working alone or in pairs?

There are too many variables to say for sure. In my experience, pairing is more effective than working alone. Is mobbing even more effective than pairing? For teams with a good team room and great collaboration, maybe not. For other teams, it probably is. Try it and find out.

We’re having trouble remembering to switch drivers. What should we do?

If people are ignoring your timer, try using a tool such as Mobster (available at http://mobster.cc/). When the time is up, it blanks the screen so the driver has to stop.

Prerequisites

Mobbing requires permission from the team and management. Other than that, the only requirements are a comfortable work environment and an appropriate mobbing setup.

Indicators

When your team mobs well:

  • The whole team directs their entire effort towards one story at a time, finishing work with minimal delays and wait time.

  • The team collaborates well and enjoys working together.

  • Internal quality improves.

  • When a tough problem arises, the mob solves it while the driver continues moving forward.

  • Decisions are made quickly and effectively.

Alternatives and Experiments

“All the brilliant minds, in the same place, at the same time, working on the same thing.” That’s the core idea of mob programming. Beyond that, the details are up to you. Start with the basic structure described here, then think about something to improve every day.

Allies
Pair Programming
Task Planning
Stand-Up Meetings

If mobbing isn’t a good fit, the best alternative is pair programming. Pairing doesn’t have the same automatic collaboration that mobbing does, though, so you’ll need to put more effort into collective ownership, task planning, and stand-up meetings.

Further Reading

XXX To consider:

AoAD2 Practice: Pair Programming

Pair Programming

Audience
Developers, Whole Team

We help each other succeed.

Do you want somebody to watch over your shoulder all day? Do you want to waste half your time sitting in sullen silence watching somebody else code?

Of course not. Nobody does—especially not people who pair program.

Pair programming is one of the most controversial Agile ideas. Two people working at the same computer? It’s weird. It’s also extremely powerful and, once you get used to it, tons of fun. Most programmers I know who tried pairing for a month found that they preferred it to programming alone.

Ally
Collective Code Ownership

More importantly, pair programming is one of the most effective ways to achieve collective code ownership and truly collaborate on code as a team.

Why Pair?

There’s more to pairing than sharing knowledge. Pairing also improves the quality of your results. That’s because pair programming doubles your brainpower.

When you pair, one person is the driver. Their job is to code. The other person is the navigator. Their job is to think. As navigator, sometimes you think about what the driver is typing. (Don’t rush to point out missing semicolons, though. That’s annoying.) Sometimes you think about what comes next. Sometimes you think about how your work best fits into the overall design.

This arrangement leaves the driver free to work on the tactical challenges of creating rigorous, syntactically correct code without worrying about the big picture, and it gives the navigator the opportunity to consider strategic issues without being distracted by the details of coding. Together, the driver and navigator produce higher-quality work, more quickly, than either could produce on their own.1

1One study found that pairing takes about 15 percent more effort than one individual working alone, but produces results more quickly and with 15 percent fewer defects. [Cockburn and Williams 2001] Every team is different, so take these results with a grain of salt.

Pairing also reinforces good programming skills. The Delivering practices take a lot of self-discipline. When pairing, you’ll have positive peer pressure to do the things that need to be done. You’ll also spread coding knowledge and tips throughout the team.

Surprisingly, you’ll also spend more time in flow—that highly productive state in which you’re totally focused on the code. It’s a different kind of flow than when you’re working alone, but it’s far more resilient to interruptions. To start with, you’ll discover that your office mates are far less likely to interrupt you when you’re working with someone. When they do, one member of the pair will handle the interruption while the other continues working. Further, you’ll find that background noise is less distracting: your conversation with your pairing partner will keep you focused.

If that isn’t enough, pairing really is a lot of fun. The added brainpower will help you get past roadblocks more easily. For the most part, you’ll be collaborating with smart, like-minded people. Plus, if your wrists get sore from typing, you can hand off the keyboard to your partner and continue to be productive.

Pairing Stations

To enjoy pair programming, a good workspace is essential, whether your team is in-person or remote. For in-person teams, make sure you have plenty of room for both people to sit side by side. Typical cubicles, with a monitor located in a corner, won’t work. They’re uncomfortable and require one person to sit behind the other, adding psychological as well as physical barriers to what’s meant to be peer collaboration.

You don’t need fancy furniture to make a good in-person pairing station. A simple table will do. It should be six feet long, so that two people can sit comfortably side by side, and at least four feet deep. Each table needs a high-powered development workstation. Plug in two keyboards and mice so each person can have a set. If people have a preferred mouse and keyboard, they can bring it with them. Make sure the USB ports are easily accessible in this case.

Splurge on large monitors so both people can see clearly. Be sure to respect differences in people’s vision needs, particularly with regard to font sizes and colors. Some teams set up three monitors, with the two outer monitors mirrored, so each person can see the code on a monitor in front of them, while using the middle display for additional material. If you do this, try installing a utility that makes the mouse wrap around the edges of your desktop. It will let both programmers reach the center screen easily.

If your team is remote, you’ll need a collaborative code editor and videoconference. Make sure you have multiple screens, so you can see each other and the code at the same time.

There are a variety of IDE add-ins and standalone tools for collaborative editing, such as Code Together, Tuple, Floobits, and Visual Studio’s Live Share. You can also share your screen in your videoconferencing tool, but a collaborative code editor will work better because it allows you to switch drivers seamlessly. If you have to use screen-sharing, though, you can hand off control by pushing the code to a temporary work-in-progress branch. Write a little script to automate the process.
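Such a hand-off script can be quite small. Here is one possible shape, sketched in Python. The branch name `pair/wip`, the remote name `origin`, and the function names are all assumptions made for illustration; purpose-built tools (such as the script at https://mob.sh/) handle many more edge cases:

```python
import subprocess

WIP_BRANCH = "pair/wip"  # arbitrary name for the shared hand-off branch


def git(*args, cwd=None):
    """Run a git command in the given working copy, raising if it fails."""
    subprocess.run(["git", *args], cwd=cwd, check=True)


def handoff(cwd=None):
    """Outgoing driver: commit everything as work-in-progress and push it."""
    git("add", "-A", cwd=cwd)
    git("commit", "--allow-empty", "-m", "WIP: pairing hand-off", cwd=cwd)
    git("push", "origin", f"HEAD:{WIP_BRANCH}", cwd=cwd)


def takeover(cwd=None):
    """Incoming driver: fetch the hand-off branch and continue from it."""
    git("fetch", "origin", WIP_BRANCH, cwd=cwd)
    git("checkout", "-B", WIP_BRANCH, f"origin/{WIP_BRANCH}", cwd=cwd)
```

The outgoing driver runs `handoff()`; the incoming driver runs `takeover()` and picks up from the exact same working state. A real script would also squash the WIP commits before merging back to the main branch.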

Jeff Langr has a good rundown of remote code collaboration options in [Langr 2020].

How to Pair

I recommend pairing on all production code. Teams who pair frequently, but not exclusively, say that they find more defects in solo code. That matches pair programming studies, such as [Cockburn and Williams 2001], that find that pairs produce higher quality code. A good rule of thumb is to pair on anything that you need to maintain, which includes tests and automation.

When you start working on a task, ask another programmer to work with you. If someone else asks for help, make yourself available. Managers should never assign partners: pairs are fluid, forming naturally and shifting throughout the day. Over the course of the week, pair with every developer on the team. This will improve team cohesion and spread skills and knowledge throughout the team.

Get a fresh perspective by switching partners.

When you need a fresh perspective, switch partners. I usually switch when I’m feeling frustrated or stuck. Have one person stay on task and bring the new partner up to speed. Often, even explaining a problem to someone new will help you resolve it.

It’s a good idea to switch partners several times per day even if you don’t feel stuck. This will help keep everyone informed and moving quickly. I switch whenever I finish a task. If I’m working on a big task, I switch within four hours.

Some teams switch partners at strictly defined intervals. [Belshee 2005] reports interesting results from switching every 90 minutes. While this could be a great way to get in the habit of switching pairs, make sure everybody is willing to try it.

When you sit down to pair, make sure you’re physically comfortable. If you’re colocated, position your chairs side by side, allowing for each other’s personal space, and make sure the monitor is clearly visible. When you’re driving, place the keyboard directly in front of you. Keep an eye out for this one—for some reason, people new to pairing tend to contort themselves to reach the keyboard and mouse rather than moving them closer.

Expect to feel clumsy and fumble-fingered, at first, when it’s your turn to drive. You may feel that your navigator sees ideas and problems much more quickly than you do. They do—navigators have more time to think than drivers do. The situation will be reversed when you navigate. Pairing will feel natural in time.

Ally
Test-Driven Development

Pairs produce code through conversation. As you drive or navigate, think out loud. Take small steps—test-driven development works well—and talk about your assumptions, short-term goals, general direction, and any relevant history of the feature or project. If you’re confused about something, ask questions. The discussion may enlighten your partner as much as you.

When a pair goes dark—talks less, lowers their voices, or doesn’t switch off with other pairs—it’s often a sign of technical difficulty.

As you pair, switch the driver and navigator roles frequently—at least every half hour, and possibly every few minutes. If you’re navigating and find yourself telling the driver which keys to press, ask for the keyboard. If you’re driving and need a break, pass the keyboard off to your navigator.

Ally
Energized Work

Expect to feel tired at the end of the day. Pairs typically feel that they have worked harder and accomplished more together than when working alone. Practice energized work to maintain your ability to pair every day.

Effective Navigating

When navigating, you may feel like you want to step in and take the keyboard away from your partner. Be patient; your driver will often communicate an idea with both words and code. They’ll make typos and little mistakes—give them time to correct themself. Use your extra time to think about the bigger picture. What other tests do you need to write? How does this code fit into the rest of the system? Is there duplication you want to remove? Can the code be more clear? Can the overall design be better? Is there friction that should be polished away?

Pay attention to your driver’s needs, too. Somebody who’s unfamiliar with the IDE or codebase may need specific guidance. But resist the urge to micromanage. Give them room to figure things out on their own.

As navigator, your role is to help your driver be more productive. Think about what’s going to happen next and be prepared with suggestions. When I’m navigating, I like to keep an index card in front of me. Rather than interrupting the driver when I think of something, I write my ideas on the index card and wait for a break in the action to bring them up. At the end of the pairing session, I tear up the card and throw it away.

Ally
Spike Solutions

Similarly, when a question arises, take a moment to look up the answer while the driver continues to work. Some teams keep spare laptops on hand for this purpose. If you need more than a few minutes, pause coding to research the solution together. Sometimes the best way to do this is to split up, pursue parallel lines of inquiry, and come back together to share what you’ve learned. Spike solutions are a particularly powerful approach.

Teaching Through Pairing

Pair programming works best when it’s a peer collaboration, but sometimes you’ll be in a situation where you know the code and your partner doesn’t.

The best developers help everyone work quickly and well.

When this happens, remember to be patient. Teaching your pair partner how the code works slows you down, but the goal isn’t to maximize your performance... it’s to maximize the team’s performance. A good developer works quickly and well, but the best developers help everyone do so.

When you use pairing to teach someone about the code, start by letting them drive. That will allow them to control the pace. As you guide them, refrain from telling them exactly what to do. Instead, provide the big-picture direction—maybe even start with a whiteboard diagram—and give them space to figure out the details.

For example, when making changes to a service, don’t say, “We need to change SuperMailClient. Click src... now click infrastructure... now click rest...” Instead, provide context and direction: “Our task is to replace our transactional mail vendor, SuperMail, with BetterMail. They both provide REST APIs, so all we need to do is change our SuperMail wrapper to use BetterMail instead. (Sketches the project structure on the whiteboard.) All our REST clients are in the infrastructure/rest folder and each service has its own wrapper.” Then let your partner navigate through the project files and find the file to work on themselves.

Once the person you’re teaching can find their way around, switch roles. Ask them to navigate and tell you what needs to be done next. Be careful, though: when you’re driving, it’s tempting to rush ahead and just do what you know needs to be done. For it to work as a teaching technique, you have to suppress that desire and let your partner set the pace.

Challenges

Pairing can feel awkward or unpleasant at first. These feelings are natural and typically go away after a month or two. Here are some common challenges and how to resolve them:

Comfort

It bears repeating: pairing is no fun if you’re uncomfortable. When you sit down to pair, adjust your position and equipment so you can sit comfortably. Clear debris off the desk and make sure there’s room for your legs, feet, and knees. Check in with your partner about font sizes and monitor position. If you’re pairing remotely, take time before you begin to make sure all your tooling is set up and frictionless.

Some people (like me) need a lot of personal space. Others like to get up close and personal. When you start to pair, discuss your personal space needs and ask about your partner’s.

Similarly, while it goes without saying that personal hygiene is essential, remember that strong flavors such as coffee, garlic, onions, and spicy foods can lead to foul breath.

Introversion and social anxiety

Introverts often worry that pairing won’t work for them, but—as an introvert myself—I haven’t found that to be true in practice. Although pairing can be tiring, it’s also very focused on ideas and results. There’s no need to engage in small talk, and you’re typically working with people who you know well and respect. It’s a very productive, very cerebral collaboration, and that can be a lot of fun. Most introverts I’ve met who have tried pairing have liked it, once they got past the initial learning curve.

Ally
Alignment

Of course, people don’t divide neatly into predefined personality trait boxes. Pairing—and Agile in general—can be difficult for people with social anxiety. If you think pairing might be difficult for you or someone on your team, talk about ways to make pairing more comfortable, or about other ways your team can achieve collective code ownership. The alignment session is a good time for this conversation.

Mismatched skill levels

Although pairing works best as a peer collaboration, sometimes people with different skill sets will work together. In this situation, it’s important to restore the peer balance. Highlight the skills that each person is bringing to the table. Even if one person needs to teach the other about the code, treat it as a lack of knowledge that’s easily rectified, not a lack of ability on the part of the learner, or a sign of superiority on the part of the teacher.

Communication style

New drivers sometimes have difficulty involving their partners; they can take over the keyboard and shut down communication. To practice communicating and switching roles while pairing, consider ping-pong pairing. In this exercise, one person writes a test. The other person makes it pass and writes a new test. Then the first person makes it pass and repeats the process by writing another test.
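For example, the ping-pong rhythm for a tiny stack class might look like this, with the keyboard changing hands at each comment. The class and its tests are invented for illustration:

```python
import unittest


# Person A writes the first test, then passes the keyboard...
class StackTest(unittest.TestCase):
    def test_new_stack_is_empty(self):
        self.assertTrue(Stack().is_empty())

    # ...Person B makes it pass, writes the next test, and passes back...
    def test_push_makes_stack_non_empty(self):
        stack = Stack()
        stack.push(42)
        self.assertFalse(stack.is_empty())

    # ...Person A makes that one pass and writes another, and so on.
    def test_pop_returns_last_pushed_item(self):
        stack = Stack()
        stack.push(1)
        stack.push(2)
        self.assertEqual(stack.pop(), 2)


# The implementation grows one test at a time, each increment written
# by whoever received the keyboard.
class Stack:
    def __init__(self):
        self._items = []

    def is_empty(self):
        return not self._items

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()
```

The exercise forces a conversation at every hand-off: each new test is a small, concrete request from one partner to the other.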

Another approach to try is strong-style pairing. In strong-style pairing, invented by Llewellyn Falco, all ideas must pass through the other person’s fingers. [Falco 2014] So if you come up with an idea, you have to pass the keyboard to the other person and tell them how to implement it. Then when they come up with an idea, they pass the keyboard back to you and tell you what to do. Even if this isn’t something you want to do all the time, it’s a great way to practice communicating with your partner.

Ally
Safety

The flip side of too little communication is too much communication—or rather, too much blunt communication. Frank criticism of code and design is valuable, but it may be difficult to appreciate at first. Different people have different thresholds, so pay attention to how your partner receives your comments. Try transforming declarations (such as “This method is too long”) into questions or suggestions (“Could we make this method shorter?” or “Should we extract this code block into a new method?”). Adopt an attitude of collaborative problem solving.2

2XXX Double-check this after Safety practice is done

Tools and keybindings
Ally
Alignment

Even if you don’t fall victim to the endless vi vs. emacs editor war, you may find your coworkers’ tool preferences annoying. Try to standardize on a particular toolset. Some teams even create a standard image and check it into version control. When you discuss working agreements during your alignment discussion, discuss these issues as well.

Keyboards and mice can be another source of contention. If they are, you don’t have to standardize. People with strong input device preferences can take their devices with them when they switch pairing stations. Just make sure they have easily accessible USB ports.

Questions

Isn’t it wasteful to have two people do the work of one?

In pair programming, two people aren’t really doing the work of one. Although only one keyboard is in use at a time, there’s more to programming than typing. In pair programming, one person is programming and the other is thinking ahead, anticipating problems, and strategizing.

How can I convince my team or organization to try pair programming?

Ask permission to try it as an experiment. Set aside a month in which everyone pairs on all production code. Be sure to keep going for the entire month, as pair programming may be uncomfortable for the first few weeks.

Don’t just ask permission of management; get the consent of your fellow team members, too. They don’t have to love the idea, but do make sure they’re not opposed to it.

Do we really have to pair program all the time?

This is a decision that your whole team should make together. Before you decide, try pairing on all production code (and everything else you need to maintain) for a month. You may enjoy it more than you expect.

Even if you decide that all production code needs to be paired, you will still produce code that you don’t need to maintain. Spike solutions are one example. These often benefit from being worked on independently.

If you’re bored while pairing, it’s an indication of a design flaw.

Some production tasks are so repetitive that they don’t require the extra brainpower a pair provides. Before abandoning pairing, however, consider why your design requires so much repetition. It’s a common indication of a design flaw. Use the navigator’s extra time to think about design improvements and consider discussing it with your whole team.

How can I concentrate with someone talking to me?

When you navigate, you shouldn’t have too much trouble staying several steps ahead of your driver. If you do have trouble, ask your driver to think out loud so you can understand their thought process, or ask to drive so you can control the pace.

As driver, you may sometimes find that you’re having trouble solving a problem. Let your navigator know—they may have a suggestion that will help you through the roadblock. At other times, you may just need a few moments of silence to think through the problem. It’s okay to say so.

Allies
Test-Driven Development
Spike Solutions

If you find yourself in this situation a lot, you may be taking steps that are too large. Use test-driven development and take very small steps. Rely on your navigator to keep track of what you still need to do (tell them if you have an idea; they’ll write it down) and focus only on the few lines of code needed to make the next test pass.

If you are working with a technology you don’t completely understand, consider taking a few minutes to work on a spike solution. You and your partner can work on this together or separately.

What if we have an odd number of programmers?

A programmer flying solo can do productive tasks that don’t involve production code. They can research new technologies or learn more about a technology the team is using. They can pair with a customer or tester to review recent changes, polish the application, or do exploratory testing. They can take care of administrative tasks for the team, such as responding to team emails.

Ally
Zero Friction

Alternatively, a solo programmer may wish to improve the team’s capacity. They can research solutions to friction the team is experiencing, such as slow builds, flaky tests, or unreliable deployment pipelines. They can review the overall design—either to improve their own understanding or to come up with ideas for improving problem areas. If a large refactoring is partially complete, the team may wish to authorize a conscientious programmer to finish it.

If you run out of useful solo tasks, you can relax the “no production code” rule or use mob programming to form a “mini-mob” of three people.

Prerequisites

Pairing requires a comfortable work environment. Most offices and cubicles just aren’t set up that way. Before trying pairing full-time, adjust your physical space. If your team is remote, get your tooling in place.

Make sure everyone wants to participate before you try pairing. Pairing is a big change to programmers’ work styles and you may encounter resistance. I usually work around this by asking people to try it for a month or two, then decide. If that doesn’t work, you can try pairing part-time, or with just the people who are interested, although I find that pairing works best when the whole team does it full-time.

Ally
Mob Programming

Mob programming tends to be less intimidating than pairing. If people don’t want to try pairing, see if they’d like to try mobbing instead.

Indicators

When your team pairs well:

  • You’re focused and engaged throughout the day.

  • You enjoy the camaraderie of working with your teammates.

  • At the end of the day, you feel tired and satisfied.

  • For small interruptions, one person deals with the problem while the other continues working. Afterwards, they slide back into the flow of work immediately.

  • Internal quality improves.

  • Knowledge and coding tips travel quickly through the team, raising everyone’s level of competence.

  • New team members integrate into the team quickly and easily.

Alternatives and Experiments

Pairing is a very powerful tool. I’m not aware of any technique, other than mobbing, that’s as effective. Give pairing (or mobbing) a real try before experimenting with alternatives.

When you look at alternatives, don’t make the mistake of thinking that pairing is just a fancy type of code review. To truly replace pairing, you need to replace all these benefits:

Ally
Collective Code Ownership

Code quality. Because pairing brings so many perspectives to the code, and results in so much conversation about it, it reduces defects and improves design quality. The frequent pair switching shares knowledge amongst team members, which enhances collective code ownership. By having people work together, pairing helps them focus, supports self-discipline, and reduces distractions. It does all this without sacrificing productivity.

Formal code reviews can also reduce defects, improve quality, and support self-discipline. In a sense, pairing is just continuous code review. Code reviews don’t share knowledge as thoroughly as pairing, though, so if you’re using collective code ownership, you probably need to supplement code reviews with additional design discussions.

Flow. Pairing’s benefits to flow are more subtle. Because it focuses two people on the same problem, pairing is sort of like having a backup brain. If one person gets distracted, the other person can “reboot” their attention and get them back on track quickly. It’s also easier to ignore the ever-present distractions provided by smartphones, email, instant messaging, and the other demands on our attention. In an environment without pairing, you’ll need another way to help people stay focused.

Collaboration. Pairing’s resilience to distractions makes intra-team collaboration easier. Ideally, in a team, when one person gets stuck on a question that another team member can answer, you want them to ask for help rather than spinning their wheels. If you’re pairing, there’s very little cost to answering a question, because your pairing partner keeps working. It makes sense to ask for help any time you need help.

If you aren’t pairing, interruptions are much more costly. According to [DeMarco and Lister 2013],3 it takes a programmer 15 minutes or more to get back into flow after an interruption. The calculus of interruptions changes: do you ask a question and cost somebody on the team at least 15 minutes of work? Or do you continue to struggle and hope you get the right answer? Personally, I’d keep the interruptions, because it benefits team cohesion to have a culture of helping each other. But that can be frustrating for developers.

3XXX check new edition for this reference; add page number

Noise cancellation with situational awareness. Pair programming has another benefit that’s even less obvious. In a physical team room, pairing creates a low buzz of conversation. You might expect this to be distracting, but it actually recedes into the background as your brain focuses on your interaction with your partner. But the background conversation still enhances your situational awareness. It’s the cocktail-party effect: When somebody says something important to you, your subconscious picks it out of the background and brings it to your conscious attention.

Allies
Team Room
Informative Workspace

In contrast, for teams that don’t pair, side conversations are distracting and can make it hard to concentrate. In that situation, independent offices or cubicles can be better. But now you won’t be able to take advantage of the team room: you won’t be able to see what others are doing as easily and you won’t have the situational awareness provided by an informative workspace.

You could keep the team room and have everyone wear noise-cancelling headphones instead, or encourage people to take side conversations to another room. This will bring back some of your situational awareness, but you won’t get the advantages of the cocktail-party effect.

In other words, pairing has a lot of unobvious benefits that reinforce other agile practices. Although it’s definitely weird, and can be a lot to ask, it’s worth putting in the effort to give it a real try. Don’t just dismiss it out of hand. If pairing isn’t a good fit, try mobbing instead.

Further Reading

Pair Programming Illuminated [Williams 2002] discusses pair programming in depth.

“The Costs and Benefits of Pair Programming” [Cockburn and Williams 2001] reports on Laurie Williams’ initial study of pair programming.

“Promiscuous Pairing and Beginner’s Mind: Embrace Inexperience” [Belshee 2005] is an intriguing look at the benefits of switching pairs at strict intervals.

“Adventures in Promiscuous Pairing: Seeking Beginner’s Mind” [Lacey 2006] explores the costs and challenges of promiscuous pairing. It’s a must-read if you plan to try Belshee’s approach.

XXX Peer Reviews in Software: A Practical Guide (Wiegers)?

XXX https://martinfowler.com/articles/on-pair-programming.html

Share your feedback about this excerpt on the AoAD2 mailing list! Sign up here.

For more excerpts from the book, or to get a copy of the Early Release, see the Second Edition home page.

AoAD2 Practice: Collective Code Ownership

Book cover for “The Art of Agile Development, Second Edition.”

Second Edition cover

This is a pre-release excerpt of The Art of Agile Development, Second Edition, to be published by O’Reilly in 2021. Visit the Second Edition home page for information about the open development process, additional excerpts, and more.

Your feedback is appreciated! To share your thoughts, join the AoAD2 open review mailing list.

This excerpt is copyright 2007, 2020, 2021 by James Shore and Shane Warden. Although you are welcome to share this link, do not distribute or republish the content without James Shore’s express written permission.

Collective Code Ownership

Audience
Developers

We are all responsible for all our code.

Agile teams collectively own their work, as described in “Key Idea: Collective Ownership” on p.XX. But how does that apply to code?

Collective code ownership means the team shares responsibility for their code. Rather than assigning modules, classes, or stories to specific individuals, the team owns it all. It’s the right and responsibility to make improvements to any aspect of your team’s code at any time.

Fix problems no matter where you find them.

In fact, improved code quality is one of the hidden benefits of collective code ownership. Collective ownership allows—no, expects—everyone to fix the problems they find. If you encounter duplication, unclear names, poor automation, or even poorly designed code, it doesn’t matter who wrote it. It’s your code. Fix it!

Making Collective Ownership Work

Allies
Mob Programming
Task Planning
Pair Programming
Stand-Up Meetings

Collective code ownership requires careful coordination around design and planning. If you’re using mob programming, that coordination comes for free. Otherwise, your task planning meeting is a good time to start the discussion. When you discuss how to break down tasks, talk about your design. Write tasks in terms of how your design will change: “Add endpoint to UserReportController.” “Update ContactRecord.” “Add columns to GdprConsent database table.”

When you’re ready for a new task, you can pick up any task from the planning board. In many cases, you’ll just take the next one off the list, but it’s okay to jump ahead a bit to choose a task that you’re interested in or particularly well-suited for.

Ally
Done Done

In an ideal world, your team will swarm each story: everyone will choose tasks for the same story and focus on getting the story “done done” before moving on to the next. This minimizes work in progress (see “Key Idea: Minimize Work in Progress” on p.XX) and exposes risks early.

Don’t jump ahead to another story just because you don’t know how to coordinate.

In practice, it’s okay for people to jump ahead to the next story when the current one is close to completion. Just be careful: when you’re new to collective ownership, it’s going to be easy to accidentally end up with everyone taking de-facto ownership of separate stories rather than truly working together. Don’t jump ahead to another story just because you don’t know how to coordinate.

When you pick up a task that’s closely related to another person or pair’s, have a quick discussion with them. Perhaps they’ve grabbed a front-end task and you’ve grabbed the corresponding back-end task. Take a moment to get on the same page about the API. One or both of you can stub in the API with do-nothing code, then one of you can be responsible for filling it in. Whoever commits their code second is responsible for double-checking that it works together.
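As a sketch of what that do-nothing stub might look like, suppose the two pairs agree the back end will expose a user-report lookup. All of the names here are illustrative, not from the book:

```python
# Hypothetical contract agreed on by the front-end and back-end pairs.
# The back-end pair stubs it immediately so the front-end pair isn't blocked.

def get_user_report(user_id):
    """Agreed API: returns a dict with 'user_id' and a list of 'rows'."""
    # Do-nothing stub: correct shape, empty data. Whoever picks up the
    # back-end task replaces the body; the contract stays the same.
    return {"user_id": user_id, "rows": []}

def render_report(user_id):
    # Front-end work proceeds against the stub right away.
    report = get_user_report(user_id)
    return f"Report for user {report['user_id']}: {len(report['rows'])} rows"
```

Whoever commits second then double-checks that the real implementation and its callers still agree on that shape.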

As you work on the code, you’ll come up with new ideas that affect other people’s work. Pairing will help those ideas spread organically around the team. You can also use the daily stand-up to summarize new ideas. If you’re not using pairing (or mobbing), you might need to add a daily design review.

Some ideas warrant immediate discussion. In a physical team room, just stand up and announce what you want to talk about. People will come join you. In a remote team room, announce the topic in your group chat, and invite people to join you in a videoconference. “Drop in and Drop Out” on p.XX has more details.

Egoless Programming

Collective code ownership requires letting go of a little bit of ego. Rather than taking pride in your code, take pride in your team’s code. Rather than complaining when someone modifies code you wrote, enjoy how the code improves when you’re not working on it. Rather than pushing your personal design vision, discuss design possibilities with your teammates and agree on a shared solution.

Collective ownership also requires a joint commitment from team members to produce good code. When you see a problem, fix it. When writing new code, don’t assume somebody else will fix your mistakes. Write the best code you can.

On the other hand, collective ownership also means you don’t have to be perfect. If your code works and you’re not sure how to make it better, don’t hesitate to let it go. Someone else will improve it later, if and when the code needs it.

Always leave the code a little better than you found it.

Conversely, when you’re working in “someone else’s” code (but it’s not someone else’s—it’s yours!), avoid the temptation to make personal judgements about their code. But do always leave the code a little better than you found it. If you see an opportunity for improvements, don’t be shy. You don’t need to ask permission. If you’re unsure, go ahead and get a second opinion. But if you know it’s a good idea, do it!

Collaborating Without Conflict

In the beginning, collective code ownership is an opportunity for conflict. All the little annoyances about your colleagues’ work styles are double-underlined with a bright purple highlighter. This is a good thing—really!—because it gives you a chance to align your style. But it can be frustrating at first.

Allies
Alignment
Mob Programming
Retrospectives
Stand-Up Meetings
Task Planning
Pair Programming
Continuous Integration

To help the process go more smoothly, decide on important coding, design, and architectural standards when you discuss alignment. When you first adopt collective code ownership, try mob programming for a week or two so you can hash out important differences. Bring up areas of disagreement in your retrospectives and come up with plans for resolving them.

If you don’t use mob programming, you’ll need a way to avoid stepping on each other’s toes during day-to-day work. Daily stand-up meetings are a good way to coordinate, so long as they’re kept brief and focused. The task planning board will help maintain your situational awareness, especially if it’s visible from where you sit.

Pair programming will help you keep up with everyone’s changes. Your partner will often be aware of changes you aren’t, and vice versa. When they aren’t, pair programming makes it easier to ask another pair for help. People who are pairing can take brief interruptions without disturbing their progress—one person just keeps going while their partner deals with the interruption.

In fact, make a point of encouraging people to ask for help when they’re stuck. There’s no point in somebody banging their head against a wall for 30 minutes if somebody else on the team already knows the answer.

Finally, continuous integration will prevent painful merge conflicts and keep everyone’s code in sync.

Working with Unfamiliar Code

If you’re working on a project that has knowledge silos—pockets of code that only one or two people understand—then collective code ownership might seem daunting. How can you take ownership of code that you don’t understand?

Mob programming may be your best choice, at least to start. It will help the whole team share their knowledge with each other quickly. If that’s not to your taste, pair programming also works.

To use pairing to expand your knowledge, volunteer to work on tasks that you don’t understand. Ask somebody who knows that part of the system well to pair with you. While pairing, resist the temptation to sit back and watch them program. Instead, take the keyboard and ask them to guide you. Use your control of the keyboard to control the pace: ask questions and make sure you understand what you’re being asked to do. “Teaching Through Pairing” on p.XX has more.

If nobody understands the code, exercise your inference skills. You don’t need to know exactly what’s happening in every line. In a well-designed system, all you need to know is what each package, namespace, or folder is responsible for. Then you can infer high-level class responsibilities and method behaviors from their names.

Allies
Test-Driven Development
Refactoring

Well-written tests also act as documentation and a safety net. Skim the test names to get an idea of what the corresponding production code is responsible for. If you’re not sure how something works, change it anyway and see what the tests say. An effective test suite will tell you when your assumptions are wrong.
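For example, skimming a test suite like this hypothetical one (the shopping-cart domain is invented for illustration) tells you what the production code is responsible for before you read a line of it:

```python
# Hypothetical shopping-cart code and its tests. The test names alone
# read as documentation of the cart's responsibilities.

class Cart:
    def __init__(self):
        self._prices = []

    def add(self, price):
        self._prices.append(price)

    def total(self):
        # Not sure whether totals include a discount? Change this line
        # and the discount test below will tell you.
        subtotal = sum(self._prices)
        return subtotal * 0.9 if subtotal >= 100 else subtotal

def test_total_sums_item_prices():
    cart = Cart()
    cart.add(30)
    cart.add(20)
    assert cart.total() == 50

def test_orders_of_100_or_more_get_a_ten_percent_discount():
    cart = Cart()
    cart.add(60)
    cart.add(40)
    assert cart.total() == 90
```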

As you learn, refactor the code to reflect your improved understanding. Fix confusing names and extract variables and functions. This will codify your understanding and help the next person, too. Arlo Belshee’s “Naming as a Process” technique [Belshee 2019] is a nice formalization of this approach.
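Here's a sketch of what codifying your understanding might look like. The domain and names are invented for illustration:

```python
# Suppose you puzzle out what this inherited function actually does:
def proc(d, r):
    return d * r / 100 if d > 1000 else 0

# Once you work out that d is an order total and r a rebate percentage,
# rename and extract so the next reader skips the detective work:
REBATE_THRESHOLD = 1000

def qualifies_for_rebate(order_total):
    return order_total > REBATE_THRESHOLD

def rebate_amount(order_total, rebate_percent):
    if not qualifies_for_rebate(order_total):
        return 0
    return order_total * rebate_percent / 100
```

The behavior is unchanged; the difference is that your hard-won understanding now lives in the names instead of in your head.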

If you’re working with code that nobody understands, is poorly designed, and doesn’t have any tests, all is not lost. You can use characterization tests to refactor safely. See “Adding Tests to Existing Code” on p.XX for details.
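Here's a minimal sketch of the characterization-test idea, using a hypothetical legacy function: instead of asserting what the code should do, you run it, record what it actually does, and pin that behavior down before refactoring:

```python
def legacy_label(name, balance):
    # Untested legacy code whose quirks you don't fully understand yet.
    return name[:10].upper() + ":" + ("%0.2f" % balance)

def test_characterizes_current_behavior():
    # These expected values were captured by running the code and
    # copying its actual output, quirks (like the trailing space left
    # by truncation) and all.
    assert legacy_label("alexander hamilton", 3.5) == "ALEXANDER :3.50"
    assert legacy_label("bo", 0) == "BO:0.00"
```

With the current behavior pinned down, you can refactor freely: any accidental change in behavior breaks a characterization test.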

Benefits to Programmers

Of course nobody can understand it... it’s job security!

Old programmer joke

Collective ownership makes a lot of sense for an organization. It reduces risk, improves cycle time, and improves quality by bringing more minds to bear on the code. But does it make sense for programmers? Won’t collective ownership make it more difficult for your contributions to be recognized?

Honestly... it could. As discussed in “Change Harmful HR Policies” on p.XX, Agile requires that your organization recognize and value team contributions more than individual heroics. If that’s not true for your organization, collective code ownership might not be a good fit.

Even if your organization values teamwork, it’s not easy to let a great piece of code out of your hands. It can be difficult to set aside the desire to take credit for a particularly clever or elegant solution.

But it is good for you as a programmer. Why? The whole codebase is yours—not just to modify, but to support and improve. You get to expand your technical skills. When teaching people about your area of expertise, you get to practice your mentoring skills, too.

You don’t have to carry the maintenance burden for every piece of code you write, either. The whole team has your back. Over time, they’ll know your code as well as you do, and you’ll be able to go on vacation without worrying about being called with questions or emergencies.

It’s a little scary at first to come into work and not know exactly which part of the system you’ll work on today, but it’s also freeing. You no longer have long subprojects lingering overnight or over the weekend. You get variety and challenge and change. Try it—you’ll like it.

Questions

We have a really good front-end developer / database programmer / scalability guru. Why not take advantage of their skills?

Please do! Collective code ownership means everybody contributes to every part of the system, but you’ll still need experts to lead the way.

How can everyone learn the entire codebase?

People naturally gravitate to one part of the system or another. They become expert in particular areas. Everyone gains a general understanding of the overall codebase, but they don’t know every detail.

Allies
Simple Design
Test-Driven Development
Pair Programming
Mob Programming

Several practices enable this approach to work. Simple design and its focus on code clarity make it easier to understand unfamiliar code. Tests act as a safety net and documentation. Pairing and mobbing allow you to work with people who have the details you don’t.

Different programmers on our team are responsible for different products. Should the team collectively own all these products?

If you’ve combined programmers onto a single team, then yes, the whole team should take responsibility for all their code. If you have multiple separate teams, then they may or may not share ownership across teams, depending on how you approach scaling. See chapter “Scaling Agility” for details.

Prerequisites

Allies
Alignment
Safety
Whole Team
Team Room
Task Planning
Stand-Up Meetings
Pair Programming
Mob Programming
Continuous Integration
Simple Design
Test-Driven Development

Collective code ownership is socially difficult. Some organizations have trouble letting go of individual rewards and accountability. Some programmers have trouble letting go of taking individual credit, or refuse to use certain programming languages. For these reasons, it’s important to talk with managers and team members about collective code ownership before trying it. These concerns should be part of your initial discussions about whether or not to try Agile (see chapter “Invest in Change”) and brought up again during your alignment session.

Safety is critical. If team members don’t feel safe expressing and receiving criticism, or if they fear being attacked when they raise ideas or concerns, they won’t be able to share ownership of code. Instead, little fiefdoms will pop up. “Ooh, don’t change that code yet. You should talk to Antony first to make sure he’s okay with it.”

Collective ownership also requires good communication. You’ll need a whole team and a team room, either physical or virtual, where people communicate fluidly. Use task planning and your task board to help people understand the work and stand-up meetings to coordinate it.

You need a way to ensure knowledge about changes spreads throughout the team. Because anybody can make any change at any time, it’s easy to feel lost. Pair programming or mob programming is the easiest way to spread that knowledge. If neither is an option, you’ll need to put extra effort into communicating about changes. Code reviews aren’t likely to be enough. Most people instinctively migrate to documentation as a solution, but it’s costly, as “Key Idea: Face-to-Face Conversation” on p.XX discusses. Try lighter-weight solutions first. One option is to hold a 30-minute “design recap” every day to discuss new ideas and recent changes.

Because collective code ownership increases the likelihood that people will touch the same code, you need to minimize the likelihood of painful merge conflicts. Continuous integration is the best option. For new codebases, merge conflicts are more likely because there’s so little code. Mob programming can be a good way to bootstrap the codebase even if it’s not something you plan to use long-term.

Although they’re not strictly necessary, simple design and test-driven development are a good idea for teams using collective code ownership. They make the code easier to understand and change.

Despite this long list of prerequisites, collective code ownership is easy to practice once the necessary conditions are in place. All you need is a shared team agreement that everyone can and should work in any part of the code, seeking out and providing assistance as needed. You don’t need everybody to know every part of the code; team members just need to be able to ask for help when working in an unfamiliar part of the code, and to be generous in providing help in return.

Indicators

When your team practices collective code ownership well:

  • Everyone on the team constantly makes minor improvements to all parts of the code.

  • Nobody complains about team members changing code without asking permission first.

  • When you come back to code you originally wrote, you find that it improved without your involvement.

  • When a team member leaves or takes a vacation, the rest of the team handles their work without interruption.

Alternatives and Experiments

The main alternatives to collective code ownership are weak code ownership and strong code ownership. In weak code ownership, people can change any part of the code, but particular developers are responsible for the quality of particular parts, and it’s polite to coordinate changes with them. In strong code ownership, all changes must go through the owner.

Both of these approaches detract from Agile’s emphasis on teamwork, although weak code ownership isn’t as bad as strong code ownership. It can be useful for teams that don’t use pairing or mobbing, or that have trouble leaving code better than they found it.

But try to use collective code ownership, if you can. Collective ownership is one of those Agile ideas that’s often overlooked, but is actually essential. The great Agile teams have a feel to them. They’re intensely collaborative and prosocial. There’s a distinct lack of ego. Ideas and code are critiqued regularly, out of a genuine desire to improve, and nobody takes offense. Great Agile teams don’t engage in fingerpointing. They identify problems, they fix them, they move on.

Collective code ownership is the crucible from which great teams emerge.

Collective ownership doesn’t always mean collective code ownership, but I think it’s an important part of the equation. By sharing ownership of a codebase, developers learn to work together. They face problems and generate creative solutions. They gain a sense of shared responsibility, shared success, and joint pride of ownership. This is the crucible from which great teams emerge.

More concretely, collective code ownership enables the team to swarm their work—to work together on one story at a time, rather than spreading the team across multiple stories. This decreases cycle times, reduces waste, and tightens feedback loops, which improves agility.

Collective code ownership also improves code quality. People on the team engage deeply with all parts of the codebase, filing down individuals’ idiosyncrasies. Code reviews can accomplish the same result, although, in my experience, people don’t engage with code nearly as deeply in a code review as they do when using collective ownership.

Although it may be possible to have a fluent Delivering team without collective code ownership, I have yet to see it. Stick with this practice until you have a lot of experience as a fluent Delivering team. When you do experiment with alternatives, keep an eye on the above benefits and make sure you’re not losing them.

Share your feedback about this excerpt on the AoAD2 mailing list! Sign up here.

For more excerpts from the book, or to get a copy of the Early Release, see the Second Edition home page.

AoAD2 Part III: Delivering Reliably (Introduction)

Book cover for “The Art of Agile Development, Second Edition.”

Second Edition cover

This is a pre-release excerpt of The Art of Agile Development, Second Edition, to be published by O’Reilly in 2021. Visit the Second Edition home page for information about the open development process, additional excerpts, and more.

Your feedback is appreciated! To share your thoughts, join the AoAD2 open review mailing list.

This excerpt is copyright 2007, 2020, 2021 by James Shore and Shane Warden. Although you are welcome to share this link, do not distribute or republish the content without James Shore’s express written permission.

Delivering Reliably

It’s October again. (See the introduction to Part II.) Over the past year, your team has been working hard at developing their Delivering fluency, and now it’s a well-oiled machine. You’ve never enjoyed your work more: the little annoyances and friction you associate with professional software development—broken builds, frustrating bug hunts, painstaking change analysis—have all melted away. Now you can start a task and have it in production a few hours later.

Your only regret is that your team didn’t pursue Delivering fluency from the beginning. In retrospect, it would have been faster and easier, but people wanted to take it slow. Oh well. Now you know.

As you enter the team room, you see Valeri and Bo working together at a pairing station. They both like to come in early to beat rush-hour traffic. Valeri sees you putting away your backpack and catches your attention.

“Are you available to pair this morning?” she asks. She’s never been one for chit-chat. “Bo and I have been working on the real-time updates and he said you might have some ideas about how to test the networking code.”

“That’s right,” you nod. “Duncan and I spiked it yesterday and came up with something promising. Do you want to pair, or should the three of us mini-mob?”

“You can pair with Valeri,” Bo calls over, getting up and stretching. “I need a break from networking code.” He mock-shudders. “Even CSS is better than this.” Valeri rolls her eyes and shakes her head. “I’ll let you get settled in,” she says to you. “I need more coffee.”

Half an hour later, you and Valeri are making good progress on the real-time networking code. A steady series of soft chimes comes from the workstation. Every time you save your changes, a watch script runs your tests, then chimes a second later to indicate if the tests passed or failed.

You’ve gotten into a steady rhythm. At the moment, you’re driving and Valeri is navigating. “Okay, now let’s make sure it throws an error when the message is empty,” she says. You add a test. Ding-dong! The test fails. Without a pause, you switch to the production code, add an if statement, and save. Dong-ding! The test passes. “Now when the message is corrupted,” Valeri says. You add a line to the test. Ding-dong! Another if statement. Dong-ding! “Okay, I’ve got some notes about more edge cases,” Valeri says, “but I think we need to clean up these if statements first. If you factor out a validateMessage() method, that should help.” You nod, select the code, and hit the Extract Method keystroke. Dong-ding! No problems.

The sounds were the result of an experiment a few months ago. Despite the jokes about “Pavlov’s programmers,” they were a hit. Your team works in such small steps that, most of the time, the code does exactly what you expect it to. Your incremental test runs take less than a second, so the sound acts as instant feedback. You only need to look at the test runner when something goes wrong. The rest of the time, you stay in the zone, switching back and forth between tests, code, and refactoring, with the steady chimes assuring you that you’re on track and in control.

Another half hour later, the networking changes are done. You stretch as Valeri pulls the latest code from the integration branch and runs the full test suite. A minute later, it’s passed, and she runs the deployment script. “Done!” she says. “Time for more coffee. Keep an eye on the deploy for me?”

You settle back in your chair and watch the deployment script run through its paces. It pushes your changes to a private branch, then starts running the test suite on a VM. When the tests pass, the script merges your changes with the integration branch, then starts deploying the code to a canary production server. A few minutes later, the deploy is confirmed and the script tags your repository with the success.

You saunter back to the task board and mark the networking task green. “All done, Bo!” you call. “Ready for some CSS?”

Welcome to the Delivering Zone

The Delivering zone is for teams that want to deliver software reliably.

The Delivering fluency zone is for teams that want to deliver software reliably. They develop their technical skills so that their software is low maintenance, easy to improve and deploy, and has very few bugs. Specifically, teams that are fluent at Delivering:1

1These lists are derived from [Shore and Larsen 2018].

  • Release their latest work, at minimal risk and cost, whenever their business stakeholders desire.

  • Discover and fix flaws in the production lifecycle early, before they can do damage.

  • Are able to provide useful forecasts.

  • Have low defect rates, so they spend less time fixing bugs and more time building features.

  • Create software with good internal quality, which makes changes cheaper and faster.

  • Have high job satisfaction and morale, which improves retention and performance.

To achieve these benefits, teams need to develop the following skills. Doing so requires the investments described in chapter “Invest in Agility”.

The team responds to business needs:

  • The team’s code is production-grade and their latest work is deployed to a production-equivalent environment at least daily.

  • The team’s business representative may release the team’s latest work at will.

  • The team provides useful release forecasts to their business representative upon request.

  • The team coordinates with their business stakeholders to develop in a way that allows their software to be maintained, inexpensively, indefinitely.

The team works effectively as a team:

  • Developers consider code and similar artifacts to belong to the team, not individuals, and they share responsibility for changing and improving it.

  • All day-to-day skills needed to design, develop, test, deploy, monitor, maintain, etc., the team’s work are immediately accessible to the team.

The team pursues technical excellence:

  • When making changes, team members leave their software’s internal quality a little better than they found it.

  • Deploying and releasing is automated and takes no more than ten minutes of manual effort.

  • No manual testing is required prior to deployment.

  • Team members are aware of how their skills affect their ability to accomplish the team’s goals and improve internal quality, and they proactively seek to improve those skills.

Achieving Delivering Fluency

The practices in this part of the book will help your team achieve fluency in Delivering zone skills. For the most part, they center around simultaneous phases.

Most teams, even Agile teams, use a phase-based approach to development. They may work in iterations, but within each iteration, they follow a phase-based sequence of requirements analysis, design, coding, testing, and deployment, as shown in parts (a) and (b) of figure “Software Development Lifecycles”. Even teams using continuous flow tend to develop each story through a series of phases, using a swimlane visualization to track progress.

A figure in three parts. Part “a” is labelled “Waterfall,” and it shows development progressing through six phases: Plan, Analyze, Design, Code, Test, and Deploy. The whole cycle takes 3-24 months and only has the ability to release at the end. Part “b” is labelled “Phase-based Agile.” It shows the same waterfall cycle of part “a,” but now it’s compressed into multiple cycles lasting 1-4 weeks each, with the ability to release at the end of each cycle. Part “c” is labelled “XP-style Agile.” It also shows multiple cycles, but rather than showing the phases happening in order, the phases are stacked on top of each other and extend the entire length of the cycle. Each cycle lasts 1-2 weeks and has the ability to release at any point during the cycle.

Figure 1. Software development lifecycles

But Agile is inherently iterative and incremental. Each story is only a day or two of work. That’s not enough time for high-quality phases. In practice, design and testing get shortchanged. Code quality degrades over time, teams have trouble figuring out how to schedule necessary infrastructure and design work, and they run out of time for testing and bug fixing.

To prevent these problems, Extreme Programming (XP) introduced techniques to allow software development to be truly incremental. Rather than working in phases, XP teams work on all aspects of development incrementally and continuously, as shown in part (c) of figure “Software Development Lifecycles”.

Allies
Adaptive Planning
Incremental Requirements

Despite being created in the 1990s, XP’s testing, coding, and design practices remain state-of-the-art. They yield the highest-quality, most productive code I’ve ever seen. They’ve since been extended by the DevOps movement to support modern cloud-based deployment. Together with incremental planning and requirements analysis, these techniques allow teams to deliver high-quality software regularly and reliably.

The practices in this part are based on XP. If you apply them thoughtfully and rigorously, you’ll achieve Delivering fluency. They’re grouped into five chapters:

  • Chapter “Collaboration” describes how to build software as a team.

  • Chapter “Development” describes how to incrementally build, test, and automate.

  • Chapter “Design” describes how to incrementally design code.

  • Chapter “DevOps” describes how to deploy software reliably and at will.

  • Chapter “Quality” describes how to ensure software does what it's supposed to.


AoAD2 Practice: Retrospectives


Retrospectives

Audience
Whole Team

We continually improve our work habits.

No process is perfect. As “Key Idea: Continuous Improvement” on p.XX describes, your team should constantly update and improve your development process. Retrospectives are a great tool for doing so.

Types of Retrospectives

The most common retrospective, the heartbeat retrospective, occurs on a regular cadence. (It’s also known as an iteration retrospective.) For teams using iterations, it occurs at the end of every iteration. For teams using continuous flow, it occurs at a preset time every week or two.

In addition to heartbeat retrospectives, you can also conduct longer, more intensive retrospectives at crucial milestones. These milestone retrospectives give you a chance to reflect more deeply on your experiences and condense key lessons to share with the rest of your organization.

Milestone retrospectives are out of the scope of this book. They work best when conducted by neutral third parties, so consider bringing in an experienced retrospective facilitator. Larger organizations may have such facilitators on staff (start by asking the HR department), or you can bring in an outside consultant. If you’d like to conduct them yourself, [Derby and Larsen 2006] and [Kerth 2001] are great resources.

How to Conduct a Heartbeat Retrospective

The whole team should participate in each retrospective. So that everyone feels safe speaking their mind freely, people outside the team should not attend, with the possible exception of the facilitator.

Anybody on the team can facilitate. In fact, it’s best to switch facilitators frequently. That will help keep it interesting. Start with people who have facilitation experience. Once the retrospective is running smoothly, give the rest of the team a chance to facilitate.

The facilitator does not otherwise participate in the retrospective.

The facilitator does not otherwise participate in the retrospective. Their role is to keep the retrospective on track and to ensure that everyone’s voice is heard. If your team members have trouble staying neutral, teams can trade facilitators, so that each team has a neutral, outside facilitator. Be sure that each facilitator agrees to keep everything that happens during the retrospective confidential.

I timebox my retrospectives to exactly one hour. Your first few retrospectives will probably run long. Give it an extra half-hour, but don’t be shy about politely wrapping up and moving to the next step. The whole team will get better with practice, and the next retrospective is only a week or two away.

As [Derby and Larsen 2006] describes, a retrospective consists of five parts: Set the Stage; Gather Data; Generate Insights; Decide What to Do; and Close the Retrospective. Below, I describe a simple, effective approach. Don’t try to match the timings exactly; let events follow their natural pace.

After you’ve acclimated to this format, change it up. The retrospective is a great venue for trying new ideas. See “Retrospectives Alternatives and Experiments” on p.XX for suggestions.

Allies
Safety
Team Dynamics

A word of caution before you begin: retrospectives can be damaging when people use them to attack each other. If your team is having trouble treating each other with respect, focus on safety and team dynamics first.

Step 1: The Prime Directive (5 minutes)

In his essay, “The Effective Post-Fire Critique,” New York City Fire Department Chief Frank Montagna writes:

Firefighters, as all humans, make mistakes. When firefighters make a mistake on the job, however, it can be life-threatening to themselves, to their coworkers, and to the public they serve. Nonetheless, firefighters will continue to make mistakes and on occasion will repeat a mistake.

Never use a retrospective to place blame or attack individuals.

Everyone makes mistakes, even when lives are on the line. The retrospective is an opportunity to learn and improve, and everybody needs to be safe to share their experiences and opinions. The team should never use the retrospective to place blame or attack individuals.

As facilitator, it’s your job to nip destructive behavior in the bud. To help remind people of the need for psychological safety, I start each retrospective by repeating Norm Kerth’s Prime Directive:

Regardless of what we discover today, we understand and truly believe that everyone did the best job they could, given what they knew at the time, their skills and abilities, the resources available, and the situation at hand.

If you’re in a physical team room, you can write the Prime Directive on a flip chart and bring it to each retrospective. If you have a virtual team room, you can make it part of a virtual whiteboard that’s dedicated to retrospectives.

Ask each attendee in turn if they agree to the Prime Directive and wait for a verbal “yes.” If they don’t agree, ask if they can set aside their skepticism just for this one meeting. If they still won’t agree, postpone the retrospective. There may be an interpersonal issue that needs to be addressed before people can speak with the openness and honesty the retrospective requires. If you’re not sure what the issue is, ask a mentor for help.

Asking for verbal agreement will feel awkward for some participants, but it serves two important purposes. First, if somebody truly objects, they’re more likely to say so if they have to agree out loud. Second, if somebody speaks up once during a retrospective, they’re more likely to speak again. Verbal agreement encourages participation.

Step 2: Brainstorming (20 minutes)

If everyone agrees to the Prime Directive, write the following categories on the whiteboard (or prepare them in advance):

  • Enjoyable

  • Frustrating

  • Puzzling

  • Keep

  • More

  • Less

Ask the team to use simultaneous brainstorming (see “Work Simultaneously” on p.XX) to think about what’s happened since the last retrospective and write down their reactions (the things they found enjoyable, frustrating, or puzzling) and preferences (the things they want to keep, do more of, or do less of). Write one idea per card, and be sure to include the category on each card.

If people have trouble getting started, briefly recap what’s happened since the last retrospective. (”On Wednesday morning, we had our task planning session...”) Pause after each point to give people a chance to write down ideas. Other people can chime in with their recollections as well.

As the ideas wind down, check the time. If you have extra time remaining, let the silence stretch out. Someone will often say something that they have held back, and this may start a new round of ideas. If you’re running out of time, though, you can move on to the next step.

Step 3: Mute Mapping (15 minutes)

Next, use mute mapping to sort the cards into clusters. When the clusters are complete, use dot voting to select the cluster that the team will focus on improving. After the voting ends, one cluster should be the clear winner. If not, don’t spend a lot of time choosing. Flip a coin or something.
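
Mechanically, dot voting is just a tally: each person gets a few dot stickers and places them on the clusters they care about most, and the cluster with the most dots wins. As a toy sketch (the cluster names and vote counts here are invented for illustration, not taken from a real retrospective):

```python
from collections import Counter

def tally_dot_votes(dots):
    """Tally dot votes. `dots` holds one cluster name per dot placed."""
    counts = Counter(dots)
    best = max(counts.values())
    winners = [cluster for cluster, n in counts.items() if n == best]
    return counts, winners  # more than one winner means a tie

# Hypothetical retrospective: five people with three dots each.
dots = ["pairing"] * 6 + ["deployment"] * 5 + ["meetings"] * 4
counts, winners = tally_dot_votes(dots)
print(winners)  # a single clear winner: ['pairing']
```

A tie—more than one entry in `winners`—is the “flip a coin” case described above.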

Discard the cards from the other clusters. If someone wants to take a card to work on individually, that’s fine, but not necessary. Remember, you’ll do another retrospective in a week or two. Important issues will recur.

Frustrated that your favorite category lost? Wait a few months. If it’s important, it will win eventually.

Step 4: Retrospective Objective (20 minutes)

Come up with experiments that might make things better.

Now it’s time to come up with options for improvement. Ask the team to brainstorm ideas for improving the selected category. This can involve any idea you can think of: performing some action, changing your process, changing a behavior, or something else entirely. Don’t try to come up with perfect or complete solutions; just come up with experiments that might make things better.

Ally
Circles and Soup

Allow this conversation to go on for several minutes. If people are having trouble thinking of ideas, try asking them “why” questions: Why does the current approach need improvement? Why doesn’t it work as is? Why is it done this way? It can also help to think in terms of circles and soup: what the team controls and what they influence.1

1XXX Revise when final name of “circles and soup” practice is decided.

Don’t go into a lot of detail. A general direction is good enough. For example, if “pairing” was the category selected, then “switch pairs more often,” “ping-pong pairing,” and “switch pairs on a schedule” are all valid ideas.

The group may coalesce around a single good idea. Other times, there might be several competing proposals. If there are, conduct another dot vote to choose one. This is your retrospective objective: the improvement that the whole team will work toward until the next retrospective. Limit yourself to just one, so the team can focus.

Once you have a retrospective objective, ask somebody to volunteer to work out the details and follow through. It won’t be that person’s job to push or own the objective—that’s for the whole team—but they’ll help people remember the objective when needed. Other team members can volunteer to help if they want.

Wrap up the meeting with a consent vote. (See “Seek Consent” on p.XX.) When everybody consents, the retrospective is over. If you can’t reach consent for some reason, choose another idea or start over at the next retrospective.

Follow Through

If nothing changes, the retrospective didn’t work.

It’s all too easy to leave the retrospective and think, ”Well, that’s done until next week.” Make sure you actually follow through on the retrospective objective. If nothing changes, the retrospective didn’t work.

Ally
Informative Workspace
Stand-Up Meetings

To help the team follow through, make the retrospective objective visible. If you decided to do something, add those tasks to your plan. If you’re changing your process, update your planning boards to visualize the change. If you want people to change their behavior, track it with a big visible chart. If you’re changing a working agreement, update your working agreements poster.

Check in on the retrospective objective every day. The stand-up meeting can be a good place to check in and remind team members to follow through.

Questions

Despite my best efforts as facilitator, our retrospectives always degenerate into blaming and arguing. What can I do?

Ally
Team Dynamics
Safety

Hold off on retrospectives for now, and focus on team dynamics and establishing psychological safety instead.2 If that doesn’t work, you may need outside assistance. Consider bringing in an organizational development (OD) specialist; your HR department may have someone on staff.

2XXX revist after Team Dynamics and Safety practices are complete.

We come up with good retrospective objectives, but then nothing happens. What are we doing wrong?

Your ideas may be too big. Remember, you only have one week, maybe two, and you have other work to do, too. Try making plans that are smaller scale—perhaps a few hours of work—and follow up every day.

Ally
Slack

Another possibility is that you don’t have enough slack in your schedule. When you have a completely full workload, nonessential tasks such as improving your work habits go undone. (The sad irony is that improving your work habits will give you more time.)

Finally, it’s possible that the team doesn’t feel like they truly have a voice in the retrospective. Take an honest look at the way you conduct the retrospective. Are you leading the team by the nose rather than facilitating? Consider having someone else facilitate the next one.

Some people won’t speak up in the retrospective. How can I encourage them to participate?

It’s possible they’re just shy. It’s not necessary for everyone to participate all the time. Try starting your next retrospective with an icebreaker activity and see if that helps.

On the other hand, they may have something they want to say, but don’t feel safe doing it. In that case, focus on developing psychological safety in the team.

One group of people (such as testers) always gets outvoted in the retrospective. How can we meet their needs, too?

Over time, it’s likely that every major issue will get its fair share of attention. Give the retrospective a few months before deciding that a particular group is disenfranchised. One team I worked with had a few testers who felt their priority was being ignored. A month later, after the team had addressed another issue, the testers’ concern was at the top of everyone’s list.

If time doesn’t help, you can use weighted dot voting. Give people with under-represented skills more votes.
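
The “more votes” idea can be sketched in a few lines. This is a hypothetical weighting scheme—the roles, weights, and cluster names are invented; only the underlying idea (dots from under-represented skills count for more) comes from the text:

```python
from collections import Counter

# Invented weights: testers' dots count double.
VOTE_WEIGHT = {"developer": 1, "tester": 2}

def weighted_tally(ballots):
    """`ballots` is a list of (role, cluster) pairs, one per dot placed."""
    counts = Counter()
    for role, cluster in ballots:
        counts[cluster] += VOTE_WEIGHT[role]
    return counts

ballots = [
    ("developer", "refactoring"), ("developer", "refactoring"),
    ("developer", "pairing"),
    ("tester", "test environments"), ("tester", "test environments"),
]
tally = weighted_tally(ballots)
print(tally.most_common(1))  # [('test environments', 4)]
```

With equal weights, “refactoring” would have won 2–2–1 on a tiebreak; doubling the testers’ dots lets their concern win outright.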

Our retrospective always takes too long. How can we go faster?

As a facilitator, it’s okay to be decisive about wrapping things up and moving on. There’s always next time. If the group is taking a long time brainstorming ideas or mute mapping, you might say something like, “Okay, we’re running out of time. Take two minutes to write down your final thoughts (or make final changes) and then we’ll move on.”

That said, I prefer to let a retrospective take its natural course during the first month or so, even if that means running long. This allows people to get used to the flow of the retrospective without stressing too much about timelines.

The retrospective isn’t accomplishing much. Can we do it less often?

If your team is fluent in your chosen fluency zones and everything’s running smoothly, it’s possible that there’s not much left to improve. In that case, you could try conducting retrospectives less frequently, although you should continue to have one at least every month.

That’s usually not the case, though. It’s more likely that the retrospective has just gotten stale. Try changing it up. Switch facilitators and try new activities or focuses.

Prerequisites

Ally
Safety

The biggest danger in a retrospective is that it will become a venue for acrimony rather than for constructive problem solving. Make sure you’ve created an environment where people are able to share their true opinions. Don’t conduct retrospectives if you have team members who tend to lash out, attack, or blame others.

Indicators

When your team conducts retrospectives well:

  • Your ability to develop and deliver software steadily improves.

  • The whole team grows closer and more cohesive.

  • Each specialty within the team gains respect for the issues other specialties face.

  • Team members are honest and open about successes and failures.

  • The team is comfortable with change.

Alternatives and Experiments

Every retrospective format gets stale over time. Change it up! The format in this book is an easy starting point, but as soon as you have it running smoothly, experiment with other ideas. [Derby and Larsen 2006] is a good resource for learning how retrospectives are constructed, and it also has a variety of activities you can try. Once you’ve absorbed its ideas, see https://www.tastycupcakes.org for more ideas.

Some people find that an hour is too constraining to conduct a satisfying retrospective, and prefer 90 minutes, or even two hours. Feel free to experiment with longer and shorter lengths. Some activities, in particular, will need more time. As you experiment, conduct a brief “retrospective on the retrospective” to evaluate which retrospective experiments should be repeated and which shouldn’t. Chapter 8 of [Derby and Larsen 2006], “Activities to Close the Retrospective,” has ideas.

In addition to trying new activities, you can also experiment with completely different approaches to process improvement. Arlo Belshee, for example, tried a continuous approach, where people put observations in a jar throughout the week, then reviewed those observations at the end of the week. Woody Zuill has an exercise he calls “turn up the good”: at the end of every day, conduct a five-minute retrospective to choose something that went well and decide how to do it even more.

Get familiar with normal heartbeat retrospectives first, though, so you can tell if your experiments are an improvement or not.

Further Reading

Project Retrospectives [Kerth 2001] is the definitive resource for milestone retrospectives.

Agile Retrospectives: Making Good Teams Great [Derby and Larsen 2006] picks up where Kerth leaves off, discussing techniques for conducting all sorts of Agile retrospectives.

“The Effective Post-Fire Critique” [Montagna 1996] is a fascinating look at how a life-and-death profession approaches retrospectives.

XXX Retrospectives Antipatterns? (Aino Corry)

XXX Thomas Owens recommends:


AoAD2 Chapter: Accountability (introduction)


Accountability

If Agile teams own their work and their plans, how do their organizations know they’re doing the right thing? How do they know that the team is doing their best possible work, given the resources, information, and people they have?

Organizations may be willing, even eager, for teams to follow an Agile approach, but that doesn’t give Agile teams carte blanche to do whatever they want. They’re still accountable to the organization. They need to demonstrate that they’re spending the organization’s time and money appropriately.

This chapter has the practices you need to be accountable to your organization:

  • “Trust” on p.XX: Work in a way that gives stakeholders confidence.

  • “Stakeholder Demos” on p.XX: Get feedback about your progress.

  • “Forecasting” on p.XX: Predict when software will be released.

  • “Roadmaps” on p.XX: Share your progress and plans.

  • “Management” on p.XX: Help teams excel.


AoAD2 Practice: Management


Management

Audience
Managers

We help our teams excel.

Stakeholder demos and roadmaps allow managers to see what their teams are producing. But managers need more. They need to know whether their teams are working effectively and how they can help them succeed.

Unlike the other practices in this book, which are aimed at team members, this practice is for managers. It’s primarily for team-level managers, but the ideas can be applied by middle and senior managers as well. In an environment where teams decide for themselves how work will be done (see “Key Idea: Self-Organizing Teams” on p.XX), what do managers do, and how do they help their teams excel?

Most organizations use measurement-based management: gathering metrics, asking for reports, and designing rewards to incentivize the right behavior. It’s a time-honored approach to management that stretches back to the invention of the assembly line.

Measurement-based management doesn’t work.

There’s just one problem. It doesn’t work.

Theory X and Theory Y

In the 1950s, Douglas McGregor identified two opposing styles of management: Theory X and Theory Y. The two styles are each based on an underlying theory of worker motivation.

Theory X managers believe that workers dislike work and try to avoid it. As a result, workers have to be coerced and controlled. Extrinsic motivators such as pay, benefits, and other rewards are the primary mechanism for forcing employees to do what is needed. Furthermore, Theory X managers believe, workers want to be treated this way, because they’re inherently unambitious and avoid responsibility. Under Theory X management, the design and implementation of extrinsic motivation schemes, using tools such as measurement and rewards, is central to good management.

Theory Y managers believe that workers enjoy work and are capable of self-direction. They seek responsibility and enjoy problem-solving. Intrinsic motivators such as the satisfaction of doing a good job, contributing to a group effort, and solving hard problems are the primary drivers of employee behavior. Under Theory Y management, providing context and inspiration, so workers can work without close supervision, is central to good management.

Measurement-based management is a Theory X approach. It’s based on using extrinsic motivators to incentivize correct behavior. Agile, in contrast, is a Theory Y approach. Agile team members are expected to be intrinsically motivated to solve problems and achieve organizational goals. They need to be able to decide for themselves what to work on, who will do it, and how the work will be done.

Agile requires Theory Y management.

These assumptions are built into the foundations of Agile. Theory Y management is expected and required for Agile to succeed. Theory X management won’t work. Even if you strip out the disrespect for workers, Theory X’s underlying reliance on measurements and rewards distorts behavior and creates dysfunction. I’ll explain in a moment.

The Role of Agile Management

Some managers worry that there’s no place for them in an Agile environment. Nothing could be further from the truth. Managers’ role changes, but it isn’t diminished. In fact, by delegating details to their teams, managers are freed up to focus on activities that have more impact.

Agile managers manage the work system rather than individual work. They set their teams up for success. Their job is to guide their teams’ context so that each team makes correct choices without explicit management involvement. In practice, this means team managers:2

2Thanks to Diana Larsen for her contributions to this list.

  • Make sure the right people are on the team, so that the team has all the skills needed for its work. This includes coordinating hiring and promotions.

  • Make sure the team includes the coaches it needs.

  • Mediate interpersonal conflicts, help team members navigate the chaos of change, and help team members jell as a team.

  • Help individual team members develop their careers. Mentor individuals to become future leaders and encourage team members to cross-train so that the team is resilient to the loss of any one person.

  • Monitor the team’s progress towards fluency (see the skill checklists in the introductions to Parts II-IV) and coordinate with the team’s coaches to procure training and other resources the team needs to reach fluency.

  • Procure the tools, equipment, and other resources the team needs to be productive.

  • Ensure that the team understands how their work fits into the big picture of the organization, that they have a charter (see “Planning Your Chartering Session” on p.XX), and that the charter is updated regularly.

  • Provide insights about how well the team is fulfilling their charter and how their work is perceived by stakeholders, particularly management and business stakeholders.

  • Maintain awareness of the relationships between the team and its stakeholders, and help the team understand when and why those relationships aren’t working well.

  • Advocate for the team within the rest of the organization, and coordinate with peer managers to advocate for each other’s teams. Help the team navigate organizational bureaucracy and remove impediments to their success.

  • Ensure organizational expectations around topics such as budgeting, governance, and reporting are fulfilled. Judiciously push for relaxing those requirements when it would help the team.

Measurement Dysfunction

Measurement-based management distorts behavior and causes dysfunction.

One thing you won’t see on that list: reporting and metrics. That’s because measurement-based management distorts behavior and causes dysfunction. Some examples:

Stories and story points

A team’s manager wanted to know if the team was productive, so they tracked the number of stories their team finished every iteration. The team cut back on testing, refactoring, and design so they could get more stories done. The result was reduced internal quality, more defects, and lower productivity. (Tracking capacity yields the same results. See “Capacity Is Not Productivity” on p.XX for more about this common mistake.)

Code coverage

An executive mandated that all new code be tested. Eighty-five percent code coverage was the goal. “All new code needs tests,” he said.

Good tests are small, fast, and targeted, which takes care and thought. This executive’s teams worked on meeting the metric in the quickest and easiest way instead. They wrote tests that covered a lot of code, but they were slow and brittle, failed randomly, and often didn’t check anything important. Their code quality continued to degrade, their productivity declined, and their maintenance costs went up.
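
To see why the metric was so easy to game: a test can execute every line of a function without checking anything. A contrived example (the function and tests below are invented for illustration)—both tests produce identical line coverage, but only one can catch a regression:

```python
def price_with_tax(price, rate):
    """Hypothetical production code: price plus sales tax, rounded to cents."""
    return round(price * (1 + rate), 2)

def test_gamed_for_coverage():
    # Executes every line of price_with_tax, so coverage tools count it...
    price_with_tax(100, 0.08)  # ...but it asserts nothing, so it never fails.

def test_targeted():
    # Same coverage, but this one actually checks behavior.
    assert price_with_tax(100, 0.08) == 108.0
```

The coverage number is identical for both, which is exactly why teams chasing an 85% target can hit it without improving quality.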

Lines of code

In an effort to encourage productivity, a company rewarded people for the number of lines added, changed, or deleted per day. (The number of commits per day is a similar metric.) Team members spent less time thinking about design and more time cutting and pasting code. Their code quality declined, maintenance costs increased, and they struggled with “mushroom” defects that kept popping back up after people thought they had been fixed.

Say/do ratio

Although meeting commitments is important for building trust, it isn’t a good metric. Nevertheless, one company made meeting commitments a key value. “Accountability is very important here,” they said. “If you say you’re going to do something by a certain date, you have to do it. No excuses.”

Their teams became very conservative in their commitments. Their work expanded to fill the time available, reducing throughput. Managers started pushing back on excessively long deadlines. Now the teams had to rush their work and take shortcuts, resulting in reduced internal quality, more defects, higher maintenance costs, and customer dissatisfaction.

Defect counts

Which is easier: reducing the number of defects a team creates, or changing the definition of “defect?” An organization that tracked defect counts wasted time on contentious arguments about what counted as a defect. When the definition was too strict, the team spent time fixing defects that didn’t matter. When it was too loose, they shipped bugs to customers, hurting customer satisfaction.

Why Measurement Dysfunction is Inevitable

Rather than doing work that achieves the best result, people do work that achieves the best score.

When people believe that their performance will be judged based on a measurement, they change their behavior to get a better score on that measurement. But people’s time is limited. By doing more for the measurement, they must do less for something else. Rather than doing work that achieves the best result, they do work that achieves the best score.

Everybody knows that metrics can cause problems. But that’s just because managers chose bad metrics, isn’t it? A savvy manager can prevent problems by carefully balancing their metrics... right?

Unfortunately, no. Robert Austin’s seminal book, Measuring and Managing Performance in Organizations [Austin 1996], explains:

The fundamental message of this book is that organizational measurement is hard. The organizational landscape is littered with the twisted wrecks of measurement systems designed by people who thought measurement was simple. If you catch yourself thinking things like, “Establishing a successful measurement program is easy if you just choose your measures carefully,” watch out! History has shown otherwise. (pp. 180-181)

The situation would be different if you could measure everything that mattered in software development. But you can’t. Too many important things—although they can be measured in some ways—can’t be measured well. Internal quality. Maintenance costs. Development productivity. Customer satisfaction. Word-of-mouth. Here’s Robert Austin again:

As a professional activity that has much mental content and is not very rotable, software development seems particularly poorly suited to measurement-based management... There is evidence that software development is plagued by measurement dysfunction. (pp. 111-112)

In practice, measurements will not be comprehensive, and inhabitants of the black box will gain control of the measurement instrument to make it report what will make them look good. (p. 131)

People—particularly in software development—hate this message. We love the fantasy of a perfectly rational and measurable world. Surely it’s just a matter of selecting the right measurements!

There is no way to measure everything that matters in software development.

It’s a pretty story, but it’s a trap. There is no way to measure everything that matters in software development. The result is an endless cycle of metrics programs, leading to dysfunctions, leading to new metrics, leading to new dysfunctions.

A [manager] who commits dysfunctional acts mistakenly believes she is in a fully [measurable] situation when she is, in fact, in a partially [measurable] situation... In real settings, managers are charged with controlling activity in their areas of organizational responsibility. Unfortunately, the need for control is often interpreted narrowly as a need for measurement-based control. The [manager’s] job is then usually perceived to be the redesign of [worker] tasks to make them more measureable. (pp. 126-127)

Even when dysfunction is discovered and it is revealed that full [measurement] has not been achieved, a [manager] may still resist the conclusion that full [measurement] cannot be achieved. She may conclude instead that she simply got it wrong when she attempted the last job redesign. An unending succession of attempts at job redesign may follow, as the [manager] tries earnestly to get it right... The result is that designers of software production systems are forever redesigning, replacing old modes of control, and substituting new but structurally similar modes, with predictable lack of success. (pp. 132-133)

Delegatory Management

Even if an effective measurement system were possible, measurements would miss the point. Agile requires Theory Y management, not Theory X management, and Theory Y management is based on intrinsic motivators, not measurements and reward systems.

Rather than thinking about measurements and rewards, focus on what intrinsically motivates your team members. What do they love about their work? Is it creating something “insanely great” that customers love? Is it pushing the bounds of technical achievement? Is it being part of a high-functioning, jelled team? Or getting lost in the flow of productive work?

Whatever the motivation, inspire your teams by showing how their work will fulfill their needs. Provide them with the resources and information they need. And step back so they can take ownership and excel.

In contrast [to measurement-based management], delegation cannot produce distortion. If the customer’s value function changes, the change is immediately reflected in the effort allocation of the [worker], as long as he is aware of the change... Under delegation, workers are likely to take more initiative; they act in accordance with their own expectations instead of reacting to whatever carrot hangs before them. (p. 109)

Make measurements inconsequential

It’s not that measurements and data aren’t useful. They are! The problems arise when people think the measurements will be used to assess performance. Unfortunately, people—especially software developers—tend to be cynical about these things. It isn’t what managers say that matters; it’s what people think that causes dysfunction.

To avoid dysfunction, you have to make it structurally impossible to misuse the data.

The easiest way to do so is to keep information private to the team. The team collects the data, the team analyzes the data, and the team discards the data. They report their conclusions and decisions, not the underlying data. If nobody else sees it, there’s no risk of distortion.

If that’s not possible, aggregate the data so that it can’t be attributed to any one person. Instead of using data to evaluate subordinates, use data to evaluate yourself. This can apply to all levels of the organization. Team managers see team measures, not individual measures. Directors see departmental measures, not team measures. And so forth.

Go to gemba

If managers don’t get data about their subordinates, how do they know how people are performing? They go to gemba.

The phrase “Go to Gemba” comes from Lean Manufacturing. It means “go see for yourself.”3 The idea is that managers learn more about what’s needed by seeing the actual work than by looking at numbers.

3“Gemba” is a Japanese word meaning “the actual place [where something happened],” so “go to gemba” literally means “go to the actual place.”

To learn about your teams, go see for yourself.

Managers, to learn about your teams, go see for yourself. Look at the code. Review the UI mockups. Sit in on stakeholder interviews. Attend a planning meeting.

Then think about how you want your team to improve. Ask yourself, “Why aren’t they already doing that themselves?” Assume positive intent: In most cases, it’s not a motivational issue; it’s a question of ability, organizational roadblocks, or—and don’t discount this one—the idea was already considered and set aside for good reasons that you’re not aware of. Crucial Accountability: Tools for Resolving Violated Expectations, Broken Commitments, and Bad Behavior [Patterson et al. 2013] is an excellent resource that discusses what to do next.

Ask the team

Fluent Agile teams have more information about the day-to-day details of their work than anybody else. Rather than asking for measurements, managers can ask their teams a simple question: “What can I do to help your team be more effective?” Listen. Then act.

Define goals and guardrails

Although the team owns their work, the goals of that work are defined by management. It’s okay to put requirements and boundaries in place. For example, one director needed to know that his teams were processing a firehose of incoming data effectively. He gathered his team of managers, explained what he needed, and asked them to create a measurement that teams could track themselves, without fear of being judged. The director didn’t need to see the measurement; he needed to know that his teams were able to stay on top of it, and if not, what they needed to do so.

When Metrics Are Required

All too often, managers’ hands are tied by a larger organizational system. To return to Robert Austin:

The key fact to realize is that in a hierarchical organization every manager is [also measured]. Manager performance is very difficult to measure because of the intangible nature of managerial duties... her own performance is judged mostly by how well her organization—that is, her [workers]—does according to the very measurement system the [manager] installs. The [manager] has an interest, then, in installing easily exploitable measurement systems. The [manager] and [worker] quietly collude to their mutual benefit. (pp. 137-138)

Report narratives and qualitative information rather than quantitative data.

If you must report something, provide narratives and qualitative information, not quantitative measurements that can be abused. Tell stories about what your teams have done and what they’ve learned.

That may not be enough. You might be required to report hard numbers. Push back on this, if you can, but all too often, it will be out of your control.

If you have control over the measurements used, measure as close to real-world outcomes as possible. One such possibility is value velocity.

Value velocity is an actual measurement of productivity. It measures the output of the team over time. To calculate it, measure two numbers for each valuable increment the team releases: the impact, such as revenue; and the lead time, which is the number of weeks (or days) between when development started and when the increment was released. Then divide: impact ÷ time = value velocity.

In many cases, the impact isn’t easily measurable. In that case, you can estimate the impact of each increment instead. This should be done by the sponsor or key stakeholders outside the team. Make sure that all estimates are done by the same person or tight-knit team, so they’re consistent with each other.
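In code form, the calculation is simple. Here’s a minimal sketch; the increment data, field names, and function name are hypothetical, for illustration only:

```python
# Hypothetical sketch of the value velocity calculation described above:
# impact ÷ lead time = value delivered per week.

def value_velocity(impact, lead_time_weeks):
    """Return the value delivered per week for one released increment."""
    return impact / lead_time_weeks

# Two released increments (made-up numbers: revenue impact, and weeks
# from the start of development to release).
increments = [
    {"impact": 120_000, "lead_time_weeks": 8},
    {"impact": 45_000, "lead_time_weeks": 3},
]

for inc in increments:
    velocity = value_velocity(inc["impact"], inc["lead_time_weeks"])
    print(f"${velocity:,.0f} of value per week")
```

Whether the impact numbers are measured or estimated, the important part is consistency: the same person or tight-knit group should produce every impact number, so increments can be compared to each other.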

Remember, though, that value velocity distorts behavior just like any other metric. Whichever metrics you collect, do everything you can to shield your team from dysfunction. Most metrics harm internal quality, maintenance costs, productivity, customer satisfaction, and long-term value, because these are hard to measure and tempting to shortchange. Emphasize the importance of these attributes to your teams, and—if you can do so honestly—promise them that you won’t use metrics in your performance evaluations.

Questions

What about “if you can’t measure it, you can’t manage it?”

“If you can’t measure it, you can’t manage it” is often attributed to W. Edwards Deming, a statistician, engineer, and management consultant whose work influenced Lean Manufacturing, Lean Software Development, and Agile.

Deming was massively influential, so it’s no wonder his quote is so well known. There’s just one problem: He didn’t say it. He said the opposite.

It is wrong to suppose that if you can’t measure it, you can’t manage it—a costly myth.4

4This quote is explained and put into context at The W. Edwards Deming Institute: https://deming.org/myth-if-you-cant-measure-it-you-cant-manage-it/

W. Edwards Deming

Prerequisites

Delegatory management requires an organizational culture that understands measurement dysfunction. Despite being decades old—Deming articulated the need to remove measurement-based management in at least 19825—it’s still not widely understood and accepted.

5Point 12 of Deming’s 14 Points for Management: a) Remove barriers that rob the hourly worker of the right to pride of workmanship. The responsibility of supervisors must be changed from sheer numbers to quality. b) Remove barriers that rob people in management and engineering of their right to pride of workmanship. This means, inter alia, abolishment of the annual or merit rating and of management by objective.

Agile can still work in a measurement-based environment, but the purpose of this book isn’t to tell you what merely works; it’s to tell you what excels. Delegatory management excels, if you’re able to use it.

Indicators

When you use delegatory management well:

  • Teams feel they’ve been set up for success.

  • Teams own their work and make good decisions without management’s active participation.

  • Team members feel confident to do what leads to the best outcomes, not the best scores.

  • Team members and managers aren’t tempted to deflect blame and engage in finger-pointing.

  • Managers have a sophisticated, nuanced understanding of what their teams are doing and how they can help.

Alternatives and Experiments

The message in this practice—that measurement-based management leads to dysfunction—is a hard pill for a lot of organizations to swallow. You may be tempted by alternatives that promise to solve measurement dysfunction through elaborate balancing schemes.

Before you do that, remember that Agile is a Theory Y approach to development. The correct way to manage an Agile team is through delegatory management, not measurement-based management.

If you do look at alternative metrics ideas, be careful. Measurement dysfunction isn’t immediately obvious. It can take a few years to become apparent, so an idea can sound great on paper and even appear to work at first. You won’t discover the rot until later, and even then, it’s all too easy to blame the problem on something else.

In other words, be skeptical of any approach to metrics that isn’t at least as rigorous as [Austin 1996]. It’s based on Austin’s award-winning economics Ph.D. thesis.

That said, there are also good, thoughtful takes on Agile management. As you look for opportunities to experiment, look for opportunities that emphasize a collaborative and delegatory Theory Y approach. The resources in the Further Reading section are a good starting point.

Further Reading

Measuring and Managing Performance in Organizations [Austin 1996] was the inspiration for this practice. It presents a rigorous economic model while remaining engaging and approachable.

Turn the Ship Around! A True Story of Turning Followers into Leaders [Marquet 2013] is a gripping read, and an excellent way to learn more about delegatory management. The author describes how he, as captain of a U.S. nuclear submarine, learned to apply delegatory management with his crew.

Crucial Accountability: Tools for Resolving Violated Expectations, Broken Commitments, and Bad Behavior [Patterson et al. 2013] is a good resource for managers who need to intervene to help their employees.

Punished by Rewards: The Trouble with Gold Stars, Incentive Plans, A’s, Praise, and Other Bribes [Kohn 1999] is a thorough exploration of the differences between intrinsic and extrinsic motivation.

XXX Johanna Rothman, Pollyanna Pixton

XXX The Tyranny of Metrics (Jerry Z. Muller)

XXX Deming, Out of the Crisis

Share your feedback about this excerpt on the AoAD2 mailing list! Sign up here.

For more excerpts from the book, or to get a copy of the Early Release, see the Second Edition home page.

AoAD2 Practice: Roadmaps

Book cover for “The Art of Agile Development, Second Edition.”

Second Edition cover

This is a pre-release excerpt of The Art of Agile Development, Second Edition, to be published by O’Reilly in 2021. Visit the Second Edition home page for information about the open development process, additional excerpts, and more.

Your feedback is appreciated! To share your thoughts, join the AoAD2 open review mailing list.

This excerpt is copyright 2007, 2020, 2021 by James Shore and Shane Warden. Although you are welcome to share this link, do not distribute or republish the content without James Shore’s express written permission.

Roadmaps

Audience
Product Managers

Our stakeholders know what to expect from us.

Ultimately, accountability is about providing good value for your organization’s investment. In a perfect world, your business stakeholders will trust your team to do so without close supervision. This is achievable, but it usually takes a year or two of delivering reliably first.

In the meantime, your organization is going to want to oversee your team’s work. Stakeholder demos help, but managers often want to know more about what you’re doing and what to expect. You’ll share this information in your roadmap.

Agile roadmaps don’t have to look like traditional software roadmaps. I’m using the term fairly loosely, to encompass a variety of ways that teams share information about their progress and plans. Some roadmaps are detailed and to the point, for sharing with managers; others are high-level and glossy, for sharing with customers.

Agile Governance

The type of roadmap you provide depends on your organization’s approach to governance. How does your organization ensure teams are working effectively and moving in the right direction?

The classic approach is project-based governance. It involves creating a plan, an estimate of costs, and an estimate of value. The project is funded if the total value sufficiently exceeds the total costs. Once funded, the project is carefully tracked to ensure that it proceeds according to plan.

This is a predictive approach to governance, not an Agile one. It assumes that plans should be defined in advance. Change is carefully controlled and success is defined as meeting the plan. Management needs roadmaps that include detailed plans, cost estimates, and completion progress.

The Agile approach is product-based governance.

The Agile approach is product-based governance. It involves allocating an ongoing “business as usual” budget and estimating the value the team will produce over time. The product is funded if the ongoing value sufficiently exceeds the ongoing costs. Once funded, the product’s value and costs are carefully monitored to ensure that it’s achieving the desired return on investment. When the value is different than estimated, costs and plans are adjusted accordingly.

This is an adaptive approach to governance. It assumes that the team will seek out information and new opportunities, then change their plans to take advantage of what they learned. Success is defined in terms of business results, such as return on investment. Management needs roadmaps that include spending, value metrics such as revenue, and a business model.

Although Agile is adaptive, not predictive, many Agile teams are subject to project-based governance. Your roadmaps need to accommodate this reality. I’ve provided four options, from maximally adaptive to maximally predictive. Choose the lowest numbered option you can get away with. In some cases, you’ll have multiple roadmaps, such as one for management oversight and one for sales and marketing.

You can present your team’s roadmap in whatever format you like, and to any level of detail. For internal roadmaps, a small slide deck, an email, or a wiki page are all common choices. For externally shared roadmaps, a glossy, less-detailed web page or a marketing video is common.

Option 1: Just the Facts

A “just the facts” roadmap isn’t a roadmap at all, in the traditional sense of the word. Instead, it’s a description of what your team has actually done, with no speculation about the future.

From an accountability and commitment perspective, this is the safest type of roadmap, because you only share things that have happened. It’s also the easiest to adapt, because you don’t make any promises about future plans. It includes:

Ally
Purpose
  • Your team’s purpose.

  • What’s complete and ready for your next release.

  • Your next release date, if you’re using pre-defined release dates. (See “Predefined Release Dates” on p.XX.)

Additionally, for management roadmaps, Optimizing teams will include:

  • Current business value metrics (revenue, customer satisfaction, etc.)

  • Current costs

  • Business model

Even if management needs a more predictive roadmap, a “just the facts” roadmap can work well for sales and marketing. The advantage of the “just the facts” approach is that no one is ever upset when your plans change, because they don’t know your plans have changed. Combined with a release train (see “Release Early Release Often” on p.XX), this can lead to regular announcements of exciting new features that people can have right now.

One well-known example of this approach is Apple, which tends to announce new products only when they’re ready to buy. It’s also common in video games, which use regular updates accompanied by “what’s new” marketing videos to re-energize interest and engagement.

Option 2: General Direction

Stakeholders often want more than just the facts. They want to know what’s coming, too. A “general direction” roadmap strikes a good balance. Speculation is kept to a minimum, so your team can still adapt its plans, but stakeholders aren’t kept entirely in the dark about the future.

The roadmap includes everything in the “just the facts” roadmap, plus:

  • The valuable increment the team is currently working on, and why it’s the top priority.

  • The valuable increment (or increments) most likely to be worked on next.

The increments are presented without dates.

Optimizing teams might also include hypotheses about the business results of upcoming releases.

Option 3: Time and Scope

Ally
Forecasting

A “time and scope” roadmap adds forecasted release dates to the “general direction” roadmap. This reduces agility and increases risk, because people tend to take these sorts of roadmaps as commitments, no matter how many caveats you provide.

That leaves teams with an uncomfortable tradeoff: either you use a conservative forecast, such as one with a 90% probability of success, and provide a pessimistic release date; or you use a more optimistic forecast, such as one with a 50% probability of success, and risk missing the date. Furthermore, work tends to increase to fill the time available, so more conservative forecasts are likely to result in less work getting done.

However, because the roadmap doesn’t include the details of each increment, the team can still steer its plans as described in “How to Steer Your Plans” on p.XX. Rather than forecasting when every story will be done, make a conservative forecast for the “must have” stories in your plan. This will give you a forecast you can meet that still isn’t too far in the future. Then, if you end up with extra time—and, if the forecast was truly conservative, you usually will—you can use that time to add polish and other “nice to have” stories.

Optimizing teams usually don’t use this sort of roadmap. The business cost isn’t worth the benefit. However, it can be useful when they need to coordinate with third parties, such as for a trade show or other marketing event.

Option 4: Detailed Plans and Predictions

This option is the least agile and has the greatest risk. It’s a “time and scope” roadmap that also includes every story in the team’s plan. As a result, the team can’t steer its plans without having to justify their changes. This results in more conservative forecasts—meaning more potential for wasted time—and less willingness to change.

Although this is the riskiest type of roadmap, organizations tend to prefer it. It feels safer, even though it’s actually the least safe approach. Uncertainty makes people uncomfortable, and this roadmap allows them to speak with certainty.

Artificial certainty just makes adapting to changing circumstances more difficult.

That certainty is an illusion, though. Software development is inherently uncertain. Artificial certainty just makes adapting to changing circumstances more difficult.

Sometimes you have to provide this sort of roadmap anyway. To do so, make forecasts that include every story, not just the “must-have” stories. As before, you’ll need to decide between conservative forecasts, which are reliable but potentially wasteful, and more optimistic forecasts, which you could fail to meet.

Teams without Delivering fluency typically have a lot of uncertainty in their forecasts, which means that a properly conservative forecast will show a release date that’s too far in the future for stakeholders to accept. You’ll typically have to use a less conservative forecast, even though the date is more likely to be missed. One way to work around this is to only forecast near-term releases, if you can. “Improving Forecast Ranges” on p.XX has more details.

Optimizing teams don’t use this roadmap.

Corporate Tracking Tools

Tracking teams with planning tools is a mistake.

Companies will often mandate that their teams use a so-called Agile lifecycle management tool, or other planning tool, so they can track teams’ work and create reports automatically. This is a mistake. Not only does it hurt the team—which needs free-form visualizations that they can easily change and iterate—it reinforces a distinctly non-Agile approach to management.

Ally
Management
Purpose
Stakeholder Demos

Agile management is about creating a system where teams make effective decisions on their own. Managers’ job is to ensure teams have the information, context, and support they need. “Agile” planning tools are anything but Agile: they’re built for tracking and controlling teams, not enabling them. They’re an expensive distraction at best. Don’t use them. They will hurt your agility.

That doesn’t mean teams have no guidance. Management still needs to keep its hands on the wheel. But this is done by iterating each team’s purpose, providing oversight and feedback during stakeholder demos, and using the most adaptive roadmaps possible, in addition to effective and engaged team-level management.

If your team is required to use a corporate tracking tool, only enter the information required by your roadmap. Use the other planning practices described in this book for your day-to-day work, copying information into the tool when needed. If your roadmap only includes valuable increments, not stories, this won’t be too much of a burden.

Ally
Visual Planning

If you have to include stories in your roadmap—which I don’t recommend—see if there’s a lightweight way you can do so. Perhaps you can take a picture of your visual plan rather than transcribing the cards into a tool. Perhaps managers should be more involved in planning sessions, or perhaps they’re asking for something they don’t actually need.

If they insist, though, you can transcribe stories into a corporate tracking tool. Do it once per week—or daily, if you have no other choice—and remember that each story should only be a short phrase, not a miniature requirements document.

If managers need you to maintain more detail in the tool, or insist on tracking individual tasks, something is wrong. Management may be having trouble letting go, or your organization may not be a good fit for Agile. Ask a mentor for help.

When Your Roadmap Isn’t Good Enough

Eventually, somebody is going to ask you for a time and scope roadmap, or a detailed plans and predictions roadmap, then tell you that you need to deliver sooner.

Cutting scope is the only sure way to deliver sooner.

There is only one sure way to deliver sooner: cut scope. You have to take stories out of your plan. Everything else is wishful thinking.

You can try improving your capacity (see “How to Improve Capacity” on p.XX) or further developing fluency, but start by cutting scope. If your other efforts pay off, you can put the cut stories back in.

Sometimes, you won’t be allowed to cut scope. In this case, you have a tough choice to make. Reality won’t bend, so you’re stuck with political options. You can either stand your ground, refuse to change your forecast, and risk getting fired; or you can use a less conservative forecast, provide a nicer-looking date, and risk releasing late.

Before making that decision, look around at the other teams in your company. What happens when they miss their dates? In many companies, release dates are used as a bludgeon—a way of pressuring people to work harder—but have no real consequences. In others, release dates are sacred commitments.

If you’re trapped in a situation where your roadmap isn’t good enough and you don’t have the ability to cut scope, ask for help. Rely on team members who understand the politics of your organization, discuss your options with a trusted manager, or ask a mentor for advice.

Remember, whenever possible, the best approach to forecasting is to choose a predefined release date and steer your plans to meet that date exactly.

Questions

How often should we update our roadmap?

Ally
Stakeholder Demos

Update it whenever there’s substantive new information. The stakeholder demo is a good venue for sharing roadmap changes.

What should we tell our stakeholders about forecast probabilities?

In my experience, forecast probabilities are hard for stakeholders to understand. Providing a range of dates can work, but the probabilities behind the range are hard to explain succinctly.

If teams don’t report their detailed plans, how do team-level managers understand what their teams are doing?

Team-level managers can look at their team’s planning boards directly. See “Management” on p.XX for more about managing teams.

Prerequisites

Anybody can create roadmaps, but creating effective, lightweight roadmaps requires Agile governance and a willingness to allow teams to own their work, as discussed in the “Invest in Agility” chapter.

Indicators

When you use roadmaps well:

  • Managers and stakeholders understand what the team is working on and why.

  • The team isn’t prevented from adapting their plans.

Alternatives and Experiments

There are many ways of presenting roadmaps, and I haven’t gone into details about specific presentation styles. Experiment freely! The most common approach I see is short slide decks, but people also create videos (particularly for “just the facts” roadmaps), maintain wiki pages, and send status update emails. Talk with your stakeholders about what works for them.

As you experiment, look for ways to improve your adaptability and make fewer predictions. Over time, stakeholders will gain trust in your team, so be sure to revisit their expectations. You may discover that previously set-in-stone requirements are no longer important.

Further Reading

XXX Johanna Rothman? Pat Reed?

Share your feedback about this excerpt on the AoAD2 mailing list! Sign up here.

For more excerpts from the book, or to get a copy of the Early Release, see the Second Edition home page.