AoAD2 Practice: Test-Driven Development

Book cover for “The Art of Agile Development, Second Edition.”

This is a pre-release excerpt of The Art of Agile Development, Second Edition, to be published by O’Reilly in 2021. Visit the Second Edition home page for information about the open development process, additional excerpts, and more.

Your feedback is appreciated! To share your thoughts, join the AoAD2 open review mailing list.

This excerpt is copyright 2007, 2020, 2021 by James Shore and Shane Warden. Although you are welcome to share this link, do not distribute or republish the content without James Shore’s express written permission.

Test-Driven Development

Audience
Programmers

We produce high-quality code in small, verifiable steps.

“What programming languages really need is a ‘DWIM’ instruction,” the joke goes. “Do what I mean, not what I say.”

Programming is demanding. It requires perfection, consistently, for months and years of effort. At best, mistakes lead to code that won’t compile. At worst, they lead to bugs that lie in wait and pounce at the moment that does the most damage.

People aren’t so good at perfection. No wonder, then, that software is buggy.

Wouldn’t it be wonderful if there were a tool that alerted you to programming mistakes moments after you made them—a tool so powerful, it virtually eliminated the need for debugging?

There is such a tool, or rather, a technique. It’s test-driven development, and it actually delivers these results.

Test-driven development, or TDD, is a rapid cycle of testing, coding, and refactoring. When adding a feature, a programmer may perform dozens of these cycles, implementing and refining the software in tiny steps until there is nothing left to add and nothing left to take away. Done well, TDD eliminates an entire class of programming errors. It doesn’t prevent all bugs, but it does ensure that the code does exactly what the programmer meant it to do.

When used properly, TDD also helps you improve your design, documents the behavior of your code, enables refactoring, and guards against future mistakes. Better yet, it’s fun. You’re always in control and you get this constant reinforcement that you’re on the right track.

TDD isn’t perfect, of course. TDD is difficult to add to legacy codebases. It takes extra effort to apply to code that involves the outside world, such as user interfaces, networking, and databases. It can take a few months of steady use to overcome the initial learning curve.

Try it anyway. Although TDD benefits from other Agile practices, it doesn’t require them. You can use it with almost any code.

Why TDD Works

Back in the days of punch cards, programmers laboriously hand-checked their code to make sure it would compile. A compile error could lead to failed batch jobs and intense debugging sessions to look for the misplaced character.

Getting code to compile isn’t such a big deal anymore. Most IDEs check your syntax as you type, and some even compile every time you save. The feedback loop is so fast that errors are easy to find and fix. (See “Key Idea: Fast Feedback” on p.XX.) If something doesn’t compile, there isn’t much code to check.

Test-driven development applies the same principle to programmers’ intention. Just as modern environments provide feedback on the syntax of your code, TDD cranks up the feedback on the semantics of your code. Every few minutes—as often as every 20 or 30 seconds—TDD verifies that the code does what you think it should do. If something goes wrong, there are only a few lines of code to check. Mistakes become obvious.

TDD is a series of validated hypotheses.

TDD accomplishes this trick through a series of validated hypotheses. You work in very small steps, and at every step, you make a mental prediction about what’s going to happen next. First you write a bit of test code and predict it will fail in a particular way. Then a bit of production code, and predict the test will now pass. Then a small refactoring, and predict the tests will pass again. If a prediction is ever wrong, you stop and figure it out—or just back up and try again.

As you go, the tests and production code mesh together to check each other’s correctness, and your successful predictions confirm that you’re in control of your work. The result is code that does exactly what you thought it should. You can still forget something, or misunderstand what needs to be done. But you can have confidence that the code does what you intended.

When you’re done, the tests remain. They’re committed with the rest of the code, and they act as living documentation of how you intended the code to behave. More importantly, your team runs the tests with (nearly) every build, ensuring that the code continues to work as originally intended. If someone accidentally changes the code’s behavior—for example, with a misguided refactoring—the tests fail, signaling the mistake.

How to Use TDD

You’ll need a programmer’s testing framework to use TDD. For historical reasons, they’re called “unit testing frameworks,” although they’re useful for all sorts of tests. Every popular language has one, or even multiple—just do a web search for “<language> unit test framework.” Popular examples include JUnit for Java, xUnit.net for .NET, Mocha for JavaScript, and GoogleTest for C++.
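Under the hood, these frameworks all do the same basic job: register named tests, run them, and report failures. Here is a rough sketch of the mechanics. The `it` and `runTests` names are modeled on Mocha's style, but this is an illustration, not any framework's real implementation:

```javascript
// Minimal sketch of what a unit testing framework does under the hood.
// Real frameworks (Mocha, JUnit, etc.) add rich assertions, reporters,
// and lifecycle hooks on top of this basic loop.
const tests = [];

function it(name, body) {
  tests.push({ name, body });      // register a named test
}

function runTests() {
  const failures = [];
  for (const { name, body } of tests) {
    try {
      body();                      // run the test; throwing means failure
    } catch (err) {
      failures.push(`${name}: ${err.message}`);
    }
  }
  return failures;                 // empty array = green bar
}
```

A test that runs without throwing passes; a test that throws (for example, from a failed assertion) is reported by name.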

TDD doesn’t prevent mistakes; it reveals them.

TDD follows the “red, green, refactor” cycle illustrated in figure “The TDD Cycle”. Other than time spent thinking, each step should be incredibly small, providing you with feedback within a minute or two. Counterintuitively, the better someone is at TDD, the smaller the steps they take, and the faster they go. This is because TDD doesn’t prevent mistakes; it reveals them. Small steps mean fast feedback, and fast feedback means mistakes are easier and faster to fix.

A chart showing four steps: “Think,” followed by “Red bar,” followed by “Green bar,” followed by “Refactor.” There’s a loop from “Refactor” back to “Green bar,” and another loop from “Refactor” back to “Think.”

Figure 1. The TDD cycle

Step 1: Think

TDD is “test-driven” because you start with a test, and then only write enough code to make the test pass. The saying is, “Don’t write any production code unless you have a failing test.”

Your first step, therefore, is to engage in a rather odd thought process. Imagine what behavior you want your code to have, then think of the very first piece of that to implement. It should be small. Very small. Less than five lines of code small.

Next, think of a test—also just a few lines of code—that will fail until exactly that code is present. Think of something that tests the code’s behavior, not its implementation. As long as the interface doesn’t change, you should be able to change the implementation at any time, without having to change the test.
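For instance, a behavior-focused test asserts only on inputs and outputs, through the public interface. A minimal sketch, using a `titleCase` function invented for illustration (it's not from the book's example):

```javascript
// Hypothetical function, invented for illustration.
function titleCase(words) {
  return words
    .split(" ")
    .map((word) => word.charAt(0).toUpperCase() + word.slice(1))
    .join(" ");
}

// A behavioral test asserts on what comes out, not how it's computed.
// The implementation can change to a regex, a loop, or a library call
// without breaking the test, as long as the interface stays the same.
```

An implementation-coupled test, by contrast, might assert that `split` and `map` were called, and would break under any internal rewrite.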

Allies
Pair Programming
Mob Programming
Spike Solutions

This is the hardest part of TDD, because it requires thinking two steps ahead: first, what you want to do; second, what test will require you to do it. Pairing and mobbing help. While the driver works on making the current test pass, the navigator thinks ahead, figuring out which increment and test should come next.

Sometimes, you won’t understand your code well enough to test-drive it. Thinking two steps ahead will be too difficult. When that happens, use a spike solution to figure out how to approach the problem, then rebuild it using TDD.

Step 2: Red bar

Once you know your next step, write the test. Write just enough test code for the current increment of behavior—hopefully fewer than five lines of code. If it takes more, that’s okay; just try for a smaller increment next time.

Write the test in terms of the code’s public interface, not how you plan to implement its internals. Respect encapsulation. The first time you test a class, module, method, or function, that means your test will use names that don’t exist yet. This is intentional: it forces you to design your interface from the perspective of a user of that interface, not as its implementer.

Ally
Zero Friction

After the test is coded, predict what will happen. Typically, the test should fail, resulting in a red progress bar in most test runners. Don’t just predict that it will fail, though; predict how it will fail. Remember, TDD is a series of validated hypotheses, and this is your first hypothesis.

Then use your watch script or IDE to run the tests. You should get feedback within a few seconds. Compare the result to your prediction. Did they match?
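A watch script can be as simple as an npm script that reruns the tests whenever a file changes. For example, assuming Mocha (the exact flags depend on your framework):

```json
{
  "scripts": {
    "test:watch": "mocha --watch"
  }
}
```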

If the test doesn’t fail, or if it fails in a different way than you expected, you’re no longer in control of your code. Perhaps your test is broken, or it doesn’t test what you thought it did. Troubleshoot the problem. You should always be able to predict what’s going to happen.

Your goal is to always know what the code is doing and why.

It’s just as important to troubleshoot unexpected successes as it is to troubleshoot unexpected failures. Your goal isn’t merely to have tests that pass; it’s to remain in control of your code—to always know what the code is doing and why.

Step 3: Green bar

Next, write just enough production code to get the test to pass. Again, you should usually need fewer than five lines of code. Don’t worry about design purity or conceptual elegance; just do what you need to do to make the test pass. This is okay because you’ll be refactoring in a moment.

Make another prediction and run the tests. This is your second hypothesis.

The tests should pass, resulting in a green progress bar. If the test fails, get back to known-good code as quickly as you can. Often, the mistake will be obvious. After all, you’ve only written a few new lines.

If the mistake isn’t obvious, consider undoing your change and trying again. Sometimes it’s best to delete or comment out the new test and start over with a smaller increment. Remaining in control is key.

It’s always tempting to beat your head against the problem rather than backing up and trying again. Despite a few decades of experience, I do it too. And yet, that same hard-won experience has taught me that trying again with a smaller increment is almost always faster and easier.

That doesn’t stop me from beating my head against walls—it always feels like the solution is just around the corner—but I have finally learned to set a timer so the damage is contained. If you can’t bring yourself to undo right away, set a five- or ten-minute timer, and promise yourself that you’ll back up and try again, with a smaller increment, when the timer goes off.

Step 4: Refactor
Ally
Refactoring

Once your tests are passing again, you can now refactor without worrying about breaking anything. Review the code you have so far and look for possible improvements. Ask your navigator if they have any suggestions.

Incrementally refactor to make each improvement. Use very small refactorings—less than a minute or two each, certainly not longer than five minutes—and run the tests after each one. They should always pass. As before, if the test doesn’t pass and the mistake isn’t immediately obvious, undo the refactoring and get back to known-good code.

Ally
Simple Design

Refactor as much as you like. Make your design as good as you can, but limit your changes to code related to your current task. Keep the design focused on the software’s current needs, not what might happen in the future.

While you refactor, don’t add any functionality. Refactoring isn’t supposed to change behavior. New behavior requires a failing test.

Step 5: Repeat

When you’re ready to add new behavior, start the cycle over again.

If things are going smoothly, with every hypothesis matching reality, you can “upshift” and take bigger steps. (But generally not more than five lines of code at a time.) If you’re running into problems, “downshift” and take smaller steps.

The key to TDD is small increments and fast feedback.

The key to success with TDD is small increments and fast feedback. Every minute or two, you should get a confirmation that you’re on the right track and your changes did what you expected them to do. Typically, you’ll run through several cycles very quickly, then spend more time thinking and refactoring for a few cycles, then speed up again.

Eat the Onion From the Inside Out

The hardest part about TDD is figuring out how to take small steps. Luckily, coding challenges are like ogres and onions: they have layers. The trick with TDD is to start with the sweet, juicy core, and then work your way out from there. You can use any strategy you like, but this is the approach I typically use:

  1. Core interface. Start by defining the core interface that you want to call, and write a test that calls that interface in the simplest possible way. Use this as an opportunity to see how the interface works in practice. Is it comfortable? Does it make sense? To make the test pass, you can just hard-code the answer.

  2. Calculations and branches. Your hard-coded answer isn’t enough. What calculations and logic are at the core of your new code? Start adding them, one branch and calculation at a time. Focus on the happy path: how the code will be used when everything’s working properly.

  3. Loops and generalization. Your code will often involve loops or alternative ways of being used. Once the core logic has been implemented, add support for those alternatives, one at a time. You’ll often need to refactor the logic you’ve built into a more generic form to keep the code clean.

  4. Special cases and error handling. After you’ve handled all the happy-path cases, think about everything that can go wrong. Do you call any code that could throw an exception? Do you make any assumptions that need to be validated? Write tests for each one.

  5. Runtime assertions. As you work, you might identify situations that can only arise as the result of a programming error, such as an array index that’s out of bounds, or a variable that should never be null. Add run-time assertions for these cases so they fail fast. (See “Fail Fast” on p.XX.) Add these assertions as soon as you see the opportunity, not just at the end. They don’t need to be tested, since they’re just an added safety net.
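As a sketch, a runtime assertion might look like this. The `letterAt` function and its error message are invented for illustration:

```javascript
// Hypothetical example of a runtime assertion. An out-of-bounds index here
// can only be a programming error, so the code fails fast with a clear
// message instead of silently returning an empty string.
function letterAt(letters, index) {
  if (index < 0 || index >= letters.length) {
    throw new Error(`index ${index} is out of bounds (0-${letters.length - 1})`);
  }
  return letters.charAt(index);
}
```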

James Grenning uses the mnemonic “ZOMBIES” to express the same idea: Test Zero, then One, then Many. While you test, pay attention to Boundaries, Interfaces, and Exceptions, all while keeping the code Simple. [Grenning 2016]

A TDD Example

TDD is best understood by watching somebody do it. I have several video series online demonstrating real-world TDD. At the time of this writing, my free “TDD Lunch & Learn” series is the most recent. It has 21 episodes covering everything from TDD basics all the way up to thorny problems such as networking and timeouts. [Shore 2020]

I’ll describe the first of these examples here. It uses TDD to create a ROT-13 encoding function. (ROT-13 is a simple Caesar cipher where “abc” becomes “nop”, and vice versa.) It’s a very simple problem, but it’s a good example of how even small problems can be broken down into very small steps.

In this example, notice the techniques I use to work in small increments. They may even seem ridiculously small, but this makes finding mistakes easy, and that helps me go faster. As I’ve said, the more experience you have with TDD, the smaller the steps you’re able to take, and the faster you’re able to go.

Start with the core interface

Think. First, I needed to decide how to start. As usual, the core interface is a good starting point. What did I want it to look like?

This example was written in JavaScript—specifically, Node.js—so I had the choice between creating a class or just exporting a function from a module. There didn’t seem to be much value in making a full-blown class, so I decided to just make a rot13 module that exported a transform function.

Red bar. Now that I knew what I wanted to do, I was able to write a test that exercised that interface in the simplest possible way:

it("runs tests", function() {
  assert.equal(rot13.transform(""), "");
});

Before running the test, I made a hypothesis. Specifically, I predicted that the test would fail because rot13 didn’t exist, and that’s what happened.

Green bar. To make the test pass, I created the interface and hardcoded just enough to satisfy the test.

export function transform() {
  return "";
}

Hardcoding the return value is kind of a party trick, and I’ll often write a bit of real code during this first step, but in this case, there wasn’t anything else the code needed to do.

Good test names give you an overview of how the code is intended to work.

Refactor. Check for opportunities to refactor every time through the loop. In this case, I renamed the test from “runs tests,” which was left over from my initial setup, to “does nothing when input is empty.” That’s obviously more helpful for future readers. Good tests document how the code is intended to work, and good test names allow the reader to get a high-level understanding by skimming through the names. Note how the name talks about what the production code does, not what the test does.

it("does nothing when input is empty", function() {
  assert.equal(rot13.transform(""), "");
});

Calculations and branches

Think. Now I needed to code the core logic of the ROT-13 transform. Eventually, I knew I wanted to loop through the string and convert one character at a time, but that was too big of a step. I needed to think of something smaller.

A smaller step is obviously to “convert one character,” but even that was too big. Remember, the smaller the steps, the faster you’re able to go. I needed to think of just the first part of that. Ultimately, I decided to just transform one lowercase letter forward thirteen letters. Uppercase letters and looping backwards after “z” would wait for later.

Red bar. With such a small step, the test was easy to write:

it("transforms lower-case letters", function() {
  assert.equal(rot13.transform("a"), "n");
});

My hypothesis was that the test would fail, expecting "n" but getting "", and that’s what happened.

Green bar. Making the test pass was just as easy:

export function transform(input) {
  if (input === "") return "";

  let charCode = input.charCodeAt(0);
  charCode += 13;
  return String.fromCharCode(charCode);
}

Even though this was a small step, it forced me to work out the critical question of converting letters to character codes and back, something that I had to look up. Taking a small step allowed me to solve this problem in isolation, which made it easier to tell when I got it right.

Refactor. I didn’t see any opportunities to refactor, so it was time to go around the loop again.

Repeat. I continued in this way, step by small step, until the core letter transformation algorithm was complete.

  1. Lower-case letter forward: “a” → “n” (as I just showed)

  2. Lower-case letter backward: “n” → “a”

  3. First character before “a” doesn’t rotate: “`” → “`”

  4. First character after “z” doesn’t rotate: “{” → “{”

  5. Upper-case letters forward: “A” → “N”

  6. Upper-case letters backward: “N” → “A”

  7. More boundary cases: “@” → “@” and “[” → “[”

After each step, I considered the code and refactored when appropriate. Here are the resulting tests. The numbers correspond to each step. Note how some steps resulted in new tests, and others just enhanced an existing test.

it("does nothing when input is empty", function() {
  assert.equal(rot13.transform(""), "");
});

it("transforms lower-case letters", function() {
  assert.equal(rot13.transform("a"), "n"); ①
  assert.equal(rot13.transform("n"), "a"); ②
});

it("transforms upper-case letters", function() {
  assert.equal(rot13.transform("A"), "N");  ⑤
  assert.equal(rot13.transform("N"), "A");  ⑥
});

it("doesn't transform symbols", function() {
  assert.equal(rot13.transform("`"), "`"); ③
  assert.equal(rot13.transform("{"), "{"); ④
  assert.equal(rot13.transform("@"), "@");  ⑦
  assert.equal(rot13.transform("["), "[");  ⑦
});

Here’s the production code. It’s harder to match each step to the code because there was so much refactoring (see the video for details), but you can see how TDD is an iterative process that gradually causes the code to grow:

export function transform(input) {
  if (input === "") return "";

  let charCode = input.charCodeAt(0);                                    ①
  if (isBetween(charCode, "a", "m") || isBetween(charCode, "A", "M")) {    ③④⑤
    charCode += 13;                                                      ①
  }
  if (isBetween(charCode, "n", "z") || isBetween(charCode, "N", "Z")) {   ② ④ ⑥
    charCode -= 13;                                                       ②
  }
  return String.fromCharCode(charCode);                                  ①
}

function isBetween(charCode, firstLetter, lastLetter) {                      ④
  return charCode >= codeFor(firstLetter) && charCode <= codeFor(lastLetter);④
}                                                                            ④

function codeFor(letter) {                                                 ③
  return letter.charCodeAt(0);                                             ③
}                                                                          ③

The last step (more boundary cases) didn’t result in new production code, but I included it just to make sure I hadn’t made any mistakes.

Loops and generalization

Think. So far, the code only handled strings with one letter. Now it was time to generalize it to support full strings.

Refactor. I realized that this would be easier if I factored out the core logic, so I jumped back to the “Refactoring” step to do so.

export function transform(input) {
  if (input === "") return "";

  let charCode = input.charCodeAt(0);
  return transformLetter(charCode);
}

function transformLetter(charCode) {
  if (isBetween(charCode, "a", "m") || isBetween(charCode, "A", "M")) {
    charCode += 13;
  }
  if (isBetween(charCode, "n", "z") || isBetween(charCode, "N", "Z")) {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

function isBetween...
function codeFor...

Refactoring to make the next step easier is a technique I use all the time. Sometimes, during the “Red bar” step, I realize that I should have refactored first. When that happens, I comment out the test temporarily so I can refactor while my tests are passing. This makes it faster and easier for me to detect refactoring errors.

Red bar. Now I was ready to generalize the code. I updated one of my tests to prove a loop was needed:

it("transforms lower-case letters", function() {
  assert.equal(rot13.transform("abc"), "nop");
  assert.equal(rot13.transform("n"), "a");
});

I expected it to fail, expecting "nop" and getting "n", because it was only looking at the first letter, and that’s exactly what happened.

Green bar. I modified the production code to add the loop:

export function transform(input) {
  let result = "";
  for (let i = 0; i < input.length; i++) {
    let charCode = input.charCodeAt(i);
    result += transformLetter(charCode);
  }
  return result;
}

function transformLetter...
function isBetween...
function codeFor...
Ally
Zero Friction

Refactor. I decided to flesh out the tests so they’d work better as documentation. This wasn’t strictly necessary, but I thought it would make the ROT-13 logic more obvious. I changed one assertion at a time, of course. The feedback was so fast and frictionless, executing automatically every time I saved, there was no reason not to.

In this case, everything worked as expected, but if something had failed, changing one assertion at a time would have made debugging it just a little bit easier. Those benefits add up.

it("does nothing when input is empty", function() {
  assert.equal(rot13.transform(""), "");
});

it("transforms lower-case letters", function() {
  assert.equal(
    rot13.transform("abcdefghijklmnopqrstuvwxyz"), "nopqrstuvwxyzabcdefghijklm" ①
  );
});

it("transforms upper-case letters", function() {
  assert.equal(
    rot13.transform("ABCDEFGHIJKLMNOPQRSTUVWXYZ"), "NOPQRSTUVWXYZABCDEFGHIJKLM" ②
  );
});

it("doesn't transform symbols", function() {
  assert.equal(rot13.transform("`{@["), "`{@[");                                ③
});

Special cases, error handling, and runtime assertions

Finally, I wanted to look at everything that could go wrong. I started with the runtime assertions. How could the code be used incorrectly? Usually, I don’t test my runtime assertions, but I did so this time for the purpose of demonstration:

it("fails fast when no parameter provided", function() {         ①
  assert.throws(                                                 ①
    () => rot13.transform(),                                     ①
    "Expected string parameter"                                  ①
  );                                                             ①
});                                                              ①

it("fails fast when wrong parameter type provided", function() { ②
  assert.throws(                                                 ②
    () => rot13.transform(123),                                  ②
    "Expected string parameter"                                  ②
  );                                                             ②
});                                                              ②

Of course, I followed the TDD loop and added the tests one at a time. Implementing them meant adding a guard clause, which I also implemented incrementally:

export function transform(input) {
  if (input === undefined ①  || typeof input !== "string" ②  ) {
    throw new Error("Expected string parameter");                 ①
  }                                                               ①
  ...

Good tests also act as documentation, so my last step is always to review the tests and think about how well they communicate to future readers. Typically, I’ll start with the general, “happy path” case, then go into specifics and special cases. Sometimes I’ll add a few tests just to clarify behavior, even if I don’t have to change the production code. That was the case with this code. These are the tests I ended up with:

it("does nothing when input is empty", ...);
it("transforms lower-case letters", ...);
it("transforms upper-case letters", ...);
it("doesn’t transform symbols", ...);
it("doesn’t transform numbers", ...);
it("doesn’t transform non-English letters", ...);
it("doesn’t break when given emojis", ...);
it("fails fast when no parameter provided", ...);
it("fails fast when wrong parameter type provided", ...);

And the final production code:

export function transform(input) {
  if (input === undefined || typeof input !== "string") {
    throw new Error("Expected string parameter");
  }

  let result = "";
  for (let i = 0; i < input.length; i++) {
    let charCode = input.charCodeAt(i);
    result += transformLetter(charCode);
  }
  return result;
}

function transformLetter(charCode) {
  if (isBetween(charCode, "a", "m") || isBetween(charCode, "A", "M")) {
    charCode += 13;
  } else if (isBetween(charCode, "n", "z") || isBetween(charCode, "N", "Z")) {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

function isBetween(charCode, firstLetter, lastLetter) {
  return charCode >= codeFor(firstLetter) && charCode <= codeFor(lastLetter);
}

function codeFor(letter) {
  return letter.charCodeAt(0);
}

At this point, the code did everything it needed to. Readers familiar with JavaScript, however, will notice that the code can be further refactored and improved. I continue the example in “Refactoring in Action” on p.XX.

Fast and Reliable Tests

Teams that embrace TDD accumulate thousands of tests. The more tests you have, the more important speed and reliability become.

Your tests must be fast, and they must produce the same answer every time.

In TDD, you run the tests as often as one or two times every minute. They must be fast, and they must produce the same answer every time. If they don’t, you won’t be able to get feedback within 1-5 seconds, and that’s crucial for the TDD loop to work effectively. You’ll stop running the tests as frequently, which means you won’t catch errors as quickly, which will slow you down.

Ally
Zero Friction

You can work around the problem by programming your watch script to only run a subset of tests, but eventually, slow and flaky tests will start causing problems during integration, too. Instead of your deployment providing feedback within five minutes, it will take tens of minutes, or even hours. To add insult to injury, the tests will fail randomly, requiring you to start the long process all over again, adding friction and causing people to ignore genuine failures.

It is possible to write fast and reliable tests. It takes practice and good design, but once you know how, writing fast, reliable tests is faster and easier than writing slow, flaky tests. Here’s how:

Rely on narrow unit tests

Broad tests¹ are written to cover large parts of the software: for example, they might launch a web browser, navigate to a URL, click buttons and enter data, then check that the browser shows the expected result.

¹ The terms “broad,” “narrow,” “sociable,” and “solitary” come from Jay Fields. XXX find original reference

Although this can be easy and seems like a good way to get good test coverage, it’s a trap. Broad tests are slow and unreliable. You need your build to run hundreds or thousands of tests per second, and to do so with perfect reliability. The way to do so is narrow tests.

A narrow test is focused on a small amount of code. Usually a method or function, or several, in a particular class or module. Sometimes, a narrow test will focus on a small cross-cutting behavior that involves several modules.

The best types of narrow tests are called unit tests in the Agile community, although there’s some disagreement over the exact definition, as “Other Unit Test Definitions” on p.XX discusses. I use Michael Feathers’ definition: a unit test is a narrow test that runs entirely in memory, without involving the outside world. In other words, it can’t touch a file system, communicate across a network, or talk to a database, and you don’t need to do special things to your environment (such as editing a configuration file) to run it. [Feathers 2004]

The vast majority of your tests should be unit tests. Because they run entirely in memory, they’re fast and reliable. The size of your unit test code should be proportional to the size of your production code. The ratios vary, but it will often be close to 1:1.

Creating narrow unit tests requires good design. If you have trouble writing unit tests, it could be a sign of problems in your design. Look for ways to decouple your code so that each class or module can be tested independently. Consider asking a mentor for help, and see “Simple Design” on p.XX for ideas.

Test outside interactions with narrow integration tests

Unit tests only test code that’s in memory, but your software doesn’t operate entirely in memory. It also has to talk to the outside world. To test code that does so, use narrow integration tests, also known as focused integration tests.

Narrow integration tests are just like unit tests, with the exception that they involve the outside world. Create them using TDD in the same way that you create a unit test.

Because they involve the outside world, narrow integration tests are much slower than narrow unit tests. Where unit tests can run at a rate of hundreds or thousands per second, narrow integration tests typically run at a rate of dozens per second.

Design your code to minimize the number of narrow integration tests you need. For example, if your code depends on a third-party service, don’t call the service directly from the code that needs it. Instead, create an infrastructure wrapper: an adapter that encapsulates the service and its network calls. Use the infrastructure wrapper in the rest of your code. “Third-Party Components” on p.XX has more about adapters and the “Application Infrastructure” episode of [Shore 2020] has an example.
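To sketch the idea (the `WeatherService` class and its endpoint are invented for illustration, not a real API): the wrapper takes its network call as an injectable dependency, so the rest of the codebase never touches the network directly, and only the wrapper's own narrow integration tests do.

```javascript
// Hypothetical infrastructure wrapper. A real adapter would wrap your
// actual third-party service and its network calls.
class WeatherService {
  // httpGet is injectable: production passes a real HTTP client,
  // while unit tests of calling code pass a stub.
  constructor(httpGet) {
    this._httpGet = httpGet;
  }

  async currentTemperature(city) {
    const body = await this._httpGet(`/weather?city=${encodeURIComponent(city)}`);
    return JSON.parse(body).temperature;
  }
}
```

Code that needs the weather calls `currentTemperature()` and stays unit-testable; the narrow integration tests for `WeatherService` itself are the only tests that exercise the real network call.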

You should end up with a relatively small number of narrow integration tests, proportional to the number of external systems your code interacts with.
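As a sketch of the idea, here’s a hypothetical infrastructure wrapper around an imaginary third-party payments client. All the names are invented for illustration; the point is that only this adapter knows about the service’s API, and only it needs narrow integration tests:

```javascript
// A hypothetical infrastructure wrapper. "PaymentGateway" and the client's
// createCharge() API are invented for illustration. The rest of the codebase
// depends on this adapter, never on the third-party client directly.

class PaymentGateway {
  constructor(client) {
    this._client = client; // the real third-party client, injected
  }

  async charge(amountInCents, cardToken) {
    // All knowledge of the third-party API is contained here. Narrow
    // integration tests exercise this class against the real (or sandbox)
    // service; everything else is tested without touching the network.
    const response = await this._client.createCharge({
      amount: amountInCents,
      source: cardToken,
      currency: "usd",
    });
    return { chargeId: response.id, succeeded: response.status === "succeeded" };
  }
}
```

In production, the wrapper is constructed with the vendor’s real client; in tests of the surrounding code, it’s replaced with a controlled test double, so only the wrapper itself ever talks to the network.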

Control global state

Any tests that deal with global state need careful thought. That includes global variables, such as static (class) variables and singletons; external data stores and systems, such as file systems, databases, and services; and machine-specific state and functions, such as the system clock, locale, time zone, and random number generator.

Tests are often written to assume that global state will be set in a certain way. Most of the time, it will be. But once in a while, it isn’t, often due to a race condition, and the test fails for no apparent reason. When you run it again, the test passes. The result is a flaky test: a test that works most of the time, but occasionally fails randomly.

Flaky tests are insidious. Because re-running the test “fixes” the problem, people learn that the right way to deal with flaky tests is to just run them again. Once you’ve accumulated hundreds of flaky tests, your test suite requires multiple runs before it succeeds. By that time, fixing the problem takes a lot of work.

When you encounter a flaky test, fix it the same day.

When you encounter a flaky test, fix it the same day. Flaky tests are the result of poor design. The sooner you fix them, the fewer problems you’ll have in the future.

The design flaw at the root of flaky tests is allowing global state to pollute your code. Some global state, such as static variables and singletons, can be removed through careful design. Other sorts of global state, such as the system clock and external data, can’t be avoided, but it can be carefully controlled. Use an infrastructure wrapper to abstract it away from the rest of your codebase, and test-drive it with narrow integration tests.

For example, if your code needs to interact with the system clock—for example, to time out a request, or to get the current date—create a wrapper for the system clock and use that, rather than the actual system clock, in the rest of your code. The “No More Flaky Clock Tests” episode of [Shore 2020] has an example.
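To illustrate, here’s a sketch of such a clock wrapper. The create()/createNull() naming loosely follows the patterns in [Shore 2018], but the details here are illustrative, not taken from that article:

```javascript
// A sketch of a system clock wrapper. Production code gets the real time;
// tests get a fixed, controlled time, so time-dependent logic never flakes.

class Clock {
  static create() {
    return new Clock(() => Date.now()); // production: the real system clock
  }

  static createNull(fixedMillis = 0) {
    return new Clock(() => fixedMillis); // tests: time stands still
  }

  constructor(nowFn) {
    this._nowFn = nowFn;
  }

  now() {
    return this._nowFn();
  }
}

// Logic that needs the time takes a Clock; it never calls Date.now() itself.
function hasTimedOut(clock, startMillis, timeoutMillis) {
  return clock.now() - startMillis >= timeoutMillis;
}
```

Because the test controls exactly what now() returns, timeout logic can be tested deterministically, with no sleeps and no race conditions.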

Write sociable tests

Tests can be solitary or sociable. A solitary test is programmed so that all dependencies of the code under test are replaced with special test code called a “test double,” also known as a “mock.” (Technically, a “mock” is a specific type of test double, but the terms are often used interchangeably.)

Solitary tests allow you to test that your code under test calls its dependencies, but they don’t allow you to test that the dependencies work the way your code expects them to. The test doesn’t actually run the dependencies; it runs the test double instead. So if you ever make a change to a dependency that breaks the expectations of any code that uses it, your tests will continue to pass, and you’ll have accidentally introduced a bug.

To prevent this problem, people who write solitary tests also write broad tests to make sure that everything works together correctly. This is duplicated effort, and those broad tests are often slow and flaky.

A better approach, in my opinion—although opinions are divided on this point—is to use sociable tests rather than solitary tests. A sociable test runs the code under test without replacing its dependencies. The code uses its actual dependencies when it runs, which means that the tests fail if the dependencies don’t work the way the code under test expects. Figure “Solitary and Sociable Tests” illustrates the difference.

A figure in two parts. Part A is labelled “Solitary tests.” It shows a series of relationships: “A” relies on “B,” which relies on “C.” Each of A, B, and C have a test, and each has a mock that the test uses. Circles show that A, B, and C are each tested, but X’s show that the relationship between A and B, and between B and C, is not tested. Part B of the figure is labelled “Sociable tests.” It shows the same tests and relationships as part A, but it doesn’t have any mocks. The figure uses circles to show that the test of A also tests A’s relationship with B, and the test of B also tests B’s relationship with C. As a result, there are no gaps that aren’t tested.

Figure 2. Solitary and sociable tests

The best unit tests—again, in my opinion—are narrow, sociable tests. They’re narrow in that the test is only testing the class or module under test. They’re sociable in that the code under test still calls its real dependencies. The result is fast tests that provide full confidence that your code works as expected, without requiring the overhead and waste of additional broad tests.
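Here’s a sketch of a narrow, sociable test, using made-up classes. The code under test (InventoryReport) calls its real dependency (Inventory), but the test’s assertions are only about InventoryReport’s behavior:

```javascript
// A narrow, sociable test. InventoryReport uses the real Inventory class,
// not a test double, so the test also exercises their relationship.
// All names here are illustrative.

class Inventory {
  constructor(counts = {}) {
    this._counts = counts;
  }
  countOf(sku) {
    return this._counts[sku] ?? 0;
  }
}

class InventoryReport {
  constructor(inventory) {
    this._inventory = inventory;
  }
  lowStockLine(sku, threshold) {
    const count = this._inventory.countOf(sku);
    return count < threshold ? `${sku}: LOW (${count})` : `${sku}: ok`;
  }
}

// If Inventory's behavior ever changes in a way InventoryReport doesn't
// expect, this test fails—unlike a solitary test that mocks Inventory.
const report = new InventoryReport(new Inventory({ widget: 2 }));
if (report.lowStockLine("widget", 5) !== "widget: LOW (2)") {
  throw new Error("test failed");
}
```

The test is still narrow—it makes no assertions about Inventory’s internals—but there’s no untested gap between the two classes.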

This does raise the question: how do you prevent sociable tests from talking to the outside world? A big part of the answer is to design your code to separate infrastructure and logic, as I’ll explain in a moment. The other part is the judicious use of test doubles in any code that uses infrastructure wrappers. My “Testing Without Mocks” article [Shore 2018] catalogs design patterns for doing so, and [Shore 2020] has extensive examples.

Separate infrastructure and logic

Code that is pure logic, with no dependencies on anything that involves the outside world, is by far the easiest code to test. So, to make your tests faster and more reliable, separate your logic from your infrastructure. As it turns out, this is a good way to keep your design clean, too.

There are a variety of ways to keep infrastructure and logic separate. Alistair Cockburn’s “Hexagonal Architecture” [Cockburn 2008], Gary Bernhardt’s “Functional Core, Imperative Shell” [Bernhardt 2012], and my “A-Frame Architecture” [Shore 2018] are all similar ways of tackling the problem.
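As a small illustration of the idea, the following sketch keeps the logic pure and pushes all outside-world access into a thin shell. The file names and fileSystem wrapper are hypothetical:

```javascript
// Logic: a pure function with no outside-world dependencies. It's trivial
// to unit test—no setup, no test doubles, no cleanup.
function summarizeOrders(orders) {
  const total = orders.reduce((sum, order) => sum + order.amount, 0);
  return { count: orders.length, total };
}

// Infrastructure shell: reads input, delegates to the logic, writes output.
// "fileSystem" is a hypothetical infrastructure wrapper; only this thin
// layer needs narrow integration tests.
function runReport(fileSystem) {
  const orders = JSON.parse(fileSystem.read("orders.json"));
  const summary = summarizeOrders(orders);
  fileSystem.write("summary.json", JSON.stringify(summary));
}
```

The more code you can move into the pure-logic side, the more of your test suite runs at in-memory speeds.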

Use broad tests only as a safety net
If you use TDD correctly, broad tests shouldn’t be needed.

If you use TDD, narrow unit tests, narrow integration tests, and sociable tests correctly, your code should be thoroughly covered. Broad tests shouldn’t be needed.

For safety, though, it’s okay to have a small number of smoke tests. Smoke tests are broad tests that confirm that your software doesn’t go up in flames when you run it. They’re not comprehensive—they only test your most common scenarios. Your narrow tests are for comprehensive testing.

Broad tests tend to be very slow, often requiring seconds per test, and are difficult to make reliable. You should only need a handful of them.

Ally
Root-Cause Analysis
No Bugs

If you didn’t build your software with TDD from the beginning, or if you’re not confident in your ability to use TDD correctly, it’s okay to have more broad tests for safety. But do treat them only as a safety net. If they ever catch an error that your narrow tests don’t, that’s a sign of a problem with your testing strategy. Figure out what went wrong, add the missing narrow test, and change your testing approach to prevent further gaps.

Eventually, you’ll have confidence in your test suite and can reduce the number of broad tests to a minimum.

Adding Tests to Existing Code

Sometimes you have to add tests to existing code. Either the code won’t have any tests at all, or it will have broad, flaky tests that need to be replaced.

There’s a chicken-and-egg problem with adding tests to code. Good tests—that is, narrow tests—need to poke into your code to set up dependencies and validate state. Unless your code was written with testability in mind—and non-TDD’d code almost never is—you won’t be able to write good tests.

So you need to refactor. The problem is, in a complex codebase, refactoring is dangerous. Side effects lurk behind every function. Twists of logic wait to trip you up. In short, if you refactor, you’re likely to break something without realizing it.

So you need tests. But to test, you need to refactor. But to refactor, you need tests. Etc., etc., argh.

To break the chicken-and-egg dilemma, you need confidence that your refactorings are safe: that they cannot change the behavior of the code. Luckily, modern IDEs have automated refactorings, and, depending on your language and IDE, they might be guaranteed to be safe. According to Arlo Belshee, the core six safe refactorings you need are Rename, Inline, Extract Method/Function, Introduce Local Variable, Introduce Parameter, and Introduce Field. His article, “The Core 6 Refactorings” [Belshee 2016], is well worth reading.

If you don’t have guaranteed-safe refactorings, you can also use characterization tests, also known as pinning tests or approval tests. Characterization tests are temporary, broad tests that are designed to exhaustively test every behavior of the code you’re changing. Llewellyn Falco’s “Approvals” testing framework, available on GitHub at https://github.com/approvals, is a powerful tool for creating these tests. Emily Bache’s video demonstration of the “Gilded Rose” kata [Bache 2018] is an excellent example of how to use approval tests to refactor unfamiliar code.
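Characterization tests can also be hand-rolled. Here’s a sketch of one for an imaginary piece of legacy code; the “approved” outputs were captured by running the code, not by reasoning about what it should do:

```javascript
// A hand-rolled characterization test. legacyFormatPrice() stands in for
// unfamiliar legacy code you need to refactor. The goal is to pin down what
// it currently does, not to judge whether that behavior is right.

function legacyFormatPrice(cents) {
  // Imagine this is tangled legacy code you don't fully understand yet.
  const dollars = Math.floor(cents / 100);
  const remainder = cents % 100;
  return "$" + dollars + "." + (remainder < 10 ? "0" : "") + remainder;
}

// Outputs recorded by running the code, including edge cases.
const approved = {
  0: "$0.00",
  5: "$0.05",
  99: "$0.99",
  100: "$1.00",
  1234: "$12.34",
};

for (const [input, expected] of Object.entries(approved)) {
  const actual = legacyFormatPrice(Number(input));
  if (actual !== expected) {
    throw new Error(`behavior changed: ${input} => ${actual}, expected ${expected}`);
  }
}
```

Tools like Llewellyn Falco’s Approvals framework automate the capture-and-compare cycle, but the principle is the same: any change in observable behavior fails the test.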

Once you have the ability to refactor safely, you can change the code to make it cleaner. Work in very small steps, focusing on Belshee’s core six refactorings, and running your tests after each step. Simplify and refine the code until one part of it is testable, then add narrow tests to that part. You may need to write solitary tests rather than sociable tests, to begin with.

Continue refining, improving, and testing until all the code you’re working on is covered by high-quality narrow tests. Once it is, you can delete the characterization tests and any other broad tests of that code.

Questions

Isn’t TDD wasteful?

I go faster with TDD than without it. With enough practice, I think you will too.

TDD is faster because programming doesn’t just involve typing at the keyboard. It also involves debugging, manually running the code, checking that a change worked, and so forth. Michael “GeePaw” Hill calls this activity GAK, for “Geek At Keyboard.” With TDD, you spend much less time GAKking around and more time doing fun programming work. You also spend less time studying code, because the tests act as documentation and inform you when you make mistakes. Even though tests take time to write, the net result is that you have more time for development, not less. GeePaw Hill’s video, “TDD & The Lump of Coding Fallacy” [Hill 2018], is an excellent and entertaining explanation of this phenomenon.

TDD does take time to learn and apply, especially to new UI technologies and existing code. It’s worth it, but it can take a few months before it’s a net positive.

What do I need to test when using TDD?

The saying is, “Test everything that can possibly break.” To determine if something could possibly break, I think, “Do I have confidence that I’m doing this correctly, and that nobody in the future will inadvertently break this code?”

I’ve learned through painful experience that I can break nearly everything, so I test nearly everything. The only exception is code without any logic, such as simple getters and setters, or a function that only calls another function.

You don’t need to test third-party code unless you have some reason to distrust it. But it is a good idea to wrap third-party code in an adapter that you control, and test that the adapter works the way you want it to. “Third-Party Components” on p.XX has more about these adapters.

How do I test private methods?

Start by testing public methods. As you refactor, some of that code will move into private methods, but it will still be covered by existing tests.

If your code is so complex that you need to test a private method directly, this is a good indication that you should refactor. You can move the private function into a separate module or method object, where it will be public, and test it directly.
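Here’s a sketch of that refactoring, with illustrative names: parsing logic that used to be a private detail of a class is extracted into its own public function, where it can be tested directly:

```javascript
// Before, parseCsvLine() was a private detail of ReportImporter and hard to
// test. After extraction, it's a public function with its own direct tests,
// and ReportImporter is still covered through its public interface.
// All names here are illustrative.

function parseCsvLine(line) {
  return line.split(",").map((field) => field.trim());
}

class ReportImporter {
  import(csvText) {
    return csvText
      .split("\n")
      .filter((line) => line.length > 0)
      .map(parseCsvLine);
  }
}

// The extracted function is tested directly...
if (parseCsvLine("a, b ,c").join("|") !== "a|b|c") throw new Error("test failed");
// ...and the class is tested through its public method as before.
```

If the extracted logic grows, it can graduate to its own module with its own suite of narrow tests.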

How can I use TDD when developing a user interface?

TDD is particularly difficult with user interfaces because most UI frameworks weren’t designed with testability in mind. Many people compromise by writing a very thin, untested translation layer that only forwards UI calls to a presentation layer. They keep all their UI logic in the presentation layer and use TDD on that layer as normal.

There are tools that allow you to test a UI directly, by making HTTP calls (for web-based software) or by pressing buttons and simulating window events (for client-side software). Although they’re usually used for broad tests, my preference is to use them to write narrow integration tests of my UI translation layer.

Should we refactor our test code?

Absolutely. Tests have to be maintained, too. I’ve seen otherwise-fine codebases go off the rails because of brittle and fragile test suites.

That said, tests are a form of documentation and should generally read like a step-by-step recipe. Loops and logic should be moved into helper functions that make the underlying intent of the test easier to understand. Between tests, though, it’s okay to have some duplication if it makes each test’s intent more clear. Unlike production code, tests are read much more often than they’re modified.

Arlo Belshee uses the acronym “WET,” for “Write Explicit Tests,” as a guiding principle for test design. It’s in contrast with the DRY (Don’t Repeat Yourself) principle used for production code. His article on test design, “WET: When DRY Doesn’t Apply,” is superb. [Belshee 2016a]
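For instance, a helper can hide irrelevant setup while each test keeps its relevant details explicit. The names in this sketch are illustrative:

```javascript
// A test helper that keeps tests explicit. createUser() hides irrelevant
// setup details; each test overrides only the values it actually cares
// about, so the test's intent stays visible.

function createUser(overrides = {}) {
  return { name: "Anonymous", age: 30, isAdmin: false, ...overrides };
}

// The (illustrative) production code under test.
function canDeleteAccounts(user) {
  return user.isAdmin === true;
}

// Each test reads like a recipe: the relevant detail (isAdmin) is explicit;
// the irrelevant details (name, age) aren't repeated in every test.
if (canDeleteAccounts(createUser({ isAdmin: true })) !== true) throw new Error("test failed");
if (canDeleteAccounts(createUser()) !== false) throw new Error("test failed");
```

The duplication worth keeping is the kind that spells out each test’s intent; the duplication worth removing is setup noise that obscures it.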

Prerequisites

Although TDD is a very valuable tool, it does have a two- or three-month learning curve. It’s easy to apply to toy problems such as the ROT-13 example, but translating that experience to larger systems takes time. Legacy code, proper test isolation, and narrow integration tests are particularly difficult to master. On the other hand, the sooner you start using TDD, the sooner you’ll figure it out, so don’t let these challenges stop you.

Because TDD has a learning curve, be careful about adopting it without permission. Your organization could see the initial slowdown and reject TDD without proper consideration. Similarly, be cautious about being the only one to use TDD on your team. It’s best if everyone agrees to use it together; otherwise, you’re likely to end up with other members of the team inadvertently breaking your tests and creating test-unfriendly code.

Once you do adopt TDD, don’t continue to ask permission to write tests. They’re a normal part of development. When sizing stories, include the time required for testing in your size considerations.

Ally
Zero Friction

Fast feedback is crucial for TDD to be successful. Make sure you can get feedback within 1-5 seconds, at least for the subset of tests you’re currently working on.

Finally, don’t let your tests become a straitjacket. If you can’t refactor your code without breaking a lot of tests, something is wrong. Often, it’s a result of overzealous use of test doubles. Ask a mentor for help.

Indicators

When you use TDD well:

  • You spend little time debugging.

  • You continue to make programming mistakes, but you find them in a matter of minutes and can fix them easily.

  • You have total confidence that the whole codebase does what programmers intended it to do.

  • You aggressively refactor at every opportunity, confident in the knowledge that the tests will catch any mistakes.

Alternatives and Experiments

TDD is at the heart of the Delivering practices. Without it, Delivering fluency will be difficult or even impossible to achieve.

A common misinterpretation of TDD, as “Test-Driven Debaclement” on p.XX illustrates, is to design your code first, write all the tests, and then write the production code. This approach is frustrating and slow, and it doesn’t allow you to learn as you go.

Another approach is to write tests after writing the production code. This is very difficult to do well: the code has to be designed for testability, and it’s hard to do so unless you write the tests first. It’s also tedious, with a constant temptation to wrap up and move on. In practice, I’ve yet to see after-the-fact tests come close to the detail and quality of tests created with TDD.

Even if these approaches do work for you, TDD isn’t just about testing. It’s really about using very small, continuously-validated hypotheses to confirm that you’re on the right track and produce high-quality code. With the exception of Kent Beck’s TCR, which I’ll discuss in a moment, I’m not aware of any alternatives to TDD that allow you to do so while also providing the documentation and safety of a good test suite.

Under the TDD banner, though, there are many, many experiments that you can conduct. TDD is one of those “moments to learn, lifetime to master” skills. One of the biggest opportunities for experimentation is the choice between the “classicist” and “mockist” approaches. In this book, I’ve shown how to apply the classicist approach. The mockist approach, spearheaded by Steve Freeman and Nat Pryce, is also worth investigating. Their book, Growing Object-Oriented Software, Guided by Tests, is well worth reading. [Freeman and Pryce 2010]

More recently, Kent Beck has been experimenting with an idea he calls TCR: test && commit || revert. It refers to a small script that automatically commits your code if the tests pass, and reverts it if the tests fail. Although TCR sacrifices the “red bar” step of TDD, which a lot of people like, it forces you to take very small steps. This gives you the same series of validated hypotheses that TDD does, and arguably makes them even smaller and more frequent. Taking small steps is one of the hardest and most important TDD skills to learn. TCR is worth trying as an exercise, if nothing else.

Further Reading

This book only scratches the surface of TDD. For more detail about the approach I recommend here, see my “Testing Without Mocks” article [Shore 2018] and accompanying “TDD Lunch and Learn” video series [Shore 2020].

Test-Driven Development: By Example [Beck 2002] is an excellent introduction to TDD by the person who invented it. If you liked the ROT-13 example, you’ll like the extended examples in this book. The TDD patterns in Part III are particularly good.

Working Effectively with Legacy Code [Feathers 2004] is a must-have for anybody working with legacy code.

XXX Jay Fields, Astels, Rainsberger, 1ed p302
