AoAD2 Practice: Continuous Integration

This is a pre-release excerpt of The Art of Agile Development, Second Edition, to be published by O’Reilly in 2021. Visit the Second Edition home page for information about the open development process, additional excerpts, and more.

Your feedback is appreciated! To share your thoughts, join the AoAD2 open review mailing list.

This excerpt is copyright 2007, 2020, 2021 by James Shore and Shane Warden. Although you are welcome to share this link, do not distribute or republish the content without James Shore’s express written permission.

Continuous Integration

Audience
Programmers, Operations

We keep our latest code ready to release.

Most software development efforts have a hidden delay between when the team says “we’re done” and when the software is actually ready to release. Sometimes that delay can stretch on for months. It’s the little things: getting everybody’s code to work together, writing a deploy script, pre-populating the database, and so forth.

When your customers are ready to release, you push a button and release.

Continuous integration is a better approach. Teams using continuous integration keep everyone’s code working together and ready to release. The ultimate goal of continuous integration is to make releasing a business decision, not a technical decision. When your on-site customers say it’s time to release, you push a button and release. No fuss, no muss.

Allies
Collective Code Ownership
Refactoring

Continuous integration is also essential for collective code ownership and refactoring. If everybody is making changes to the same code, they need a way to keep their changes in sync. Continuous integration is the best way to do so.

Continuous Integration is an Attitude, Not a Tool

One of the early adopters of continuous integration was ThoughtWorks, a software development outsourcing firm. They built a tool called “CruiseControl” to automatically run their continuous integration scripts. They called it a continuous integration (CI) server, also known as a CI/CD server or build server.

Since then, the popularity of these tools has exploded. They’re so popular that the tools have overshadowed the practice itself. Today, many people think “continuous integration” means using a CI server.

Continuous integration is about much more than running a build.

It’s not true. CI servers only handle one small part of continuous integration: they build and merge code on cue. But continuous integration is about much more than running a build. Fundamentally, it’s about being able to release your team’s latest work whenever your on-site customers want. No tool can do that for you.

Achieving this goal requires three things:

1. Integrate many times per day

Integration means merging together all the code the team has written. Typically, that means merging everyone’s code into a common branch of your source code repository. That branch goes by a variety of names: “main,” “master,” and “trunk” are common. I use “integration,” because I like clear names, and that’s what the branch is for. But you can use whatever name you like.

Ally
Task Planning

Teams practicing continuous integration integrate as often as possible. This is the “continuous” part of continuous integration. People integrate every time they complete a task card, before and after every major refactoring, and any time they’re about to switch gears. The elapsed time can be anywhere from a few minutes to a few hours, depending on the work. The more often, the better. Some teams even integrate with every commit.

If you’ve ever experienced a painful multiday merge, integrating so often probably seems foolish. Why go through that pain?

The more often you integrate, the less painful it is.

The secret of continuous integration is that it actually reduces the risk of a bad merge. The more often you integrate, the less painful it is. More frequent integrations mean smaller merges, and smaller merges mean less chance of merge conflicts. Teams using continuous integration still have occasional merge conflicts, but they’re rare, and easily resolved.

2. Never break the integration branch

When was the last time you spent hours chasing down a bug in your code, only to find that it wasn’t your code at all, but an out-of-date configuration, or somebody else’s code? Conversely, when was the last time you spent hours blaming a problem on your configuration or somebody else’s code, only to find that it was your code all along?

The integration branch must always build and pass its tests.

To prevent these problems, the integration branch needs to be known-good. Without exception, it must always build and pass its tests.

Allies
Zero Friction
Test-Driven Development

This is actually easier than you might think. You’ll need an automated build with a good suite of tests, but once you have that, guaranteeing a known-good integration branch is just a matter of validating the merged code before promoting it to the integration branch. That way, if the build fails, the integration branch remains in its previous, known-good state.

3. Keep the integration branch ready to release

Every integration should get as close to a real release as possible. The goal is to make preparing for release such an ordinary occurrence that, when you actually do release, it’s a non-event. One team I worked with got to the point that they were releasing multiple times per week. They wrote a small mobile app with a big red button. When they were ready to release, they’d go to the local pub, order a round, and push the button.

Allies
Done Done
Build for Operation
Feature Toggles

This means that every story includes tasks to update the build and deployment scripts, when needed. Code changes are accompanied by tests. Code quality problems are addressed. Data migrations are scripted. Important but invisible stories such as logging and auditing are prioritized alongside their features. Incomplete work is hidden from users.

Don’t save the grunt work for the end. (See “Key Idea: Minimize Work in Progress” on p.XX.) Take care of it continuously, throughout development. From the very first day, focus on creating a walking skeleton that could be released, if it only had a bit more meat on its bones, and steadily add to it with every story and task.

The Many Flavors of Continuous Integration

Continuous integration is so popular, and so misunderstood, that people keep coming up with new ways of describing the underlying idea.

Remember, the core idea of continuous integration is to make releases a business decision, not a technical decision. In order of increasing rigor, here are the various flavors of that idea:

  • CI Server. A tool that automatically runs build scripts. Not continuous integration at all.

  • Trunk-based development. Integrate many times per day; never break the build.

  • Continuous integration. Trunk-based development + keep the integrated code ready to release.

  • Continuous delivery. Continuous integration + deploy to a staging or test environment after every integration.

  • Continuous deployment. Continuous delivery + deploy to production after every integration.

Continuous delivery is really just continuous integration applied to online systems. Although the term was invented by Jez Humble in 2010, Kent Beck described it as part of continuous integration way back in 2004:

Integrate and build a complete product. If the goal is to burn a CD, burn a CD. If the goal is to deploy a web site, deploy a web site, even if it is to a test environment. Continuous integration should be complete enough that the eventual first deployment of the system is no big deal. [Beck 2004] (p. 50)

Extreme Programming Explained, 2nd ed.

Ally
Continuous Deployment

Continuous deployment, on the other hand, is a more advanced practice. I discuss it in the “DevOps” chapter.

The Continuous Integration Dance

When you use continuous integration, every day follows a little choreographed dance:

  1. Sit down at a development workstation and reset it to a known-good state.

  2. Do work.

  3. Integrate (and possibly deploy) at every good opportunity.

  4. When you’re finished, clean up.

Ally
Zero Friction

These steps should all be automated as part of your zero-friction development environment.

For step 1, I make a script called reset_repo, or something similar. With git, the commands look like this (before error handling):

git fetch origin                     # get latest code from repo
git reset --hard origin/integration  # reset to integration branch
git checkout -b $PRIVATE_BRANCH      # create a private branch for your work
$BUILD_COMMAND_HERE                  # verify that you’re in a known-good state
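With error handling added, a reset_repo script might look something like this sketch. It assumes PRIVATE_BRANCH and BUILD_COMMAND are set as environment variables, and it adds a git clean step to delete leftover untracked files; both are my assumptions, so adapt them to your team’s conventions.

```shell
# reset_repo: reset the workstation to a known-good state.
# Assumes PRIVATE_BRANCH and BUILD_COMMAND are set in the environment.
reset_repo() {
  set -e                                 # abort on the first failure
  git fetch origin                       # get latest code from repo
  git reset --hard origin/integration    # reset to integration branch
  git clean -fd                          # delete leftover untracked files
  git checkout -B "$PRIVATE_BRANCH"      # create (or reset) your private branch
  $BUILD_COMMAND                         # verify you're in a known-good state
}
```

Because set -e aborts at the first failed command, a broken fetch or failed build never leaves you working on top of a half-reset workspace.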

During step 2, you'll work normally, including committing and rebasing however your team prefers.

Step 3 is to integrate. You can do so any time the tests are passing. Try to integrate at least every few hours. When you’re ready to integrate, you’ll merge the latest integration branch changes into your code, make sure everything works together, then tell your CI server to test your code and merge it back into the integration branch.

Your integrate script will automate these steps for you. With git, it looks like this (before error handling):

git status --porcelain         # check for uncommitted changes (fail if any)
git pull origin integration    # merge integration branch into local code
$BUILD_COMMAND_HERE            # build, run tests (to check for merge errors)
$CI_COMMAND_HERE               # tell CI server to test and merge code

The integration command varies according to your CI server, but will typically involve pushing your code to the repository. Be sure to set up your CI server to build and test your code before merging back to the integration branch, not after. That way your integration branch is always in a known-good state. If you don’t have a CI server that can do that, you can use the script in “Continuous Integration Without a CI Server” on p.XX instead.
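To make “validate before merging” concrete, here’s a minimal sketch of the logic a well-configured CI server performs. The function name, branch names, and BUILD_COMMAND variable are illustrative assumptions, not any real CI server’s API; the essential point is that the merge to the integration branch is the last step, after the build succeeds.

```shell
# validate_and_promote: sketch of CI-server-side logic. It builds and tests
# the candidate branch *first*, then fast-forwards integration to it.
validate_and_promote() {
  candidate=$1
  set -e
  git checkout -q --detach "$candidate"  # build the candidate, not integration
  $BUILD_COMMAND                         # full build and tests; abort on failure
  git checkout -q integration
  git merge -q --ff-only "$candidate"    # promote only after the build passes
}
```

Because the merge is the final step, a failed build leaves the integration branch untouched, in its previous, known-good state.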

Repeat steps 2 and 3 until you’re done for the day. After you integrate the final time, clean up. With git, that means erasing the private branch:

git branch -d $PRIVATE_BRANCH

These scripts are only suggestions, of course. Feel free to customize them to match your team’s preferences.

Synchronous vs. Asynchronous Integration

Ally
Zero Friction

Continuous integration works best when you wait for the integration to complete. This is called synchronous integration, and it requires your build and tests to be fast—preferably completing in less than five minutes, or ten minutes at most. Achieving this speed is usually a matter of improving the team’s test suite. “Fast and Reliable Tests” on p.XX describes how.

If the build takes too long, you’ll have to use asynchronous integration instead. In asynchronous integration, which requires a CI server, you start the integration process, then go do other work while the CI server runs the build. When the build is done, the CI server notifies you of the result.

Asynchronous integration sounds efficient, but it turns out to be problematic in practice. You check in the code, start working on something else, and then half an hour (or more) later, you get a notification that the build failed. Now you have to interrupt your work and go fix the problem. In theory, anyway. More often, it gets set aside until later. You end up with a chunk of work that’s hours or even days out of date, with much more likelihood of merge conflicts.

It’s a particular problem with poorly-configured CI servers. Although your CI server should only promote code to the integration branch after the build succeeds, so the integration branch is known-good, some CI servers default to merging the code first, then running the build afterwards. If the code breaks the build, then everybody who pulls from the integration branch is blocked.

Combine that with asynchronous integration, and you end up with a situation where people unwittingly check in broken code and then don’t fix it because they assume somebody else broke the build. The situation compounds, with error building on error. I’ve seen teams whose builds remained broken for days on end.

It’s better to make it structurally impossible to break the integration branch, by testing the build before merging. It’s better still to use synchronous integration. When you integrate, wait for the integration to succeed. If it doesn’t, fix the problem immediately.

Multistage Integration Builds

Some teams have sophisticated tests, measuring qualities such as performance, load, or stability, that simply cannot finish in under ten minutes. For these teams, multistage integration is a good idea.

A multistage integration consists of two separate builds. The normal build, or commit build, contains all the items necessary to demonstrate that the software works: compiling, linting, unit tests, narrow integration tests, and a handful of smoke tests. (See “Fast and Reliable Tests” on p.XX for details.) This build runs synchronously, as usual.

When the commit build succeeds, the integration is considered to be successful, and the code is promoted to the integration branch. Then a slower secondary build runs asynchronously. It contains the additional tests that don’t run in a normal build: performance tests, load tests, stability tests, and so forth. It can also include deploying the code to staging or production environments.

If the secondary build fails, everyone stops what they’re doing to fix the problem.

If the secondary build fails, the team is notified, and everyone stops what they’re doing to fix the problem. This ensures the team gets back to a known-good build quickly. However, failures in the secondary build should be rare. If they’re not, the commit build should be enhanced to detect those types of problems, so they can be fixed synchronously.
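The two stages might be wired together as in this sketch, which assumes you run it from the repository root with your candidate branch already checked out and merged with the latest integration branch. The script names are my assumptions, for illustration only.

```shell
# integrate_multistage: sketch of a two-stage integration. The fast commit
# build runs synchronously and gates the merge; the slow secondary build
# runs asynchronously after the code is promoted.
integrate_multistage() {
  candidate=$1
  set -e
  ./commit_build.sh                      # compile, lint, unit tests, narrow
                                         #   integration tests, smoke tests
  git checkout -q integration
  git merge -q --ff-only "$candidate"    # integration is now known-good
  ./secondary_build.sh &                 # performance, load, stability tests
}
```

Note that the merge happens before the secondary build starts, which is why a secondary-build failure must trigger a “stop and fix” response from the whole team.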

Although a multistage build can be a good idea for a mature codebase with sophisticated testing, most teams I encounter use multistage integration as a workaround for a slow test suite. In the long term, it’s better to improve the test suite instead.

In the short term, introducing a multistage integration can help you transition from asynchronous to synchronous integration. Put your fast tests in the commit build and your slow tests in the secondary build. But don’t stop there. Keep improving your tests, with the goal of eliminating the secondary build and running your integration synchronously.

Pull Requests and Code Reviews

Pull requests are too slow for continuous integration.

Pull requests aren’t a good fit for continuous integration. They’re too slow. Continuous integration works best when the time between integrations is very short—less than a few hours—and pull requests tend to take a day or two to approve. This makes merge conflicts much more likely, especially for teams using evolutionary design.

Allies
Pair Programming
Mob Programming

Instead, use pairing or mobbing to eliminate the need for code review. Alternatively, if you want to keep code reviews, you can conduct code reviews after integrating, rather than as a pre-integration gate.

Although pull requests don’t work well on teams using continuous integration, they can still work as a coordination mechanism between teams that don’t share ownership.

Questions

You said we should clean up at the end of the day, but what if I have unfinished work and can’t integrate?

Ally
Test-Driven Development

If you’re practicing test-driven development, you can integrate any time your tests are passing, which should be every few minutes. You shouldn’t ever be in a position where you can’t integrate. If you are, you’ve probably gotten stuck. It might be a good idea to delete the unfinished code and start fresh in the morning.

You don’t have to delete unfinished work, but if you’ve been integrating frequently, the loss of code will be minimal, and you’re likely to do a better job in the morning.

If we use synchronous integration, what should we do while waiting for the integration to complete?

Take a break. Get a cup of tea. Perform ergonomic stretches. Talk with your pair or mob about design, refactoring opportunities, or next steps. If your build is under ten minutes, you should have time to clear your head and consider the big picture without feeling like you’re wasting time.

We always seem to run into merge conflicts when we integrate. What are we doing wrong?

One cause of merge conflicts is infrequent integration. The less often you integrate, the more changes you have to merge. Try integrating more often.

Another possibility is that your changes overlap with other team members’ work. Try talking more about what you’re working on and coordinating more closely with the people who are working on related code. See “Making Collective Ownership Work” on p.XX for details.

The CI server (or integration machine) constantly fails the build. How can we integrate more reliably?

You might have a problem with flaky tests (tests that fail intermittently). See “Fast and Reliable Tests” on p.XX for help.

If your tests are fine, you probably need to run more tests locally. Run a full build and test before merging the integration branch with your code. That will make sure your changes didn’t break anything. Then, after you merge, run another full build and test. That will make sure the merge didn’t break anything. At this point, the CI build should proceed without issue. If it doesn’t, it means your local configuration is different from the CI server configuration.

If you have frequent problems with mismatched configuration, you might need to put more work into reproducible builds. See “Reproducible Builds” on p.XX for details.

Prerequisites

Allies
Zero Friction
Pair Programming
Mob Programming
Test-Driven Development

Continuous integration works best with synchronous integration, which requires a zero-friction build that takes less than ten minutes to complete. Otherwise, you’ll have to use asynchronous integration or multistage integration.

Asynchronous and multistage integration require the use of a CI server, and that server has to be configured so that it validates the build before it promotes the changes to the integration branch. Otherwise, you’re likely to end up with compounding build errors.

Pull requests don’t work well with continuous integration, so another approach to code review is needed. Pairing or mobbing work best.

Continuous integration relies on a build and test suite that thoroughly tests your code, preferably with fast and reliable tests. Test-driven development using narrow, sociable tests is the best way to achieve this.

Indicators

When you integrate continuously:

  • Deploying and releasing is painless.

  • Your team experiences few integration conflicts and confusing integration bugs.

  • Team members can easily synchronize their work.

  • Your team can release whenever your on-site customers are ready.

  • Your team can release with the push of a button.

Alternatives and Experiments

Allies
Collective Code Ownership
Reflective Design
Refactoring

Continuous integration is essential for teams using collective code ownership and evolutionary design. Without it, significant refactoring becomes impractical, because it causes too many merge conflicts. That prevents the team from using reflective design, which is necessary for long-term success.

The most common alternative to continuous integration is feature branches, which merge from the integration branch on a regular basis, but only integrate to the integration branch when each feature is done. Feature branches can do a good job of keeping the integration branch ready to release, but they’re still prone to merge conflicts. They don’t work well for teams using refactoring and reflective design.

Ally
Continuous Deployment

The experiments I’ve seen around continuous integration involve taking it to further extremes. Some teams integrate on every commit—every few minutes—or even every time the tests pass. The most popular experiment is continuous deployment, which has entered the mainstream, and is discussed later in this book.

Further Reading

Martin Fowler’s article, “Patterns for Managing Source Code Branches,” [Fowler 2020b] is an excellent resource for people interested in digging into the differences between feature branches, continuous integration, and other branching strategies.

AoAD2 Practice: Reflective Design

Reflective Design

Audience
Programmers

Every day, our code is better than it was the day before.

Traditional approaches to design assume that, once coded, designs shouldn’t change. The Open-Closed Principle, a famous design guideline, illustrates this mindset: “Software entities (classes, modules, functions, etc.) should be open for extension, but closed for modification.”

Allies
Simple Design
Refactoring

But Agile teams create simple designs that don’t anticipate the future. They don’t have hooks. Instead, Agile teams have the ability to refactor their code and change its design. This creates the opportunity for an entirely different approach to design: one in which entities are not designed to be extended, but are designed to be modified instead.

I call this approach reflective design.

How Reflective Design Works

Reflective design is in contrast to traditional design, which I call “predictive design.” In predictive design, you predict what your software will need to do, based on your current requirements and best guess about how those requirements might change, then you create a design that cleanly supports all those needs.

Reflective design only cares about the change you’re making right now.

In contrast, reflective design doesn’t speculate about the future. It only cares about the change you’re making right now. When using reflective design, you analyze your existing code in the context of your software’s existing functionality, then figure out how you can improve it, based on what you’re currently working on.

  1. Look at the code you’re about to work on. If you’re not familiar with it, reverse-engineer its design. For complicated code, drawing diagrams, such as a class diagram or sequence diagram, can help.

  2. Identify flaws in the design. What’s hard to understand? What doesn’t work well? If you’ve worked with this code recently, what caused problems? What will get in your way as you work on your current task?

  3. Choose one thing to improve first. Think of a design change that will clean up the code and make your current task easier or better. If big design changes are needed, talk them over with your teammates.

  4. Incrementally refactor the code to reach the desired design. Pay attention to how well the design changes work in practice. If they don’t work as well as hoped, change direction.

  5. Repeat until your task is done and the code is as clean as you want to make it. At a minimum, it needs to be a tiny bit better than when you started.

Reflective Design in Practice

I once had to replace the login infrastructure for one of my websites. My old authentication provider, Persona, had been discontinued, so I needed to switch to a new authentication provider, Auth0. This was a big change that required a new sign-up flow.

Ally
Feature Toggles

Rather than planning out this whole change in advance, I used reflective design to take it step by step. I focused on my first story, which was to add a login page that used Auth0. It would be hidden by a feature toggle until the Auth0 change was done.

My first step was to reverse-engineer the design of the code. It had been several years since I had worked with this code, so it was like I had never seen it before. Fortunately, although the code was far from perfect, I had used simple design, so it was easy to understand. No method was longer than 20 lines of code, and most were less than ten. The largest file was 167 lines.

I started by looking at the existing login endpoints. I didn’t do a deep dive; I just looked at each file’s imports and traced the dependencies. The login endpoint depended on PersonaClient and SubscriberAccount. PersonaClient depended on HttpsRestClient, which was a wrapper for third-party code. SubscriberAccount depended on RecurlyClient, which in turn depended on HttpsRestClient.

These relationships are illustrated in figure “Authentication Design Analysis”. I didn’t actually make a class diagram at the time; I just opened the files in my editor. The relationships were simple enough that I could hold them all in my head.

A UML diagram showing three packages: “www,” “model,” and “persistence.” The “www” package has a class named “login,” the “model” package has a class named “SubscriberAccount,” and the “persistence” package has three classes, named “PersonaClient,” “RecurlyClient,” and “HttpsRestClient.” The diagram shows that “login” uses “PersonaClient” and “SubscriberAccount,” “SubscriberAccount” uses “RecurlyClient,” and both “PersonaClient” and “RecurlyClient” have a reference to “HttpsRestClient.”

Figure 1. Authentication design analysis

Next, I needed to identify flaws in the design. There were a lot. This was some of the earliest code I had written for the site, nearly four years prior, and I had learned a lot since then.

  • I didn’t separate my logic from my infrastructure. Instead, SubscriberAccount (logic) depended directly on RecurlyClient (infrastructure).

  • SubscriberAccount didn’t do anything substantial. Instead, a separate User class was responsible for user-related logic. The purpose of SubscriberAccount wasn’t clear.

  • None of the infrastructure classes (PersonaClient, RecurlyClient, and HttpsRestClient) had tests. When I first wrote them, I didn’t know how to write tests for them, so I had just tested them manually.

  • The login endpoint didn’t have tests, because the infrastructure classes weren’t written to be testable. Login had a lot of complexity, because it also validated subscription status. The lack of tests was a risk.

Focus your efforts on what matters most.

There were a lot of things I could have changed, but part of the trick of reflective design is to focus your efforts on what matters most. Although the vestigial SubscriberAccount class and its dependency on RecurlyClient were a problem, fixing it wouldn’t make writing the login endpoint easier.

The core structure of having the login endpoint depend on PersonaClient also made sense. I decided that I’d implement a similar Auth0Client class for the Auth0 login endpoint.

Ally
Test-Driven Development

The lack of testability was clearly the biggest problem. I wanted my new login endpoint to have sociable tests. (See “Write Sociable Tests” on p.XX.) For that to happen, Auth0Client needed to be nullable [Shore 2018], and for that, I needed HttpsRestClient to be nullable. While I was at it, I wanted to add narrow integration tests to HttpsRestClient.

These changes weren’t everything I needed to do, but they were the obvious first step. Now I was ready to incrementally modify the code to get where I wanted to be:

  1. Added narrow integration tests to HttpsRestClient and cleaned up edge cases. (This took 3 hours.)

  2. Made HttpsRestClient nullable. (1 hour)

  3. Made RecurlyClient nullable. (1.25 hours)

  4. Made PersonaClient nullable. (0.75 hours)

  5. Modified HttpsRestClient to better support Auth0Client’s needs. (0.75 hours)

  6. Implemented Auth0Client. (2 hours)

Reflective design doesn’t always involve a big change. Once Auth0Client was implemented, my next task was to implement a feature toggle that would allow me to manually test the Auth0 login endpoint in production, but hide it from regular users.

Implementing the feature toggle was a much smaller task, but it followed the same reflective approach. First, I reviewed the SiteContext class that would contain the feature flag, and the AuthCookie class it depended upon. Second, I identified flaws: the design was fine, but the tests weren’t up to my current standards. Third, I decided how to improve: fix the tests. Fourth, I refactored incrementally: I reordered the SiteContext tests to make them more clear, and migrated the AuthCookie tests from an old test framework to my current test framework.

All together, this was only about half an hour of work, so the steps weren’t really that distinct. It was more a matter of “look at the code, see a few obvious issues, fix the issues.” Reflective design isn’t necessarily a crisp sequence of steps. The important part is that, while you work, you’re constantly reflecting on your code’s design and making improvements.

Reverse-Engineering the Design

The first step in reflective design is to analyze your existing code and reverse-engineer its design.

The best approach is to ask somebody on the team to explain the design to you. A conversation around a whiteboard sketch, whether in-person or virtual, is a fast and effective way to learn, and it will often turn into a collaboration around possible improvements.

In some cases, no one on the team will understand the design, or you may wish to dive into the code yourself. When that happens, start by thinking about the responsibilities of the various files in the system. Choose the file whose responsibilities seem most closely related to your current task. If nothing else, you can often start with the UI, and trace the dependencies from there. For example, when I analyzed the authentication code, I started with the endpoint related to the login button.

If you’re new to the codebase, you might not understand how responsibilities correlate to files. It’s easier to get a rough idea than you might think. Starting with the root directory, look at the name of each source code directory and file. Based on those names, write a one-sentence guess about what the directory or file is responsible for. Don’t worry about getting it exactly right; you’re just skimming to get an overview. If you want to check your guess, you can skim through the method and function names in the file, but err on the side of making a rough guess.

Recursively analyze each directory, or at least the ones that seem most promising. Continue down through the file tree until you’ve got a rough understanding of the responsibilities of various parts of the system. For an example of this process, see the “How to Add a Feature (Cleanly)” episode of [Shore 2020b].

Once you have a starting point, open up the file and skim through the method and function names. Use them to confirm or revise your guess about the file’s responsibilities. If you need more clues, skim through the test names in the file’s tests. Then look at this file’s dependencies (typically, its imports). Analyze those files, too, and repeat until the dependencies are no longer relevant to the change you’re making.
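If you prefer the command line, a rough grep can do this skim for you. This sketch assumes a JavaScript codebase, like the one in the authentication example; the pattern is my assumption, so adjust it for your language.

```shell
# skim: list a file's imports and function/class names, as a rough guide
# to its responsibilities. The pattern assumes JavaScript source code.
skim() {
  grep -nE '^(import |const .+ = require|(export )?(function |class ))' "$1"
}
```

The output won’t be complete, and doesn’t need to be; you’re just confirming or revising your one-sentence guess about the file’s responsibilities.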

Now that you have a good idea of the files involved and each of their responsibilities, go back through and see how they relate to each other. If it’s complicated, draw a diagram. You can use a formal modelling technique, such as UML, but an ad-hoc sketch is just as good. I usually start by drawing boxes for each module or class, and lines with labels to show how they relate to each other. When the code is particularly complicated, I’ll create a sequence diagram, which has a column for each module or class instance, and arrows between columns showing function calls.

Some tools will automatically create UML diagrams from your source code. I prefer to create my diagrams manually, by studying the code myself. It takes longer, but that deeper study leaves me with a much better understanding of how the code works.

Allies
Pair Programming
Mob Programming

Before you go through this process, though, remember that the best way to understand the design is to ask somebody on the team to review it with you. Unless your team works with a lot of code they didn’t build, you should rarely have trouble finding someone who understands the design of existing code. Your team wrote it, after all. A quick review to update your understanding should be enough.

Once you understand the design of the code, you’re ready to identify improvements.

Identifying Improvements

All code has an underlying beauty. That’s the most important thing to remember when looking for design improvements. It’s easy to look at existing code and think, “this is terrible.” And it may actually be terrible—although you should be careful not to assume a design is terrible just because you don’t understand the code at first. Code takes time to understand, no matter how well it’s designed.

But even if the code is terrible, it was most likely created with some underlying design in mind. That design might have gotten crufty over time, but somewhere underneath, there’s the seed of a good idea.

Your job is to find and appreciate the code’s underlying beauty.

Your job is to find and appreciate that underlying beauty. You don’t have to keep the original design, if it’s no longer appropriate, but you do need to understand it. Quite often, the original design still makes sense. It needs tweaks, not wholesale revision.

To return to the authentication example, the login endpoint depended on PersonaClient, which depended on HttpsRestClient. None of the code was testable, which resulted in some ugly, untested login code. But the core idea of creating infrastructure wrappers was sound. Rather than abandon that core idea, I amplified it by making the infrastructure wrappers nullable, which later allowed me to use test-driven development to make a new, cleaner Auth0 login endpoint.

That’s not to say that the existing design will be perfect. There’s always something to improve. But as you think about improvements, don’t look for ways to scrap everything and start over. Instead, look for problems that detract from the underlying beauty. Restore and improve the design. Don’t reinvent it.

Code Smells

Code smells are condensed nuggets of wisdom that help you identify design problems. They’re a great way to notice opportunities for improvement in your code.

Noticing a code smell doesn’t necessarily mean there’s a problem with the design. It’s like a funky smell in the kitchen: it could indicate that it’s time to take out the garbage, or it could just mean that someone’s been cooking with a particularly pungent cheese. Either way, when you smell something funny, take a closer look.

[Fowler 2018], writing with Kent Beck, has an excellent discussion of code smells. It’s well worth reading. The following smells summarize the ones I find most often, including ones that Fowler and Beck didn’t mention.1

1Code Class, Squashed Errors, Coddled Nulls, Time Dependency, and Half-Baked Objects are my invention.

Shotgun Surgery and Divergent Change

These two smells help you identify cohesion problems in your code. Shotgun Surgery occurs when you have to modify multiple modules or classes to make a single change. It’s an indication that the concept you’re changing needs to be centralized. Give it a name and module of its own.

Divergent Change is just the opposite: it occurs when unrelated changes affect the same module or class. It’s an indication that the module has too many responsibilities. Split those responsibilities into multiple modules.
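As a sketch of the remedy for Shotgun Surgery, suppose a hypothetical discount rule is duplicated across a cart module, an invoice module, and an email module (all names invented for illustration). Centralizing the rule gives the concept a name and a single home:

```typescript
// Before (the smell): every module repeats the same rule, so changing
// the discount means shotgun surgery across the codebase.
//   cart:    price * 0.9
//   invoice: price * 0.9
//   email:   price * 0.9

// After: the concept has a name and a module of its own.
function applyDiscount(price: number): number {
  return price * 0.9;  // the rule now changes in exactly one place
}
```

The cart, invoice, and email code all delegate to `applyDiscount()`, so the next change to the rule touches one function rather than three modules.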

Primitive Obsession and Data Clumps

Primitive Obsession occurs when important design concepts are represented by generic types. For example, when currency is represented with a decimal, or a subscription renewal date is represented with a Date. This leads to code involving those concepts being spread around the codebase, increasing duplication and decreasing cohesion.

Data Clumps are similar. They occur when several variables always appear together, representing some concept, but they don’t have a class or type that represents them. For example, the code might consistently pass street1, street2, state, country, and postalCode strings to various functions or methods. They’re a data clump representing an address.

The solution is the same in both cases: encapsulate the concept in a dedicated type or class.
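To make the address example concrete, here’s a minimal sketch of encapsulating the clump in a dedicated class. The `Address` class and its formatting method are invented for illustration:

```typescript
// The street1/street2/state/country/postalCode clump becomes a single
// Address type, so address-related behavior has one home.
class Address {
  constructor(
    public readonly street1: string,
    public readonly street2: string,
    public readonly state: string,
    public readonly country: string,
    public readonly postalCode: string,
  ) {}

  // Formatting logic that previously lived in every caller moves here.
  renderOneLine(): string {
    const street = this.street2 ? `${this.street1}, ${this.street2}` : this.street1;
    return `${street}, ${this.state} ${this.postalCode}, ${this.country}`;
  }
}

// Callers now pass one cohesive object instead of five loose strings.
function printShippingLabel(address: Address): string {
  return `SHIP TO: ${address.renderOneLine()}`;
}
```

Functions that used to take five loose strings now take one cohesive `Address`, which improves cohesion and removes the duplicated formatting code.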

Data Class and Code Class

One of the most common object-oriented design mistakes I see is data and code that are in separate classes. This often leads to duplicate data-manipulation code. When you have a class that’s little more than instance variables combined with getters and setters, you have a Data Class. Similarly, when you have a class that’s just a container for functions, with no per-instance state, you have a Code Class.

Code Classes aren’t necessarily a problem on their own, but they’re often found alongside Data Class, Primitive Obsession, or Data Clumps. Reunite the code and its data: improve cohesion by putting methods in the same class as the data they operate upon.
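Here’s a before-and-after sketch of reuniting the code and its data, using an invented `Invoice` example:

```typescript
// The smell: a Data Class, with its logic stranded in a Code Class.
class InvoiceData {
  constructor(public subtotal: number, public taxRate: number) {}
}
class InvoiceCalculations {  // Code Class: functions, no per-instance state
  static total(invoice: InvoiceData): number {
    return invoice.subtotal * (1 + invoice.taxRate);
  }
}

// Improved: data and the code that operates on it live together.
class Invoice {
  constructor(private subtotal: number, private taxRate: number) {}
  total(): number {
    return this.subtotal * (1 + this.taxRate);
  }
}
```

With the combined `Invoice` class, callers can no longer manipulate the raw fields directly, and any future calculation has an obvious home.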

Squashed Errors and Coddled Nulls

Robust error handling is one of the things that separates great programmers from the merely good. All too often, code that’s otherwise well-written will throw up its metaphorical hands when it encounters an error. A common construct is to catch exceptions, log an error, and then return null or some other meaningless value. It’s particularly common in Java, where exception handling is required by the compiler.

These Squashed Errors lead to problems in the future, because the null ends up being used as a real value somewhere else in the code. Instead, only handle errors when you’re able to provide a meaningful alternative, such as retrying or providing a useful default. Otherwise, propagate the error to your caller.

Coddled Nulls are a related issue. They occur when a function receives an unexpected null value, either as a parameter or as a return value from a function it calls. Knowing the null will cause a problem, but not knowing what to do with it, the programmer checks for null and then returns null themselves. The null cascades deep into the application, causing unpredictable failures later in the execution of the software. Sometimes the null makes it into the database, leading to recurring application failures.

Instead, adopt a fail fast strategy. (See “Fail Fast” on p.XX.) Don’t allow null as a parameter unless it has explicitly defined semantics. Don’t return null to indicate an error; throw an exception instead. When you receive null where it wasn’t expected, throw an exception.
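As an illustration, here’s a minimal sketch of the difference between squashing an error and failing fast, using an invented port-parsing example:

```typescript
// Squashed Error (the smell): the caller gets null with no idea why.
function parsePortSquashed(value: string): number | null {
  const port = parseInt(value, 10);
  if (Number.isNaN(port)) return null;  // the error disappears here
  return port;
}

// Fail fast: propagate a meaningful error to the caller, and reject
// unexpected nulls at the boundary instead of passing them along.
function parsePort(value: string | null): number {
  if (value === null) throw new Error("port is required");  // no Coddled Nulls
  const port = parseInt(value, 10);
  if (Number.isNaN(port)) throw new Error(`invalid port: "${value}"`);
  return port;
}
```

The squashed version forces every caller to guess what the null means; the fail-fast version surfaces the problem immediately, with a message that says what went wrong.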

Time Dependencies and Half-Baked Objects

Time Dependencies occur when a class’ methods must be called in a specific order. Half-Baked Objects are a special case of Time Dependency: they must first be constructed, then initialized with a method call, then used.

Time Dependencies and Half-Baked Objects often indicate an encapsulation problem. Rather than managing its state itself, the class expects its callers to manage some of its state. This results in bugs and duplicate code in callers. Look for ways to encapsulate the class’ state more effectively. In some cases, you may find that your class has too many responsibilities and would benefit from being split into multiple classes.
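A minimal sketch of the fix for a Half-Baked Object (the `Report` example is invented): move initialization into the constructor, so no invalid in-between state exists for callers to manage:

```typescript
// The smell: a Half-Baked Object. Every caller must remember to call
// init() after construction, before use.
class HalfBakedReport {
  private title?: string;
  init(title: string): void { this.title = title; }
  render(): string {
    if (this.title === undefined) throw new Error("init() not called");
    return `# ${this.title}`;
  }
}

// Improved: the constructor fully initializes the object, so it's
// valid from the moment it exists.
class Report {
  constructor(private readonly title: string) {}
  render(): string { return `# ${this.title}`; }
}
```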

Incrementally Refactor

Allies
Refactoring
Test-Driven Development

When you’ve decided what to change, make the change using a series of small refactorings. Work incrementally, one small step at a time, making sure the tests pass after each step. Not counting time spent thinking, each refactoring should be a minute or two of work at most. Often less. Sometimes, you might need to add missing functions or methods; build those using test-driven development.

As you work, you’ll discover that some of your improvement ideas were, in fact, not good ideas. Keep your plans flexible. As you make each change, evaluate the result with reflective design as well. Commit your code frequently so you can revert ideas that don’t work out.

But don’t worry about making the code perfect. As long as you leave it better than you found it, that’s good enough for now.

Questions

How is reflective design different than refactoring?

Reflective design is deciding where to drive the car. Refactoring is pressing the pedals and moving the steering wheel.

How do we make time for reflective design?

It’s a normal, non-negotiable part of your work. You’re supposed to leave the code at least a little bit better than you found it, so when you start a task, begin with reflective design to see what you’re going to improve. Sometimes, those improvements will actually decrease the time needed for the task. Even when they don’t make your current task quicker, they’ll make a future task faster. Keeping the design clean is a net win.

Ally
Slack

On the other hand, you only need to leave the code a little bit better than you found it. Don’t fix everything. Instead, use slack to decide when to make time for additional opportunities, as described in “Improving Internal Quality” on p.XX.

Prerequisites

Anybody can use reflective design to identify improvements. It’s another tool in the toolbelt, and there’s no problem using it alongside predictive or ad-hoc design approaches.

Allies
Refactoring
Test-Driven Development

Actually following through on the improvements requires refactoring, and that generally relies on a good suite of tests.

Indicators

When you use reflective design well:

  • Your team constantly improves the design of existing code.

  • When working on a task, programmers often refactor to make the task easier.

  • Refactorings are focused where they’ll do the most good.

  • The code steadily becomes easier and more convenient to work with.

Alternatives and Experiments

Teams that don’t know how to use reflective design often advocate for rewriting code instead, or taking a chunk of time to refactor. Although this works, it’s clumsy in comparison. It can’t be done incrementally, leading to conflicts between programmers, who worry their code is becoming unmaintainable, and stakeholders, who worry that there won’t be enough time for creating value.

Reflective design is really about incremental design improvements. It’s the same theme of incremental work that runs throughout the Delivering zone practices. You don’t need to use the exact approach described here, so feel free to experiment. As you do, focus on techniques that allow you to identify improvements and make changes gradually, without “stopping the world” to make a change.

Further Reading

Episode nine of [Shore 2020b], “How to Add a Feature (Cleanly),” demonstrates reflective design on a small codebase.



AoAD2 Practice: Incremental Design


Incremental Design

Audience
Programmers

We design while we deliver.

Agile makes a challenging demand of its programmers: every week or two, programmers should finish four to ten customer-centric stories. Every week or two, customers may revise the current plan and introduce entirely new stories, with no advance notice. This regimen starts on the very first week.

As a programmer, you must be able to start and finish stories, from scratch, in a single week. No advance preparation is possible. You can’t set aside several weeks for establishing technical infrastructure. You’re expected to focus on delivering customer-valued stories instead.

This sounds like a recipe for disaster. Fortunately, incremental design allows you to build technical infrastructure incrementally, in small pieces, as you deliver stories.

It’s Not Just Coding

Computers don’t care what your code looks like. If the code compiles and runs, the computer is happy. Design is for humans: specifically, to allow programmers to easily understand and change the code. Design quality and development costs are joined at the hip: Code is well-designed when the costs of change are low.

Quality is highly situational, of course. The cost of change depends on the capabilities of the software, the capabilities of the programmers, and the specific changes being made.

Design is so important, we do it all the time.

The secret behind successful Delivering zone teams, therefore, is that they never stop designing. Delivering practices might seem to be about programming, at first glance, but most of them are about design. As Ron Jeffries used to say about Extreme Programming, design is so important, we do it all the time.

Allies
Pair Programming
Mob Programming
Ubiquitous Language
Test-Driven Development
Collective Code Ownership
Refactoring
Continuous Integration

Pairing and mobbing dedicate at least half of the programmers on your team to thinking about design. Ubiquitous language is about designing code to reflect domain experts’ thinking. Test-driven development encourages you to think about and improve your design at nearly every step. Collective code ownership expects people to improve the design. Refactoring makes it possible. Continuous integration allows people to make changes without stepping on each others’ toes.

Delivering teams constantly talk about design, especially when pairing and mobbing. In fact, that’s what nearly all of the conversations are about. Some of them are very detailed and nitpicky, such as “What should we name this method?” Others are much higher-level, such as, “These two modules share some responsibilities. We should split them apart and make a third module.”

Design discussions don’t have to be restricted to the person you’re currently working with. Have larger group discussions as often as you think is necessary, and use whatever modelling techniques you find helpful. (See “Drop in and Drop Out” on p.XX.) Try to keep them informal and collaborative. Simple whiteboard sketches work well.

How Incremental Design Works

Allies
Simple Design
Reflective Design

Incremental design is the driving force behind evolutionary design. It works in concert with simple design and reflective design:

  1. Start with the simplest design that could possibly work. (Simple design.)

  2. When the design doesn’t do everything you need right now, incrementally add to it. (Incremental design.)

  3. Every time you make a change, improve the design by reflecting on its strengths and weaknesses. (Reflective design.)

In practice, when you first create a design element, whether it’s a new method, a new class, or even a new architecture, make it completely specific. Create a simple design that solves exactly the problem you face at the moment, and nothing else, no matter how easy it may seem to solve more general problems.

This is difficult! Experienced programmers think in abstractions. In fact, the ability to think in abstractions is often the sign of a good programmer. Avoiding abstractions and coding for one specific scenario will seem strange, even unprofessional.

Do it anyway. Waiting to introduce abstractions will allow you to create designs that are simpler and more powerful.

The second time you add to a design element, modify the design to make it more general—but only general enough to solve the two problems it needs to solve. Next, review the design and make improvements. Simplify and clarify the code.

The third time you add to a design element, generalize it further—but again, just enough to solve the three problems at hand. A small tweak to the design is usually enough. It will be pretty general at this point. Again, review the design, simplify, and clarify.

Continue this pattern. By the fourth or fifth time you work with a design element—be it a method, a module, or something bigger—you’ll typically find that its abstraction is perfect for your needs. Best of all, because your design was the result of combining practical needs with continuous improvement, the design will be elegant and powerful.
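The progression might look like this sketch, using an invented shipping-cost example. Each pass generalizes only enough to cover the problems actually encountered so far:

```typescript
// First pass (one problem): completely specific, even hard-coded.
//   function shippingCost(): number { return 5; }
//
// Second pass (two problems): just general enough for both.
//   function shippingCost(weightKg: number): number {
//     return weightKg < 1 ? 5 : 9;
//   }
//
// Third pass (three problems): a small tweak generalizes it again.
function shippingCost(weightKg: number, express: boolean): number {
  const base = weightKg < 1 ? 5 : 9;
  return express ? base * 2 : base;
}
```

Each version solves only the problems at hand, so at no point is there speculative code to maintain, and the eventual abstraction is shaped by real needs.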

Quality tends to improve in bursts. Typically, you’ll incrementally grow a design for several cycles, making minor changes as you go. Then something will give you an idea for a new design approach, which will require a series of more substantial refactorings to support it. Eric Evans calls this a breakthrough [Evans 2003] (chapter 8). Breakthroughs happen at all levels of the design, from methods and functions to architectures.

Within a class or module
Allies
Test-Driven Development
Refactoring

If you’ve practiced test-driven development, you’ve practiced incremental design, at least at the level of a single module or class. You start with nothing and build a complete solution, layer by layer, making improvements as you go. As “A TDD Example” on p.XX shows, your code starts out completely specific, often to the point of hard-coding the answer, but then it gradually becomes more generic as additional tests are added.

Refactorings occur every few minutes, during the “Refactoring” step of the TDD cycle. Breakthroughs can happen several times per hour, and often take a matter of minutes to complete. For example, there’s a breakthrough at the end of “Refactoring in Action” on p.XX, when I realized that the regular expression allowed me to simplify the transformLetter() function. Notice how, up to that point, the refactorings resulted in small, steady improvements. After the breakthrough, though, transformLetter() became dramatically simpler.

Across classes and modules

When TDD is performed well, the design of individual modules and classes is beautiful: they’re simple, elegant, and convenient to use. This isn’t enough. Without attention to the interaction between modules and classes, the overall design will be muddy and confusing.

Allies
Pair Programming
Mob Programming

During TDD, navigators should also consider the wider scope. Ask yourself these questions: are there similarities between the code you’re implementing and other parts of the system? Are responsibilities clearly defined and concepts clearly represented? How well does the module or class you’re currently working on interact with other modules and classes?

When you see a problem, add it to your notes. During one of the refactoring steps of TDD—usually, when you’ve come to a good stopping place—bring up the issue, discuss solutions with your driver, and refactor. If you think your design change will significantly affect other members of the team, take a quick break to discuss it around a whiteboard.

Don’t let design discussions turn into long, drawn-out disagreements. Follow the ten-minute rule: if you disagree on a design direction for 10 minutes, try one and see how it works in practice. If you have a particularly strong disagreement, split up and try both as spike solutions. Nothing clarifies a design decision like working code.

Ally
Slack

Cross-module and cross-class refactorings happen several times per day. Depending on your design, breakthroughs may happen a few times per week and can take several hours to complete. (Nonetheless, remember to proceed in small steps.) Use your slack to complete breakthrough refactorings. In some cases, you won’t have time to finish all the refactorings you identify. That’s okay. As long as the design is better at the end of the week than it was at the beginning, you’re doing enough.

For example, when working on a small content management engine, I started by implementing a single Server class that served static files. When I added support for translating Jade templates to HTML, I started out by putting the code to do so in Server, because that was the simplest approach. It got ugly after I added support for dynamic endpoints, so I factored the template responsibilities into a JadeProcessor module.

That led to the breakthrough that static files and dynamic endpoints could similarly be factored into StaticProcessor and JavaScriptProcessor modules, and that they could all depend on the same underlying SiteFile class. That cleanly separated my networking, HTML generation, and file handling code.

Application architecture

“Architecture” is an overloaded word. Here, I’m referring to application architecture, by which I mean the recurring patterns in your code. Not formal patterns in the Design Patterns [Gamma et al. 1995] sense, but the repeated conventions throughout your codebase. For example, web applications are often implemented so that every endpoint has a route definition and controller class, and the controllers are often each implemented with a Transaction Script ([Fowler 2002], ch. 9).

Those recurring patterns embody your application architecture. Although they lead to consistent code, they’re also a form of duplication, which makes changes to your architecture more difficult. For example, changing a web application from using a Transaction Script approach to a Domain Model approach requires updating every single endpoint’s controller.

I’m focusing on application architecture here. To apply evolutionary design ideas to system architecture, see “Evolutionary Architecture” on p.XX.

Be conservative in introducing new architectural patterns. Introduce just what you need for the amount of code you have and the features you support at the moment. Before introducing a new convention, ask yourself if you really need the duplication. Maybe there’s a way to isolate the duplication to a single file, or to allow different parts of the system to use different approaches.

For example, in the content management engine I described previously, I could have started by coming up with a grand strategy for supporting different templating and markup languages. That was meant to be one of its distinguishing features, after all. But instead, I started by implementing a single Server class, and let the code grow into its architecture over time.

Even after I introduced classes for each type of markup, I didn’t try to make them follow a consistent pattern. Instead, I allowed them to each take their own unique approach—whichever was simplest in each case. Over time, some of those approaches worked better than others, and I gradually standardized my approach. Eventually, the standard was so stable, I converted it into a plug-in architecture. Now I can support a new markup language or template just by dropping a file in a directory.
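The plug-in idea can be sketched roughly like this. The registry and processor names are invented for illustration, and the real engine discovers plug-in files in a directory rather than registering them in code:

```typescript
// Each markup language provides a processor with the same interface.
interface Processor {
  render(source: string): string;
}

// A registry maps file extensions to processors. Supporting a new
// language means adding one entry, not touching existing code.
const processors = new Map<string, Processor>();

function register(extension: string, processor: Processor): void {
  processors.set(extension, processor);
}

function renderFile(filename: string, source: string): string {
  const extension = filename.slice(filename.lastIndexOf("."));
  const processor = processors.get(extension);
  if (processor === undefined) throw new Error(`no processor for ${extension}`);
  return processor.render(source);
}

// Two illustrative processors.
register(".txt", { render: (s) => s });
register(".upper", { render: (s) => s.toUpperCase() });
```

The engine’s core never changes when a language is added; only the registry grows, which is what makes the “drop a file in a directory” experience possible.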

Because architectural decisions are hard to change, it’s important to delay those commitments. (See “Key Idea: The Last Responsible Moment” on p.XX.) The plug-in architecture I mentioned happened years after the content management engine was first created. If necessary, I could have added plug-in support sooner, but I didn’t need to, so I took it slow. That allowed me to standardize on an approach that had a lot of experience and wisdom baked into it, and as a result, it hasn’t needed additional changes.

In my experience, breakthroughs in architecture happen every few months, although I expect this to vary widely by team. Refactoring to support the breakthrough can take several weeks or longer because of the amount of duplication involved. As with all breakthroughs, it’s only worth doing if the improvement is significant enough to justify the cost.

Although changes to your architecture may be tedious, they aren’t usually difficult once you’ve identified the new architectural pattern. Start by trying out the new pattern in just one part of your code. Let it sit for a while—a week or two—to make sure the change works well in practice. When you’re sure it does, bring the rest of the system into compliance with the new approach. Refactor each class or module you touch as you perform your everyday work, and use some of your slack to bring other classes and modules into compliance.

Keep delivering stories while you refactor. Although you could take a break from new development to refactor all at once, that would disenfranchise your on-site customers. Balance technical excellence with delivering value. Neither can take precedence over the other. This may lead to inconsistencies within the code during the changeover, but fortunately, that’s mostly an aesthetic problem—more annoying than problematic.

Introducing architectural patterns gradually, only as needed, helps reduce the need for architectural refactorings. It’s easier to expand an architecture than to simplify one that’s too ambitious.

Risk-Driven Refactoring

Architecture may seem too essential not to design up-front. Although some problems do appear to be too expensive to change incrementally, such as choice of programming language, I’ve found that many “architectural” concerns are actually easy to change if you eliminate duplication and embrace simplicity. Distributed processing, persistence, internationalization, security, and transaction structure are commonly considered so complex that you must consider them from the beginning. I disagree; I’ve dealt with them all incrementally. [Shore 2004a]

What do you do when you see a hard problem coming? For example, what if your stakeholders insist that you not spend any time on internationalization, but you know that it’s coming eventually, and—with your current design—it’s only going to get more expensive to support?

Your power lies in your ability to choose which refactorings to work on.

Your power lies in your ability to choose which refactorings to work on. No design is perfect. You will always have more opportunities to refactor than time to do it. And although it would be inappropriate to implement features your customers haven’t asked for, you can direct your refactoring efforts toward reducing risk. So choose refactorings that also reduce architectural risks.

To be specific, think about the problems you think you might face and refactor to improve the parts of your design that are related to those problems. For example, if your code had a lot of duplication in the way it formatted currency, localizing currencies would be expensive. You’d have to find every piece of code that dealt with currency formatting and fix it. A risk-driven solution would be to refactor your currency code so the concept had its own class, as described in “Once and Only Once” on p.XX, then move the currency formatting into that class, as shown in figure “Use Risk to Drive Refactoring”.

Two UML class diagrams. The first is labelled “Risk. Every class duplicates the currency rendering algorithm. If it is internationalized, changing it will be difficult and expensive.” It shows three UI classes, each with a “renderCurrency” method. A large arrow transitions to the second diagram, which is labelled “No Risk. The currency rendering algorithm is only implemented in the Currency class. If it is internationalized, only one method needs changing.” It shows the three UI classes depending on a Currency class, which has a single “render” method.

Figure 1. Use risk to drive refactoring

Limit your efforts to improving your existing design. For example, you wouldn’t actually internationalize the Currency class until you were working on a story that needed it. Once you’ve eliminated duplication around a concept, changing its implementation will be just as easy later as it is now.
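Here’s a hedged sketch of the refactoring shown in figure 1. The en-US formatting is an assumed placeholder; the class isn’t internationalized yet. It just guarantees that internationalizing later means changing one method:

```typescript
// The currency rendering algorithm now lives in exactly one place.
class Currency {
  constructor(private readonly amount: number) {}

  render(): string {
    // Placeholder en-US formatting. If localization arrives, only
    // this method changes.
    return `$${this.amount.toFixed(2)}`;
  }
}

// UI code delegates instead of duplicating the algorithm.
function renderInvoiceLine(description: string, amount: number): string {
  return `${description}: ${new Currency(amount).render()}`;
}
```

The three UI classes in the figure would each delegate to `Currency.render()` like this, eliminating the duplicated `renderCurrency()` methods.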

A team I worked with replaced an entire database connection pooling library with our own hand-built version in half a pair-day. (This was in 2001, when the library ecosystem was much less mature. The library we replaced had some obscure thread safety bugs.)

Although we didn’t anticipate this need, it was still easy because we had previously eliminated all duplication around database connection management. There was just one method in the entire system that created, opened, and closed connections, which made test-driving our own connection pool manager almost trivially easy. Most of the time was spent on figuring out how to test-drive the thread safety code.

Another way to reduce architecture risk is to ask your customers to schedule stories that will allow you to work on the risky area. For example, to address the internationalization risk, you could create a story such as “Localize application for Spain” (or any country that has different localization rules than your own). This story has real customer value, but also addresses the risk.

Your customers have final say over story priorities, however, and their sense of risk and value may not match yours. Don’t feel too bad if that’s the case; you can still use refactorings to reduce architecture risk.

Questions

Isn’t incremental design more expensive than up-front design?

Just the opposite, actually, in my experience. There are two reasons for this. First, because incremental design only implements enough code to satisfy the current requirements, you start delivering features much more quickly. Second, when a planned story changes or is dropped, you haven’t coded anything to support it, so you haven’t wasted any effort.

Even if requirements never changed, incremental design would still be more effective, because it leads to design breakthroughs on a regular basis. Each breakthrough allows you to see new possibilities and eventually leads to another breakthrough—sort of like walking through a hilly forest in which the top of each hill reveals a new, higher hill you couldn’t see before. This continual series of breakthroughs substantially improves your design.

Aren’t hill-climbing algorithms vulnerable to local maxima?

Luckily, incremental design isn’t an algorithm. It’s a technique performed by humans, who can use their creativity and ingenuity to break out of local maxima. That’s what a breakthrough is: climbing to the top of a hill, seeing across the valley, and realizing that there’s a better hill within flying distance.

Don’t breakthroughs result in wasted effort as you backtrack?

Sometimes a breakthrough will lead you to see a completely new way of approaching your design. When this happens, refactoring may seem like backtracking, especially if the refactoring simplifies your design, which it often does. It’s not really backtracking, though—if you were able to think of the simpler approach sooner, you would have. So don’t feel bad. Simplicity is hard, and you’ll have to iterate your design to get there. The nature of breakthroughs, especially at the class and architectural level, is that you usually don’t see them until you’ve lived with your current design for a while.

Our organization (or customer) requires comprehensive design documentation. How can we satisfy this requirement if we don’t design everything up front?

Ask your customers to schedule documentation with a story, then estimate and deliver it as you would any other story. (See “As-Built Documentation” on p.XX.) Remind them that the design will change over time. The most effective option is to schedule documentation stories when the codebase is about to be retired or put in maintenance mode.

If your organization requires up-front documentation, the only way to provide it is to engage in up-front design. Try to keep your design efforts small and simple. If you can, use incremental design once you actually start coding.

Prerequisites

Allies
Simple Design
Reflective Design
Refactoring
Pair Programming
Mob Programming
Energized Work
Slack
Test-Driven Development
Team Room
Alignment

Incremental design depends on simple design and reflective design to keep code simple and steadily improve. It also requires a commitment to continuous daily improvement. This requires self-discipline and a desire for high-quality code, which not everybody has.

Luckily, you don’t need everyone to feel this way. In my experience, teams do well even if only one respected person on the team pushes for steady improvement. However, you do need pairing or mobbing, collective code ownership, energized work, and slack as support mechanisms. They help with self-discipline and allow people who are passionate about code quality to influence all parts of the code.

Test-driven development is also important. Its explicit refactoring step, repeated every few minutes, gives people continual opportunities to stop and make design improvements. Pairing and mobbing help in this area, too, by making sure that at least half the team’s programmers, as navigators, always have an opportunity to consider design improvements.

Be sure your team communicates well via a shared team room, either physical or virtual, if you’re using incremental design. Without constant communication about cross-module, cross-class, and architectural refactorings, your design will fragment and diverge. Agree on coding standards during your alignment discussion so that everyone follows the same patterns.

Anything that makes continuous improvement difficult will make incremental design difficult. Published interfaces are an example; because they are difficult to change after publication, incremental design usually isn’t appropriate for interfaces used by third parties, unless you have the ability to change the third parties’ code when you change the interface. (But you can still use incremental design for the implementation of those interfaces.) Similarly, any language or platform that makes refactoring difficult will also inhibit your use of incremental design.

Finally, some organizations constrain teams’ ability to use incremental design, such as organizations that require up-front design documentation or that rigidly control database schemas. Incremental design may not be appropriate in these situations.

Indicators

When you use incremental design well:

  • Every week advances the software’s capabilities and design in equal measure.

  • You have no need to skip stories for a week or more to focus on refactoring or design.

  • Every week, the quality of the software is better than it was the week before.

  • As time goes on, the software becomes increasingly easy to maintain and extend.

Alternatives and Experiments

If you’re uncomfortable with the idea of incremental design, you can hedge your bets by combining it with up-front design. Start with an up-front design stage, then commit completely to incremental design. Although this will delay the start of your first story, and may require some up-front requirements work, this approach has the advantage of providing a safety net without incurring too much risk.

That’s not to say that incremental design doesn’t work on its own—it does! But hedging your bets this way is how I first learned to trust incremental design.

Other alternatives to incremental design are less successful. One common approach is to treat Agile as a series of mini-waterfalls, performing a bit of up-front design at the beginning of each iteration, rather than relying on simple design and refactoring as incremental design does.

These design sessions are too short and small to create a cohesive design on their own. Code quality will steadily degrade. It’s better to embrace incremental design.

Another alternative is to use up-front design without also using incremental design. This only works well if your plans don’t change, which is the opposite of how Agile teams normally work.

Further Reading

“Is Design Dead?” [Fowler 2000b] discusses incremental design from a slightly skeptical perspective.

“Continuous Design” [Shore 2004a] discusses my experiences with difficult challenges in incremental design, such as internationalization and security.

“Evolutionary Design Animated” [Shore 2020a] discusses my real-world experience with incremental design by visualizing the changes in a small production system.

Share your feedback about this excerpt on the AoAD2 mailing list! Sign up here.

For more excerpts from the book, or to get a copy of the Early Release, see the Second Edition home page.

AoAD2 Practice: Simple Design

Simple Design

Audience
Programmers

Our code is easy to modify and maintain.

Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.

Antoine de Saint-Exupéry, author of “The Little Prince”

Any intelligent fool can make things bigger, more complex and more violent. It takes a touch of genius and a lot of courage to move in the opposite direction.

Albert Einstein

When writing code, Agile programmers often stop to ask themselves, “What’s the simplest thing that could possibly work?” They seem to be obsessed with simplicity. Rather than anticipating changes and providing extensibility hooks and plug-in points, they create a simple design that anticipates as little as possible, as cleanly as possible. Counterintuitively, this results in designs that are ready for any change, anticipated or not.

This may seem silly. How can a design be ready for any change? Isn’t the job of a good designer or architect to anticipate future changes and make sure the design can be extended to support them? Isn’t the key to success to maximize reuse by anticipating future changes?

I’ll let Erich Gamma, coauthor of Design Patterns: Elements of Reusable Object-Oriented Software, answer these questions. In the following excerpt, Gamma is interviewed by Bill Venners. [Venners 2005]

Venners: [Design Patterns] says, “The key to maximizing reuse lies in anticipating new requirements and changes to existing requirements, and in designing your systems so they can evolve accordingly. To design a system so that it’s robust to such changes, you must consider how the system might need to change over its lifetime. A design that doesn’t take change into account risks major redesign in the future.” That seems contradictory to the XP [Extreme Programming] philosophy.

Gamma: It contradicts absolutely with XP. It says you should think ahead. You should speculate. You should speculate about flexibility. Well yes, I matured too and XP reminds us that it is expensive to speculate about flexibility, so I probably wouldn’t write this exactly this way anymore. To add flexibility, you really have to justify it by a requirement. If you don’t have a requirement up front, then I wouldn’t put a hook for flexibility in my system up front.

But I don’t think XP and [design] patterns are conflicting. It’s how you use patterns. The XP guys have patterns in their toolbox, it’s just that they refactor to the patterns once they need the flexibility. Whereas we said in the book ten years ago [in 1995], no, you can also anticipate. You start your design and you use them there up-front. In your up-front design you use patterns, and the XP guys don’t do that.

Venners: So what do the XP guys do first, if they don’t use patterns? They just write the code?

Gamma: They write a test.

Venners: Yes, they code up the test. And then when they implement it, they just implement the code to make the test work. Then when they look back, they refactor, and maybe implement a pattern?

Gamma: Or when there’s a new requirement. I really like the flexibility that’s requirement driven. That’s also what we do in Eclipse. When it comes to exposing more API, we do that on demand. We expose API gradually. When clients tell us, “Oh, I had to use or duplicate all these internal classes. I really don’t want to do that,” when we see the need, then we say, OK, we’ll make the investment of publishing this as an API, make it a commitment. So I really think about it in smaller steps, we do not want to commit to an API before its time.

Allies
Reflective Design
Incremental Design

Traditional approaches to design focus on creating extensible designs. But, as Erich Gamma says, it’s expensive to speculate about flexibility. The Agile approach is to create simple designs, with no speculation. It combines with reflective design and incremental design to allow your design to evolve in any direction, regardless of how or when customers change their minds.

Simple doesn’t mean simplistic. Don’t make boneheaded design decisions in the name of reducing the number of classes and methods. A simple design is clean and elegant, not something you throw together with the least thought possible.

When, not if, I need to change this decision, how hard will it be?

Whenever I make a design decision, I always ask myself this question: “When, not if, I need to change this decision, how hard will it be?” The following techniques will help you keep your code simple and change costs low.

You Aren’t Gonna Need It

This pithy XP saying sums up an important aspect of simple design: avoid speculative coding. Whenever you’re tempted to add something to your design, ask yourself if it’s necessary for what the software does today. If not, well... you aren’t gonna need it. Your design could change. Your customers’ minds could change.

Similarly, remove code that’s no longer in use. You’ll make the design smaller, simpler, and easier to understand. If you need the code in the future, you can always get it out of version control. For now, it’s a maintenance burden.

Think of it this way: when it’s time to implement a new feature, would you rather deal with an existing design that’s wrong, or no existing design at all? When I raise this question with audiences, they overwhelmingly prefer the second option. It’s far easier to add code than to rip out and replace code that’s wrong.

When you speculate about the future, you make mistakes. You create code that has to be ripped out and replaced. All too often, those incorrect assumptions end up with their tendrils extending throughout the code, making them all the more difficult to remove. It’s better not to speculate in the first place.

Once and Only Once

Once and only once is a surprisingly powerful design guideline. As Martin Fowler said: [Venners 2002]

One of the things I’ve been trying to do is look for simpler [rules] underpinning good or bad design. I think one of the most valuable rules is avoid duplication. “Once and only once” is the Extreme Programming phrase. The authors of The Pragmatic Programmer [Hunt and Thomas 1999] use “don’t repeat yourself,” or the DRY principle.

You can almost do this as an exercise. Look at some program and see if there’s some duplication. Then, without really thinking about what it is you’re trying to achieve, just pigheadedly try to remove that duplication. Time and time again, I’ve found that by simply removing duplication I accidentally stumble onto a really nice elegant pattern. It’s quite remarkable how often that is the case. I often find that a nice design can come from just being really anal about getting rid of duplicated code.

“Once and only once” isn’t just about removing duplicated code, though. It’s about giving every concept that’s important to your code a home. Think of it this way:

Express every concept Once. And only once.1

1Thanks to Andrew Black for this insight.

Every piece of knowledge must have a single, unambiguous, authoritative representation.

Or, as [Hunt and Thomas 1999] phrase their DRY Principle: “Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.”

An effective way to make your code express itself once (and only once) is to be explicit about core concepts. Rather than expressing those concepts with primitive data types—an approach called “Primitive Obsession”—create a new type.

For example, a company creating an online storefront represented money with floating point numbers. Instead, they could have created a Currency class. In JavaScript, it would look like this:

class Currency {
  constructor(amount) {
    this._amount = amount;
  }

  toNumber() {
    return this._amount;
  }

  equals(currency) {
    return this._amount === currency._amount;
  }
}

At first, this seems wasteful. It’s just a simple wrapper over an underlying data type, except now with added overhead. Not only that, it seems like it increases complexity by adding another class.

But this sort of simple value type turns out to be immensely effective at enabling the “once and only once” principle. Now, any code that’s related to currency has an obvious place to live: inside the Currency class. If somebody needs to implement some new code, they look there first to see if it’s already implemented. And when something about the concept needs to change—say, foreign currency conversion—there’s one obvious place to implement that change.

The alternative isn’t pretty. That online storefront? It turned out that floating point math wasn’t a great choice. They got themselves into a situation where, when line item refunds and taxes were involved, they couldn’t generate a refund that matched the original invoice. (Whoops.) They had to engage in a multi-month process of finding everything that related to currency and changing it to use fixed-point math. True story.

Bet they wish they had expressed the Currency concept once. (And only once.) They could have changed the implementation of their Currency class and called it a day.
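
To illustrate, here’s a sketch of how the Currency class above could have swapped in fixed-point math without changing its interface. The cents-based representation and the plus() method are my additions for illustration, not part of the original example:

```javascript
// Same interface as before, but now backed by fixed-point math: amounts
// are stored as an integer number of cents, so floating point rounding
// errors can't accumulate. Callers of toNumber() and equals() don't change.
class Currency {
  constructor(amount) {
    this._cents = Math.round(amount * 100);
  }

  toNumber() {
    return this._cents / 100;
  }

  equals(currency) {
    return this._cents === currency._cents;
  }

  plus(currency) {
    const sum = new Currency(0);
    sum._cents = this._cents + currency._cents;
    return sum;
  }
}
```

With raw floating point, 0.10 + 0.20 doesn’t exactly equal 0.30; with this implementation, new Currency(0.10).plus(new Currency(0.20)).equals(new Currency(0.30)) is true, and every caller gets the fix for free.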

When, not if, you need to change a design decision, how hard will it be?

Coupling and Cohesion

Coupling and cohesion are ancient software design ideas that extend back to Structured Design: Fundamentals of a Discipline of Computer Program and Systems Design. [Yourdon and Constantine 1975] (ch. 6-7) They’re no less powerful for their age. Both terms refer to the relationship between concepts in your code.2

2I’ve updated the definitions slightly. The original definition discusses “modules,” not “concepts.”

Two parts of the code are coupled when a change to one necessitates a change to the other. To continue the Currency example, a function to convert currency to a string would be coupled to the data type used for currency.

Two concepts are cohesive when they’re physically close together in your source files. For example, if the function to convert currency to a string was in a Currency class, along with the underlying data type, they would be highly cohesive. If the function was in a utility module in a completely different directory, they would have low cohesion.

The best code has low coupling and high cohesion. In other words, changing the code for one concept doesn’t require you to change code for any other concept: changing the Currency data type doesn’t require changing the authentication code, or the refund logic. At the same time, when two parts of the code are coupled, they’re highly cohesive: if you change the Currency data type, everything that’s affected is in the same file, or at least in the same directory.

When you make a design decision, step back from design patterns and architectural principles and language paradigms for a moment. Ask yourself a simple question: When, not if, somebody changes this code, will the other things they need to change also be obvious? The answer comes down to coupling and cohesion.

Third-Party Components

Third-party components—libraries, frameworks, and services—are a common cause of problems. They tend to extend their tendrils throughout your code. When, not if, you need to replace or upgrade the component, the changes can be difficult and far-reaching.

To prevent this problem, isolate third-party components behind an interface you control. This is called an adapter or a wrapper. (In object-oriented languages, you can use the Adapter pattern. [Gamma et al. 1995]) Rather than using the component directly, your code uses the interface.

In addition to making your code resilient to third-party changes, your adapter can also present an interface customized for your needs, rather than imitating the third-party interface, and you can extend it with additional features as needed.

For example, when I wrote a wrapper for the Recurly billing service, I didn’t expose a method for Recurly’s subscriptions endpoint. Instead, I wrote isSubscribed(), which called the endpoint, parsed its XML, looped through the subscriptions, and converted the many possible subscription status flags into a simple boolean result.
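
Here’s a sketch of what that kind of adapter can look like. The BillingClient shape below is a hypothetical stand-in for a third-party client such as Recurly’s, not its real API, and real network clients would be asynchronous; the sketch is synchronous to keep it short:

```javascript
// An adapter that hides a third-party billing client behind an interface
// we control. The injected client and its getSubscriptions() method are
// hypothetical stand-ins for the real third-party API.
class BillingAdapter {
  constructor(thirdPartyClient) {
    this._client = thirdPartyClient;
  }

  // Expose only what our code needs: a simple boolean, rather than the
  // third party's full list of subscription objects and status flags.
  isSubscribed(accountId) {
    const subscriptions = this._client.getSubscriptions(accountId);
    return subscriptions.some((sub) => sub.state === "active");
  }
}
```

Because the rest of the code depends only on isSubscribed(), replacing the billing provider later means rewriting the adapter, not hunting down every caller.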

Create your adapters incrementally. Instead of supporting every feature of the component you’re adapting, support only what you need today, focusing on providing an interface that matches the needs of your code. This will make the adapter cheaper to create and make it easier to change when (not if) you need to replace the underlying component in the future.

Some components—particularly frameworks—want to “own the world” and are difficult to hide behind adapters. For this reason, I prefer to build my code using simple libraries, with narrowly-defined interfaces, rather than one big do-everything framework. In some cases, though, a framework is the best choice. For frameworks that require you to extend base classes, you can create an adapter by building your own base classes that extend the framework’s classes, rather than extending the framework directly.

Alternatively, you can choose not to wrap the third-party components. This makes the most sense when the component is pervasive and stable, such as core language frameworks. You can make this decision on a case-by-case basis: for example, I’ll use the .NET String class directly, without an adapter, but I’ll use an adapter to isolate .NET’s cryptography libraries—not because I think they’ll change, but because they’re complicated, and an adapter will allow me to hide and centralize that complexity.

Fail Fast

One of the pitfalls of simple design is that your design will be incomplete. If you’re following the YAGNI principle (“You Aren’t Gonna Need It”), there will be some scenarios that your code just isn’t capable of handling. For example, you could write a currency rendering method that isn’t aware of non-US currencies yet, because your code currently renders everything in US dollars. But later, when you support more currencies, that gap could result in a bug.

To prevent these gaps from becoming a problem down the road, write your code to fail fast. Use assertions to signal the limits of your design. If someone tries to use something that isn’t implemented, the assertion will cause their tests to fail. For example, you could add an assertion to fail when the Currency class is asked to render a non-US currency.

Most languages have some sort of run-time assertions built in. I like to write my own assertion module, though, and give it expressive functions (with good error messages) such as ensure.notNull(), ensure.unreachable(), ensure.impossibleException(), and so forth. I have them throw an exception when the assertion is violated.
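
A minimal version of such a module might look like this sketch. The function names echo the ones above; renderCurrency() is a made-up example of code that fails fast at the edge of its design:

```javascript
// A tiny fail-fast assertion module. Each function throws as soon as an
// assumption is violated, so gaps in the design surface in tests rather
// than in production.
const ensure = {
  notNull(value, name) {
    if (value === null || value === undefined) {
      throw new Error(`${name} must not be null`);
    }
    return value;
  },
  unreachable(message) {
    throw new Error(message || "reached code that should be unreachable");
  },
};

// Example: this renderer only supports US dollars so far. Rather than
// silently mis-rendering other currencies, it fails fast.
function renderCurrency(amount, currencyCode) {
  ensure.notNull(currencyCode, "currencyCode");
  if (currencyCode !== "USD") {
    ensure.unreachable(`unsupported currency: ${currencyCode}`);
  }
  return `$${amount.toFixed(2)}`;
}
```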

Some people worry that failing fast will make their code more brittle, but the opposite is actually true. By failing fast, you make errors more obvious, which means you catch them before they go into production. As a safety net, though, to prevent your software from crashing outright, you can add a top-level exception handler that logs the error and recovers.
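
For instance, a top-level wrapper around a request handler might look like this sketch. The handler shape and the 500 response are invented for illustration:

```javascript
// A top-level safety net: fail-fast assertions deeper in the code throw
// freely, and this wrapper logs the error and returns a generic failure
// response instead of crashing the whole service.
function handleSafely(handler, request) {
  try {
    return handler(request);
  } catch (err) {
    console.error("Unexpected error:", err.message); // log for later investigation
    return { status: 500, body: "Internal error" };
  }
}
```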

Fail fast code works best when combined with sociable tests (see “Write Sociable Tests” on p.XX), because sociable tests will trigger the fail fast checks, allowing you to find gaps more easily. Isolated tests require your tests to make assumptions about the behavior of dependencies, and it’s easy to assume a dependency will work when it actually fails fast.

Self-Documenting Code

Simplicity is in the eye of the beholder. It doesn’t matter much if you think the design is simple; if the rest of your team—or future maintainers of your software—find it too complicated, then it is.

To avoid this problem, use idioms and patterns that are common for your language and team. It’s okay to introduce new ideas, but run them past other team members first.

Names are one of your most powerful tools for self-documenting code. Be sure to use names that clearly reflect the intent of your variables, functions, classes, modules, and other entities. When a function has a lot of moving parts, use the Extract Function refactoring [Fowler 2018] to name each piece. When a conditional is hard to understand, use functions or intermediate variables to name each piece of the conditional.
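
For example, intermediate variables can name each piece of a hard-to-read conditional. The refund-policy rules here are invented for illustration:

```javascript
// Before: the intent of the conditional is buried in the details.
//   if (order.total > 0 && daysSinceOrder <= 30 && !order.refunded) { ... }

// After: each clause has a name that states its intent.
function isRefundable(order, daysSinceOrder) {
  const hasCharge = order.total > 0;
  const withinRefundWindow = daysSinceOrder <= 30;
  const notAlreadyRefunded = !order.refunded;
  return hasCharge && withinRefundWindow && notAlreadyRefunded;
}
```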

Note that I didn’t say anything about comments. Comments aren’t bad, exactly, but they’re a crutch. Try to find ways to make your code so simple and expressive that comments aren’t needed.

Allies
Pair Programming
Mob Programming
Refactoring
Collective Code Ownership

Good names and simple code are hard. Three things make them easier. First, pairing or mobbing gives you more perspectives and more ideas for good names. If you’re having trouble thinking of a good name, or you think some code your driver wrote is unclear, discuss the situation and try to find a better way to express it.

Second, you can always refactor. Give it your best shot now, and when you come back to it later, refactor it to be even better.

Third, take advantage of collective code ownership. When you see code that isn’t clear, figure out what the person who wrote it was trying to say, then refactor to make that intent obvious.

Limit Published Interfaces

Published interfaces reduce your ability to make changes. Once an interface is published to people outside your team, changing that interface typically requires a lot of expense and effort. You have to be careful not to break anything that they’re relying upon.

Some teams approach design as if every public method were also a published interface. This approach assumes that, once defined, a public method should never change. To be blunt, this is a bad idea: it prevents you from improving your design over time. A better approach is to change non-published interfaces whenever needed, updating callers accordingly.

If your code is used outside your team, then you do need published interfaces. Each one is a commitment to a design decision that you may wish to change in the future, so minimize the number of interfaces you publish. For each one, consider whether the benefit is worth the cost. Sometimes it will be, but it’s a decision to be made carefully. Postpone the decision as long as possible to allow your design to improve and settle.

In some organizations—including, famously, Google—teams have the ability to change other teams’ code. When they want to change an interface that’s only published within the organization, they can easily do so.

In some cases, as with teams creating a library for third-party use, the entire purpose of the product is to provide a published interface. In that case, design your interface carefully, up front, rather than using evolutionary design. The smaller the interface, the better—it’s much easier to add to your interface than to remove mistakes.

Performance Optimization

Modern computers are complex. Even reading a single line of a file from a disk requires the coordination of the CPU, multiple levels of CPU cache, the operating system kernel, a virtual file system, a system bus, the hard drive controller, the hard drive cache, OS buffers, system buffers, and scheduling pipelines. Every component exists to solve a problem, and each has certain tricks to squeeze out performance. Is the data in a cache? Which cache? How’s your memory aligned? Are you reading asynchronously or are you blocking? There are so many variables, it’s nearly impossible to predict the performance of any single method.

The days in which a programmer could accurately predict performance by counting instruction cycles are long gone, yet some programmers still approach performance with this simplistic, brute-force mindset. They make random guesses about performance based on folklore and 20-line performance tests, flail around writing every micro-optimization they can think of, leave a twisty mess in the code, and take a long lunch.

Your intuition about performance is almost always going to be wrong.

In other words, your intuition about performance is almost always going to be wrong.

The only way to optimize modern systems is to take a holistic approach. You have to measure the real-world performance of the code, find the hot spots, and optimize from there. Don’t guess. Don’t make assumptions. Just profile the code.

String buffers, function calls, and boxing/unboxing—the most common bugaboos of the micro-optimizer—are almost always not the issue. Most of the time, your performance will be dominated by the network, database, or file system. If not, it’s likely to be an O(n²) or worse algorithm. Rarely, it will be thread contention or non-sequential memory access inside a tight loop. But the only way to be sure is to measure real-world performance. Don’t guess. Profile, profile, profile.

In the meantime, ignore the micro-optimization tricks. When (not if) you need to change your code—whatever the reason—it will be easier to do if it’s simple and straightforward.

Questions

What if we know we’re going to need a story? Shouldn’t we put in a design hook for it?

Your plan can change at any time. Unless the story is part of your current task plan, don’t put the hook in. The story could disappear from the plan, leaving you stuck with unnecessary complexity.

More importantly, evolutionary design actually reduces the cost of changes over time, so the longer you wait to make the change, the cheaper it will be.

What if ignoring a story will make it harder to implement in the future?

A simple design should make arbitrary changes possible by reducing duplication and reducing the impact of changes. If ignoring a potential story could make it more difficult, look for ways to eliminate that risk without explicitly coding support for the story. “Risk-Driven Refactoring” on p.XX has more details.

Prerequisites

Allies
Refactoring
Reflective Design
Incremental Design
Collective Code Ownership
Pair Programming
Mob Programming

Simple design requires continuous improvement through refactoring, reflective design, and incremental design. Without them, your design will fail to evolve with your requirements.

Don’t use simple design as an excuse for poor design. Simplicity requires careful thought. As the Einstein quote at the beginning of this practice reminds us, it’s a lot easier to create complex designs than simple designs. Don’t pretend “simple” means the code that’s fastest or easiest to implement.

Collective code ownership and pairing or mobbing, though not strictly necessary for simple design, will help your team devote the brainpower needed to create truly simple designs.

Indicators

When you create simple designs:

  • Your team doesn’t write code in anticipation of future stories.

  • Your team finishes work more quickly, because they don’t build things that aren’t needed today.

  • Your design supports arbitrary changes easily.

  • Although new features might require a lot of new code, changes to existing code are localized and straightforward.

Alternatives and Experiments

Most people still consider the advice Erich Gamma now disavows to be the best practice for design: “The key to maximizing reuse [and design quality] lies in anticipating new requirements and changes to existing requirements, and in designing your systems so they can evolve accordingly.”

Ally
Reflective Design

I call this “predictive design,” in contrast to reflective design, which I’ll discuss next. Although many teams have had success using predictive design, it relies on accurately anticipating new requirements and stories. If your expectations are too far off, you might have to rewrite a lot of code that was based on bad assumptions. Some of those rewrites might affect so much code that they can’t be done economically, resulting in long-term cruft in your codebase.

Generally, I’ve found the simple design techniques described in this practice to work better than predictive design, but you can combine the two. If you do use a predictive design approach, it’s best to hire programmers who have a lot of experience in your specific industry. They’re more likely to correctly anticipate changes.

Further Reading

Martin Fowler has a collection of his excellent IEEE Design columns online at http://www.martinfowler.com/articles.html#IDAOPDBC. Many of these columns discuss core concepts that help in creating a simple design.

The Pragmatic Programmer: From Journeyman to Master [Hunt and Thomas 1999] contains a wealth of design information that will help you create simple, flexible designs. Practices of an Agile Developer [Subramaniam and Hunt 2006] is its spiritual successor, offering similarly pithy advice, though with less emphasis on design and coding.

Prefactoring [Pugh 2005] also has good advice for creating simple, flexible designs.

“Fail Fast” [Shore 2004b] discusses that concept in more detail.

AoAD2 Chapter: Design (introduction)

Design

Software typically becomes more expensive to change over time.

I’m not aware of any good studies on this,1 but I think it’s something every programmer has experienced. When starting a new codebase, we’re incredibly productive, but as time goes on, changes become more and more difficult. I’ve illustrated my experience with these costs in figure “Cost of Change Curves”.

1The most commonly quoted source is Barry Boehm, who has a chart showing exponential cost increases, but that chart is about the cost of fixing defects by phase, not the costs of change over time. And, as it turns out, that chart doesn’t accurately reflect the underlying data. Laurent Bossavit does an excellent job of chasing down the data in [Bossavit 2013] (chapter 10 and appendix B).

That’s a problem for Agile. If change becomes significantly more expensive over time, the Agile model doesn’t make sense. Instead, the smart thing to do would be to make as many decisions as possible up-front, when they’re the least expensive. In fact, that’s exactly what pre-Agile methods tried to do.

In order for Agile to work, the cost of change must be relatively flat, or even decreasing over time. Kent Beck discussed this in the first Extreme Programming (XP) book:

[A flat cost of change curve] is one of the premises of XP. It is the technical premise of XP... If a flattened change cost curve makes XP possible, a steep change cost curve makes XP impossible. If change is ruinously expensive, you would be crazy to charge ahead without careful forethought. But if change stays cheap, the additional value and reduced risk of early concrete feedback outweighs the cost of early change. [Beck 2000] (pp. 23-24)

Extreme Programming Explained, 1st ed.

But—as we’ve all experienced—the cost of change isn’t flat. It does increase over time. Does that mean that Agile teams are doomed to collapse under the weight of unmaintainable code?

Without evolutionary design, Agile teams are doomed to collapse under the weight of unmaintainable code.

The brilliance of XP was that it included practices to proactively reduce the cost of change. The central technique was called “evolutionary design.” XP remains the only mainstream Agile method to include it. That’s a shame, because without evolutionary design, Agile teams do collapse under the weight of unmaintainable code.

I first heard about evolutionary design in 2000. It sounded ridiculous, but I respected the people recommending it, so I tried an experiment. My team was just about to start a new project. We started with a traditional up-front design, then applied evolutionary design from that point forward.

It worked. It worked incredibly well. Evolutionary design resulted in steady improvement, giving us a design that was cleaner, clearer, and easier to change than the up-front design we started with.

I’ve been pushing the boundaries of evolutionary design ever since. As with traditional design, I’m not aware of any good studies on this, but my experience is that it does decrease the cost of change over time.

I’ve illustrated my experience in figure “Cost of Change Curves”. Changes are more expensive at first, because you take time establishing tests and good design abstractions, but then the cost of change drops as the design becomes easier and easier to work with. In traditional design, progress is fast at first, but changes become more difficult as design errors accumulate. The curves seem to cross at about the 4-6 week mark—in other words, after about 4-6 weeks of development, the XP approach is cheaper than the traditional approach.

A graph with two axes. The x-axis is labelled “Number of changes,” and increases to the right. The y-axis is labelled “Cost of each change,” and increases upward. The graph has two lines. The first is labelled “Traditional software development,” and it starts with very low cost, but increases asymptotically. The second is labelled “Extreme Programming,” and it starts with much higher cost, but then decreases asymptotically. The point where the two curves cross is labelled “4-6 weeks.”

Figure 1. Cost of change curves

The clearest example of this reduction comes from a screencast I produced from 2012-2018.2 In that screencast, I livecoded a multi-user drawing application using evolutionary design and other XP practices. Each episode was about 15 minutes long and I produced over 600 episodes. The result is 150 hours of meticulously-documented evolutionary design.

2The screencast is available at https://www.letscodejavascript.com. The networking example can be found in episodes 473-498 of the Live channel.

You can see the impact of evolutionary design in the screencast’s implementation of live networking. First, I networked the user’s mouse pointer. That took 12 hours. Then line-drawing: 6½ hours. Then clearing the screen: 2¾ hours. Then two tricky polish features: ¾ hour and ½ hour. If you graph it out, the decreasing cost of change shows up, plain as day. See figure “Real-World Evolutionary Design”.

A graph with two axes. It’s similar to the “cost of change curves” figure, in that it has an x-axis labelled “number of changes” and a y-axis labelled “cost of each change.” The graph has a curve showing an asymptotic decline, and five data points, each representing a change. From left to right, they’re labelled “Pointer (12 hours),” “Lines (6.5 hours),” “Clear button (2.75 hours),” “Disappearing pointer (0.75 hours),” and “Touch pointer (0.5 hours).”

Figure 2. Real-world evolutionary design

Evolutionary design is essential to long-term success with Agile. It’s revolutionary. And barely anyone knows about it.

This chapter has three practices for evolutionary design:

  • “Incremental Design” on p.XX builds design simultaneously with customer value.

  • “Simple Design” on p.XX allows your team to create designs that are easy to modify and maintain.

  • “Reflective Design” on p.XX improves the design of existing code.

Share your feedback about this excerpt on the AoAD2 mailing list! Sign up here.


Fireside Chat with Ron Quartel on FAST

Ron Quartel’s FAST Agile is an innovative technique for scaling Agile teams. I’ve been fascinated by it ever since I first saw Ron introduce it at Agile Open Northwest several years ago.

Recently, I had the fortune of hosting a fireside chat with Ron at Agile PDX. The session starts with an overview from Ron at 7:10 and the chat begins at 42:06. The video is embedded below, or you can watch it on YouTube.

AoAD2 Chapter: Development (introduction)


Development

It’s startling how rarely software development processes actually talk about the nuts and bolts of development. The way your team develops matters. It’s what your team spends the most hours doing.

This chapter includes practices to speed up your development and make it more reliable:

  • “Zero Friction” on p.XX removes the delays that slow down development.

  • “Continuous Integration” on p.XX keeps your latest code ready to release.

  • “Test-Driven Development” on p.XX ensures your software does exactly what programmers intend it to do.

  • “Refactoring” on p.XX allows programmers to continuously improve the design of their code.

  • “Spike Solutions” on p.XX enable programmers to learn through small, isolated experiments.

It introduces two key ideas:

  • “Key Idea: Optimize for Maintenance” on p.XX: Maintenance costs are more important than the costs of writing new code.

  • “Key Idea: Fast Feedback” on p.XX: The more quickly you can get feedback, the more quickly you can adjust course and correct mistakes.


AoAD2 Practice: Spike Solutions


Spike Solutions

Audience
Programmers

We perform small, isolated experiments when we need more information.

You’ve probably noticed by now that Agile teams value concrete data over speculation. Whenever you’re faced with a question, don’t speculate about the answer—conduct an experiment! Figure out how you can use real data to make progress.

That’s what spike solutions are for, too. A spike solution, or spike, is a technical investigation. It’s a small experiment, in code, to research the answer to a problem. It usually takes less than a day. When you have the answer, the spike is discarded.

To truly understand a solution, write working code.

Spike solutions use code because nothing is more concrete. You can read as many books, tutorials, or online answers as you like, but to truly understand a solution, write working code. It’s important to work from a practical point of view, not just a theoretical one. The best way to do so depends on what you want to learn.

People often confuse spike solutions with “walking skeletons” or “tracer bullets.” A walking skeleton is a bare-bones, but production-grade, codebase that includes every major component of your final system. Agile teams start new codebases by building a walking skeleton, then gradually add meat to the bones until the code is ready to release.

In contrast, a spike is narrowly focused on a specific technical problem, and it’s thrown away afterwards. The term “spike solution” refers to a sharp, targeted probe, not an “all the way through” solution.

Quick Questions

For questions about your language, libraries, or tools, write a line or two of code. If your programming language has a REPL (an interactive programming prompt), that’s often the quickest way to get your answer. For example, if you wanted to know if JavaScript could use comparison operators on strings, you could open a web browser console:

> "a" < "b"
true
> "b" < "a"
false
> "a" === "a"
true

Alternatively, you can write a short test. You can put it right next to your real tests, then delete it afterwards. For example, if you wanted to know if Java throws an exception on arithmetic overflow, a throwaway test would answer the question:

@Test
public void deleteme() {
  int a = Integer.MAX_VALUE + 1;  // test will fail if exception thrown
  System.out.println("No exception: a = " + a);
}

// Result of test run: "No exception: a = -2147483648"

Third-Party Dependencies

To learn how to use a third-party dependency, such as a library, framework, or service, create a small, standalone program that demonstrates how the dependency works. Don’t bother writing production-grade code—just focus on demonstrating the core idea. Run from the command line, hardcode values, and ignore user input. Provide just barely enough design and abstraction to keep yourself from getting lost.

For complex dependencies, such as frameworks, I’ll often start with their tutorial. However, those tutorials tend to emphasize getting up and running quickly, not helping you understand the framework. They often have a lot of magic tooling that makes the framework harder to understand, not easier. So once you get the tutorial working, make it your own. Remove magic, call APIs manually, and cut unneeded complexity. Think about your use cases and demonstrate them in the spike.

When you’re done, you can check the spike into your code repository to act as a reference while you build the real implementation. (I use a /spikes directory.) Once you’ve built out the production implementation, you can either delete the spike or keep it for future reference, depending on how useful it is.

Design Experiments

Ally
Reflective Design

If you have an idea for a design improvement, but you’re not sure how it will work out, you can spike the design. I use this approach for big design changes, when I’m not sure the idea will work as well as I think.

To spike a design, create a temporary, throwaway branch in your repository. In that temporary branch, you can experiment without having to worry about safe refactorings or passing tests. You don’t even need the code to work properly. The purpose of the spike is just to experiment with your design idea and see how it works in practice.

If your design idea doesn’t work out, delete the branch. If it does work out, you can keep it for reference, temporarily, but don’t merge it into your real code. Redo the change from scratch, this time taking care with your refactorings and updating tests as needed. When you’re done, delete the branch.

Allies
Incremental Design
Simple Design

Avoid overusing design spikes. Although you’re welcome to create a design spike whenever it will help you understand your design options, they shouldn’t be necessary for every story. You should usually be able to use incremental design to start with a simple, obvious approach that incrementally becomes more sophisticated.

Making Time for Spikes

Small, “quick question” spikes are usually performed on the spur of the moment. You see a need to clarify a small technical issue, you write and delete a quick spike, you move on.

Allies
Stories
Task Planning
Slack

Dependency and design spikes can happen in several ways. Sometimes, they’re planned intentionally, either with a spike story or a task. At other times, you won’t realize a story needs a spike until you’re in the middle of working on it. When that happens, you can either add a task to your planning board, or just work on the spike as part of your current task. Either way, your slack absorbs the cost.

Questions

What’s the difference between a prototype and a spike?

“Prototype” doesn’t have a strict definition, but it usually refers to incomplete or non-functioning software that’s made to mimic the final product. Prototypes are often used to demonstrate UIs or to learn by building a throw-away version of the application.

Spikes are much more focused. They’re created to answer a narrow technical question, not to mimic the final product.

Should we pair or mob on spikes?

It’s up to you. Because spikes don’t need to be maintained, even teams with strict pair programming rules don’t require writing spikes in pairs.

One very effective way to pair on a spike is to have one person research the technology while another codes. That’s typically how mob programmers approach spikes. Another option is for people to work independently on separate approaches, each doing their own research and coding, then coming together to review progress and share ideas.

Should we really throw away our spikes?

Unless you think someone will refer to it later, toss it. Remember, the purpose of a spike solution is to give you the information and experience needed to solve a problem, not to produce the code that solves it. The real production code usually ends up being a better reference than the spike.

When should we create a spike?

Whenever it helps. Perform a spike whenever the constraints of writing production-grade code get in the way of figuring out a solution.

What if the spike reveals that the problem is more difficult than we thought?

That’s good; now you have the information you needed. Perhaps your on-site customers will reconsider the value of the story you’re working on, or perhaps you need to think of another way to accomplish your goal.

Prerequisites

Avoid the temptation to create useful or generic programs out of your spikes. Focus your work on answering a specific technical question, and stop working on the spike as soon as it answers that question. Similarly, there’s no need to create a spike when you already understand a technology well.

Ally
Test-Driven Development

Don’t use spikes as an excuse to avoid disciplined test-driven development and refactoring. Never copy spike code into production code. Even if the spike does exactly what you need, rewrite it using test-driven development so that it meets your production code standards.

Indicators

When you clarify technical questions with well-directed, isolated experiments:

  • Rather than speculating about how your program will work, you conduct an experiment that tells you.

  • The complexities of your production code don’t interfere with your experiments.

Alternatives and Experiments

Spike solutions are a learning technique based on performing small, concrete experiments. Some people perform these experiments in their production code, which increases the scope of possible error. If something doesn’t work as expected, is it because your understanding of the technology is wrong? Or is it due to an unseen interaction with the production code? Standalone spikes eliminate this uncertainty.

An alternative to spike solutions is to research problems by performing web searches, reading theory, and finding code snippets online. This can be good enough for small problems, but for bigger problems, the best way to really understand how the technology works is to get your hands dirty with some code. Go ahead and start with code you find online, if you need to, but then simplify and adapt the example. Why does it work? What happens when you change default parameters? Use the spike to clarify your understanding.

Another alternative, specifically for learning how to use third-party dependencies, is to start by writing test code that exercises the dependency. As you learn how the dependency works, refactor your experiment into a “test” and “implementation” portion, then move the implementation into your production code. This approach starts off like a spike, but morphs into high-quality, tested production code. Episode 5 of [Shore 2020b] demonstrates the technique, starting at 13:50, and episode 17 has a larger example.


AoAD2 Practice: Refactoring


Refactoring

Audience
Programmers

We revise and improve the design of existing code.

Code rots. That’s what everybody says: entropy is inevitable, and chaos eventually turns your beautifully imagined, well-designed code into a big mess of spaghetti.

I used to think that, too, before I learned to refactor. Now I have a ten-year-old production codebase that’s better today than it was when I first created it. I’d hate to go back: every year, it’s so much better than it was the year before.

Refactoring isn’t rewriting.

Refactoring makes this possible. It’s the process of changing the design of your code without changing its behavior. What it does stays the same, but how it does it changes. Despite popular misuse of the term, refactoring isn’t rewriting. Nor is it any arbitrary change. Refactoring is a careful, step-by-step approach to incrementally improving the design of your code.

Refactorings are also reversible: there’s no one right answer, so sometimes you’ll refactor in one direction, and sometimes you’ll refactor in the other. Just as you can change the expression “x²-1” to “(x+1)(x-1)” and back, you can change the design of your code—and once you can do that, you can keep entropy at bay.

How to Refactor

Allies
Test-Driven Development
Reflective Design
Slack

Technically, you can refactor at any time, but unless your IDE has guaranteed-safe refactorings, it’s best to do it when you have a good suite of tests that are all passing. You’ll typically refactor during the “Refactor” step of the test-driven development loop, but you’ll also refactor to make a change easier or to clean up code.

When you refactor, you’ll proceed in a series of very small transformations. (Confusingly, each transformation is also called a refactoring.) Each refactoring is like making a turn on a Rubik’s cube. To achieve anything significant, you have to string together several individual refactorings, just as you have to string together several turns to solve the cube.

To refactor well, you need to work in a series of controlled steps.

The fact that refactoring is a sequence of small transformations is sometimes lost on people new to refactoring. You don’t just change the design of your code: to refactor well, you need to make that change in a series of controlled steps. Each step should only take a few moments, and your tests should pass after each one.

There are a wide variety of individual refactorings. The definitive guide is Martin Fowler’s eponymous book, Refactoring: Improving the Design of Existing Code. [Fowler 2018] It contains an in-depth catalog of refactorings, and is well worth studying. I learned more about good code and design from reading that book than from any other source.

That said, you don’t need to memorize all the individual refactorings. Instead, try to learn the mindset behind them. The automated refactorings in your IDE will help you get started, but there are many more options available to you. The trick is to break down the change you want to make into small steps that only change the design of your code, not its behavior.

Refactoring in Action

To illustrate this point, I’ll continue the example started in “A TDD Example” on p.XX. This is a small example, for space reasons, but it still illustrates how a bigger change can be broken down into individual refactorings. Each refactoring is just a matter of seconds.

To follow along with this example, clone the git repository at https://github.com/jamesshore/livestream, check out the 2020-05-05-end tag, and modify the src/rot-13.js file. (See README.md for instructions about how to run the build.)

At the end of the TDD example, we had a JavaScript module that performed ROT-13 encoding:

export function transform(input) {
  if (input === undefined || typeof input !== "string") {
    throw new Error("Expected string parameter");
  }

  let result = "";
  for (let i = 0; i < input.length; i++) {
    let charCode = input.charCodeAt(i);
    result += transformLetter(charCode);
  }
  return result;
}

function transformLetter(charCode) {
  if (isBetween(charCode, "a", "m") || isBetween(charCode, "A", "M")) {
    charCode += 13;
  } else if (isBetween(charCode, "n", "z") || isBetween(charCode, "N", "Z")) {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

function isBetween(charCode, firstLetter, lastLetter) {
  return charCode >= codeFor(firstLetter) && charCode <= codeFor(lastLetter);
}

function codeFor(letter) {
  return letter.charCodeAt(0);
}

The code worked, and was decent quality, but it was overly verbose. It used character codes for determining ranges, but JavaScript allows you to compare letters directly. We can simplify the code by removing codeFor() and having isBetween() do a direct comparison, like this:

function isBetween(letter, firstLetter, lastLetter) {
  return letter >= firstLetter && letter <= lastLetter;
}

Although that change could be made all at once, making big changes in a real-world application will introduce bugs and can get you into a state that’s hard to get out of. (Been there, done that. In a public demonstration of refactoring. Youch.) As with TDD, the better you understand how to refactor, the smaller steps you’re able to make, and the faster you’ll go. So I’ll demonstrate the refactoring step by safe step.

To start with, isBetween() takes charCode, not letter. I needed to modify its caller, transformLetter(), to pass in a letter. But transformLetter() didn’t have a letter either. Even transform() didn’t have a letter. So that was the first thing to introduce:

export function transform(input) {
  if (input === undefined || typeof input !== "string") {
    throw new Error("Expected string parameter");
  }

  let result = "";
  for (let i = 0; i < input.length; i++) {
    let letter = input[i];
    let charCode = input.charCodeAt(i);
    result += transformLetter(charCode);
  }
  return result;
}

function transformLetter(charCode) ...

This was a do-nothing change: I introduced a variable, but nothing used it, so I expected the tests to pass. I ran them, and they did.

Although the letter variable wasn’t used, introducing it gave me the ability to pass letter into transformLetter. That was my next step.

Ally
Zero Friction

Notice how small these steps were. From experience, I knew that manually refactoring function signatures often goes wrong, so I wanted to take it slow. Such small steps require a zero-friction build.

export function transform(input) {
  if (input === undefined || typeof input !== "string") {
    throw new Error("Expected string parameter");
  }

  let result = "";
  for (let i = 0; i < input.length; i++) {
    let letter = input[i];
    let charCode = input.charCodeAt(i);
    result += transformLetter(letter, charCode);
  }
  return result;
}

function transformLetter(letter, charCode) {
  if (isBetween(charCode, "a", "m") || isBetween(charCode, "A", "M")) {
    charCode += 13;
  } else if (isBetween(charCode, "n", "z") || isBetween(charCode, "N", "Z")) {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

The tests passed again. Now that I had letter in transformLetter(), I could pass it through to isBetween():

function transformLetter(letter, charCode) {
  if (isBetween(letter, charCode, "a", "m") ||
      isBetween(letter, charCode, "A", "M")) {
    charCode += 13;
  } else if (isBetween(letter, charCode, "n", "z") ||
             isBetween(letter, charCode, "N", "Z")) {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

function isBetween(letter, charCode, firstLetter, lastLetter) {
  return charCode >= codeFor(firstLetter) && charCode <= codeFor(lastLetter);
}

(Tests passed.) And now that isBetween() had letter, I could finally modify isBetween to use it:

function isBetween(letter, charCode, firstLetter, lastLetter) {
  return letter >= firstLetter && letter <= lastLetter;
}

(Tests passed.) The codeFor() function was no longer in use, so I deleted it.

Ally
Slack

(Tests passed.) I had accomplished what I originally set out to do, but now that I saw what the code looked like, I could see more opportunities to simplify. This is common when refactoring: cleaning up the code will make more cleanups visible. Deciding whether to pursue those additional cleanups is a question of judgment and how much slack you have.

This is what the code looked like:

export function transform(input) {
  if (input === undefined || typeof input !== "string") {
    throw new Error("Expected string parameter");
  }

  let result = "";
  for (let i = 0; i < input.length; i++) {
    let letter = input[i];
    let charCode = input.charCodeAt(i);
    result += transformLetter(letter, charCode);
  }
  return result;
}

function transformLetter(letter, charCode) {
  if (isBetween(letter, charCode, "a", "m") ||
      isBetween(letter, charCode, "A", "M")) {
    charCode += 13;
  } else if (isBetween(letter, charCode, "n", "z") ||
             isBetween(letter, charCode, "N", "Z")) {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

function isBetween(letter, charCode, firstLetter, lastLetter) {
  return letter >= firstLetter && letter <= lastLetter;
}

In this case, I had plenty of slack, so I decided to keep refactoring. The isBetween() function didn’t seem like it was adding any value, so I inlined it. I was able to do this in a single, bigger step because I used my editor’s automatic “Inline Function” refactoring.

function transformLetter(letter, charCode) {
  if (letter >= "a" && letter <= "m" || letter >= "A" && letter <= "M")  {
    charCode += 13;
  } else if (letter >= "n" && letter <= "z" || letter >= "N" && letter <= "Z") {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

(Tests passed.) Passing in charCode seemed redundant, so I copied the charCode logic from transform into transformLetter():

function transformLetter(letter, charCode) {
  charCode = letter.charCodeAt(0);
  if (letter >= "a" && letter <= "m" || letter >= "A" && letter <= "M") {
    charCode += 13;
  } else if (letter >= "n" && letter <= "z" || letter >= "N" && letter <= "Z") {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

(Tests passed.) And then removed the unneeded charCode parameter:

export function transform(input) {
  if (input === undefined || typeof input !== "string") {
    throw new Error("Expected string parameter");
  }

  let result = "";
  for (let i = 0; i < input.length; i++) {
    let letter = input[i];
    result += transformLetter(letter);
  }
  return result;
}

function transformLetter(letter) {
  let charCode = letter.charCodeAt(0);
  if (letter >= "a" && letter <= "m" || letter >= "A" && letter <= "M") {
    charCode += 13;
  } else if (letter >= "n" && letter <= "z" || letter >= "N" && letter <= "Z") {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

(Tests passed.) That was a nice simplification, but I saw an opportunity to make it even better. Rather than manually looping over the string, I realized I could use a regular expression to call transformLetter() instead:

export function transform(input) {
  if (input === undefined || typeof input !== "string") {
    throw new Error("Expected string parameter");
  }

  return input.replace(/[A-Za-z]/g, transformLetter);
}

function transformLetter(letter) {
  let charCode = letter.charCodeAt(0);
  if (letter >= "a" && letter <= "m" || letter >= "A" && letter <= "M") {
    charCode += 13;
  } else if (letter >= "n" && letter <= "z" || letter >= "N" && letter <= "Z") {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

(Test passed.) I thought that was as good as it could get, at first. But the /[A-Za-z]/ in the regex bothered me. I had included it to make the code more readable, but matching every character with /./ would have worked just as well. The regex wasn’t really doing anything.

Then it hit me: with the regex ensuring that only letters were being passed to transformLetter(), I could simplify the if statements. I wasn’t 100% sure about this, so I started slow:

function transformLetter(letter) {
  let charCode = letter.charCodeAt(0);
  if (letter <= "m" || letter >= "A" && letter <= "M") {
    charCode += 13;
  } else if (letter >= "n" && letter <= "z" || letter >= "N" && letter <= "Z") {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

The tests failed! I had forgotten that, in ASCII, upper-case “Z” comes before lower-case “a”. I needed to normalize the letter first:

function transformLetter(letter) {
  let charCode = letter.charCodeAt(0);
  if (letter.toUpperCase() <= "M") {
    charCode += 13;
  } else if (letter >= "n" && letter <= "z" || letter >= "N" && letter <= "Z") {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

That fixed it. Now I felt safe removing the second half of the if statement:

function transformLetter(letter) {
  let charCode = letter.charCodeAt(0);
  if (letter.toUpperCase() <= "M") {
    charCode += 13;
  } else {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

(Test passed.) The code was good, but the mutable charCode variable was bothering me. I prefer a more functional style. Rather than modifying the charCode variable directly, I decided to try storing the rotation amount instead.

First I introduced the new variable:

function transformLetter(letter) {
  let charCode = letter.charCodeAt(0);
  let rotation;
  if (letter.toUpperCase() <= "M") {
    charCode += 13;
    rotation = 13;
  } else {
    charCode -= 13;
    rotation = -13;
  }
  return String.fromCharCode(charCode);
}

(Test passed.) Then used it in place of charCode:

function transformLetter(letter) {
  let charCode = letter.charCodeAt(0);
  let rotation;
  if (letter.toUpperCase() <= "M") {
    rotation = 13;
  } else {
    rotation = -13;
  }
  return String.fromCharCode(charCode + rotation);
}

(Test passed.) And inlined charCode using my editor’s automated refactoring:

function transformLetter(letter) {
  let rotation;
  if (letter.toUpperCase() <= "M") {
    rotation = 13;
  } else {
    rotation = -13;
  }
  return String.fromCharCode(letter.charCodeAt(0) + rotation);
}

(Test passed.) Finally, I converted the if statement to a constant expression. In my editor, this was two automated refactorings: an automated conversion of if to ?:, and an automated joining of declaration and assignment. Then I manually changed let to const. The tests passed after each step, and the completed code looked like this:

export function transform(input) {
  if (input === undefined || typeof input !== "string") {
    throw new Error("Expected string parameter");
  }

  return input.replace(/[A-Za-z]/g, transformLetter);
};

function transformLetter(letter) {
  const rotation = letter.toUpperCase() <= "M" ? 13 : -13;
  return String.fromCharCode(letter.charCodeAt(0) + rotation);
}

This is a nice improvement over the original code. I could have made it more compact, but that would have sacrificed readability, so I was happy with it as it was. Some people might argue that the ternary expression was a step too far already.
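For comparison, a more compact version might collapse transformLetter into a single inline expression. This is only a sketch of that alternative, not code from the walkthrough, and it shows why the extra density doesn’t pay for itself:

```javascript
// A denser alternative (illustration only; the export keyword is omitted so
// the sketch runs standalone). Same behavior, but harder to scan.
function transform(input) {
  if (input === undefined || typeof input !== "string") {
    throw new Error("Expected string parameter");
  }
  return input.replace(/[A-Za-z]/g, (letter) =>
    String.fromCharCode(letter.charCodeAt(0) + (letter.toUpperCase() <= "M" ? 13 : -13))
  );
}
```

Both versions pass the same tests; the named transformLetter function simply gives the rotation logic a place to be read on its own.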

And that’s what it looks like to refactor, step by safe step. Although this is a small example, it’s an accurate reflection of real-world refactoring. In larger codebases, gradual changes like this are the basis for big improvements.

Small steps are important, too. This example is simple enough that you could convert it all in one or two big steps, but if you learn how to take small steps on small problems like this, you’ll be able to do so on large problems, too, and that’s how you successfully refactor a large codebase.

To see an example of incremental refactoring applied to a larger problem, see Emily Bache’s superb walkthrough of the Gilded Rose kata. [Bache 2018]

Allies
Incremental Design
Reflective Design

Breaking a big design change into a sequence of small refactorings enables you to make dramatic design changes without risk. You can even make big changes incrementally, fixing part of the design one day and another part of it another day. This is a necessary part of using your slack to make big changes, and the key to successful Agile design.

Questions

How often should we refactor?

Allies
Test-Driven Development
Slack

Constantly. Perform little refactorings as you use TDD and bigger refactorings as part of your slack. Every week, your design should be better than it was the week before.

Isn’t refactoring rework? Shouldn’t we design our code correctly from the beginning?

If it were possible to design your code perfectly from the beginning, then refactoring would be rework. However, as everybody who’s worked with large systems knows, mistakes always creep in. Even if they didn’t, the needs of your software change over time, and your design has to be updated to match. Refactoring gives you the ability to constantly improve.

What about our database? That’s what really needs improvement.

You can refactor databases, too. Just as with normal refactorings, the trick is to proceed in small, behavior-preserving steps. Refactoring Databases: Evolutionary Database Design [Ambler and Sadalage 2006] describes how. However, data migration can take a long time, which requires special deployment considerations, as described in “Continuous Deployment” on p.XX.
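To make “small, behavior-preserving steps” concrete for a schema change, here’s a hypothetical expand-contract sequence for renaming a column, sketched as plain SQL strings. This is my illustration, not an example from [Ambler and Sadalage 2006]; the table and column names are invented:

```javascript
// Hypothetical expand-contract sketch: renaming customer.fname to first_name
// in small, separately deployable, behavior-preserving steps.
const renameColumnSteps = [
  "ALTER TABLE customer ADD COLUMN first_name VARCHAR(100);", // 1. expand: add new column
  "UPDATE customer SET first_name = fname;",                  // 2. migrate existing data
  // 3. deploy code that writes both columns but reads first_name
  "ALTER TABLE customer DROP COLUMN fname;",                  // 4. contract, once nothing reads fname
];
```

Each step leaves the database and the running code working, so the change can pause safely at any point.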


How can we make large design changes without conflicting with other team members?

Ally
Continuous Integration

Communicate regularly and use continuous integration. Before taking on a refactoring that will touch a bunch of code, integrate your existing code and let people know what you’re about to do. Sometimes other people can reduce the chance of integration conflicts by mirroring any big rename refactorings you’re planning on doing.

I can’t refactor without breaking a lot of tests! What am I doing wrong?

Your tests should check the behavior of your code, not the implementation, and refactoring should change implementation, but not behavior. So if you’re doing everything correctly, the tests shouldn’t break when you refactor.

Some refactorings will change function or method signatures, but that only changes the interface, not the underlying behavior. Refactoring an interface requires changing all callers, which includes your tests, but your tests shouldn’t require any special changes.

If your tests often break when you refactor, or if your tests make interface changes difficult, it could be due to inappropriate use of test doubles (such as mock objects). Look at ways to improve your test design. One option is to use sociable tests instead of isolated tests, as “Write Sociable Tests” on p.XX discusses. If that doesn’t help, ask a mentor for guidance.
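As a hypothetical illustration of the difference (this example isn’t from the book), consider a small counter. The first assertion exercises only the public interface, so it survives refactoring; the second reaches into the implementation and breaks the moment it changes:

```javascript
// Hypothetical example: a counter whose internal storage is an implementation
// detail that refactoring is free to change.
class Counter {
  constructor() { this._counts = new Map(); }          // internal detail
  increment(name) {
    this._counts.set(name, (this._counts.get(name) ?? 0) + 1);
  }
  count(name) { return this._counts.get(name) ?? 0; }  // public interface
}

const counter = new Counter();
counter.increment("login");
counter.increment("login");

// Behavior-based: survives refactoring the Map away.
console.assert(counter.count("login") === 2);

// Implementation-based (avoid): couples the test to the Map, so a
// behavior-preserving refactoring would break it.
// console.assert(counter._counts.get("login") === 2);
```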

Prerequisites

Allies
Test-Driven Development
Zero Friction
Collective Code Ownership
Continuous Integration

Refactoring requires good tests and a zero-friction build. Without tests, refactoring is risky, because you can’t easily tell whether your changes have accidentally broken something. (Some IDEs provide a few guaranteed-safe refactorings, but other refactorings still require tests.) Without a zero-friction build, feedback is too slow to allow small steps. It’s still technically possible to refactor, but it’s slow and painful.

Refactoring also requires collective code ownership. Any significant design changes will require that you touch many parts of the code. Collective code ownership gives you the permission you need to do so. Similarly, refactoring requires continuous integration. Without it, each integration will be a nightmare of conflicting changes.

Published interfaces—interfaces used by code that’s outside of your team’s control—require careful management. You’ll need to coordinate with everyone who uses the published interface. For this reason, it’s often best to avoid refactoring published interfaces.

Some programming environments, particularly “low code” or “no code” environments, can make refactoring difficult. So can highly dynamic programming styles, such as monkey-patching (code that redefines existing interfaces) or string-based reflection. Refactoring may not be worth the cost in these situations—but be sure to consider the increased costs of change and decreased longevity that come with choosing not to refactor.

It’s possible, although not common, to spend too much time refactoring. You don’t need to refactor code that’s unrelated to your current work. Similarly, balance your need to finish stories with the need to have good code. As long as the code is better than it was when you started, you’re doing enough. In particular, if you think the code could be better, but you’re not sure how to improve it, it’s okay to leave it for someone else to improve later. That’s one of the great things about collective ownership: someone will improve it later.

Indicators

When you use refactoring as an everyday part of your toolkit:

  • The code constantly improves.

  • You make significant design changes safely and confidently.

  • Every week, the code is at least slightly better than it was the week before.

Alternatives and Experiments

There are no real alternatives to refactoring. No matter how carefully you design your code, it will eventually get out of sync with the needs of your application. Without refactoring, that disconnect will overwhelm you, leaving you to choose between rewriting the software, at great expense and risk, or abandoning it entirely.

However, there are always opportunities to learn how to refactor better. That typically involves figuring out how to take smaller, safer, more reliable steps. Keep practicing. I’ve been at it for twenty years and I’m still learning new tricks.

Further Reading

Refactoring: Improving the Design of Existing Code [Fowler 2018] is the definitive reference for refactoring. It’s also a great read. Buy it.

Refactoring to Patterns [Kerievsky 2004b] takes Fowler’s work one step further, showing how refactorings can string together to achieve significant design changes. It’s a good way to learn more about how to use individual refactorings to achieve big results.

Refactoring Databases: Evolutionary Database Design [Ambler and Sadalage 2006] shows how refactoring can apply to database schemas.


AoAD2 Practice: Test-Driven Development


Test-Driven Development

Audience
Programmers

We produce high-quality code in small, verifiable steps.

“What programming languages really need is a ‘DWIM’ instruction,” the joke goes. “Do what I mean, not what I say.”

Programming is demanding. It requires perfection, consistently, for months and years of effort. At best, mistakes lead to code that won’t compile. At worst, they lead to bugs that lie in wait and pounce at the moment that does the most damage.

Wouldn’t it be wonderful if there were a tool that alerted you to programming mistakes moments after you made them—a tool so powerful, it virtually eliminated the need for debugging?

There is such a tool, or rather, a technique. It’s test-driven development, and it really works.

Test-driven development, or TDD, is a rapid cycle of testing, coding, and refactoring. When adding a feature, a programmer may perform dozens of these cycles, implementing and refining the software in tiny steps until there is nothing left to add and nothing left to take away. Done well, TDD ensures that the code does what you mean, not just what you say.

When used properly, TDD also helps you improve your design, documents your code for future programmers, enables refactoring, and guards against future mistakes. Better yet, it’s fun. You’re always in control and you get this constant reinforcement that you’re on the right track.

TDD isn’t perfect, of course. TDD helps programmers code what they intended to code, but it doesn’t stop programmers from misunderstanding what they need to do. It helps improve documentation, refactoring, and design, but only if programmers work hard to do so. It also has a learning curve: it’s difficult to add to legacy codebases and takes extra effort to apply to code that involves the outside world, such as user interfaces, networking, and databases.

Try it anyway. Although TDD benefits from other Agile practices, it doesn’t require them. You can use it with almost any code.

Why TDD Works

Back in the days of punch cards, programmers laboriously hand-checked their code to make sure it would compile. A compile error could lead to failed batch jobs and intense debugging sessions to look for the misplaced character.

Getting code to compile isn’t such a big deal anymore. Most IDEs check your syntax as you type, and some even compile every time you save. The feedback loop is so fast that errors are easy to find and fix. (See “Key Idea: Fast Feedback” on p.XX.) If something doesn’t compile, there isn’t much code to check.

Test-driven development applies the same principle to programmers’ intention. Just as modern environments provide feedback on the syntax of your code, TDD cranks up the feedback on the semantics of your code. Every few minutes—as often as every 20 or 30 seconds—TDD verifies that the code does what you think it should do. If something goes wrong, there are only a few lines of code to check. Mistakes become obvious.

TDD is a series of validated hypotheses.

TDD accomplishes this trick through a series of validated hypotheses. You work in very small steps, and at every step, you make a mental prediction about what’s going to happen next. First you write a bit of test code and predict it will fail in a particular way. Then a bit of production code and predict the test will now pass. Then a small refactoring and predict the tests will pass again. If a prediction is ever wrong, you stop and figure it out—or just back up and try again.

As you go, the tests and production code mesh together to check each other’s correctness, and your successful predictions confirm that you’re in control of your work. The result is code that does exactly what you thought it should. You can still forget something, or misunderstand what needs to be done. But you can have confidence that the code does what you intended.

When you’re done, the tests remain. They’re committed with the rest of the code, and they act as living documentation of how you intended the code to behave. More importantly, your team runs the tests with (nearly) every build, providing safety for refactoring and ensuring that the code continues to work as originally intended. If someone accidentally changes the code’s behavior—for example, with a misguided refactoring—the tests fail, signaling the mistake.

How to Use TDD

You’ll need a programmer’s testing framework to use TDD. For historical reasons, they’re called “unit testing frameworks,” although they’re useful for all sorts of tests. Every popular language has one, or even multiple—just do a web search for “<language> unit test framework.” Popular examples include JUnit for Java, xUnit.net for .NET, Mocha for JavaScript, and CppUTest for C++.

TDD doesn’t prevent mistakes; it reveals them.

TDD follows the “red, green, refactor” cycle illustrated in figure “The TDD Cycle”. Other than time spent thinking, each step should be incredibly small, providing you with feedback within a minute or two. Counterintuitively, the better at TDD someone is, the smaller the steps they take, and the faster they go. This is because TDD doesn’t prevent mistakes; it reveals them. Small steps mean fast feedback, and fast feedback means mistakes are easier and faster to fix.

A chart showing four steps: “Think,” followed by “Red bar,” followed by “Green bar,” followed by “Refactor.” There’s a loop from “Refactor” back to “Green bar,” and another loop from “Refactor” back to “Think.”

Figure 1. The TDD cycle

Step 1: Think

TDD is “test-driven” because you start with a test, and then only write enough code to make the test pass. The saying is, “Don’t write any production code unless you have a failing test.”

Your first step, therefore, is to engage in a rather odd thought process. Imagine what behavior you want your code to have, then think of the very first piece of that to implement. It should be small. Very small. Less than five lines of code small.

Next, think of a test—also just a few lines of code—that will fail until exactly that behavior is present. Think of something that tests the code’s behavior, not its implementation. As long as the interface doesn’t change, you should be able to change the implementation at any time, without having to change the test.

Allies
Pair Programming
Mob Programming
Spike Solutions

This is the hardest part of TDD, because it requires thinking two steps ahead: first, what you want to do; second, what test will require you to do it. Pairing and mobbing help. While the driver works on making the current test pass, the navigator thinks ahead, figuring out which increment and test should come next.

Sometimes, thinking two steps ahead will be too difficult. When that happens, use a spike solution to figure out how to approach the problem, then rebuild it using TDD.

Step 2: Red bar

Once you know your next step, write the test. Write just enough test code for the current increment of behavior—hopefully fewer than five lines of code. If it takes more, that’s okay; just try for a smaller increment next time.

Write the test in terms of the code’s public interface, not how you plan to implement its internals. Respect encapsulation. The first time you test a class, module, method, or function, that means your test will use names that don’t exist yet. This is intentional: it forces you to design your interface from the perspective of a user of that interface, not as its implementer.

After the test is coded, predict what will happen. Typically, the test should fail, resulting in a red progress bar in most test runners. Don’t just predict that it will fail, though; predict how it will fail. Remember, TDD is a series of validated hypotheses, and this is your first hypothesis.

Ally
Zero Friction

Then use your watch script or IDE to run the tests. You should get feedback within a few seconds. Compare the result to your prediction. Did they match?

If the test doesn’t fail, or if it fails in a different way than you expected, you’re no longer in control of your code. Perhaps your test is broken, or it doesn’t test what you thought it did. Troubleshoot the problem. You should always be able to predict what’s going to happen.

Your goal is to always know what the code is doing and why.

It’s just as important to troubleshoot unexpected successes as it is to troubleshoot unexpected failures. Your goal isn’t merely to have tests that pass; it’s to remain in control of your code—to always know what the code is doing and why.

Step 3: Green bar

Next, write just enough production code to get the test to pass. Again, you should usually need less than five lines of code. Don’t worry about design purity or conceptual elegance; just do what you need to do to make the test pass. This is okay because you’ll be refactoring in a moment.

Make another prediction and run the tests. This is your second hypothesis.

The tests should pass, resulting in a green progress bar. If the test fails, get back to known-good code as quickly as you can. Often, the mistake will be obvious. After all, you’ve only written a few new lines.

If the mistake isn’t obvious, consider undoing your change and trying again. Sometimes it’s best to delete or comment out the new test and start over with a smaller increment. Remaining in control is key.

It’s always tempting to beat your head against the problem rather than backing up and trying again. I do it too. And yet, hard-won experience has taught me that trying again with a smaller increment is almost always faster and easier.

That doesn’t stop me from beating my head against walls—it always feels like the solution is just around the corner—but I have finally learned to set a timer so the damage is contained. If you can’t bring yourself to undo right away, set a five- or ten-minute timer, and promise yourself that you’ll back up and try again, with a smaller increment, when the timer goes off.

Step 4: Refactor
Ally
Refactoring

Once your tests are passing again, you can now refactor without worrying about breaking anything. Review the code you have so far and look for possible improvements. If you’re pairing or mobbing, ask your navigator if they have any suggestions.

Incrementally refactor to make each improvement. Use very small refactorings—less than a minute or two each, certainly not longer than five minutes—and run the tests after each one. They should always pass. As before, if the test doesn’t pass and the mistake isn’t immediately obvious, undo the refactoring and get back to known-good code.

Ally
Simple Design

Refactor as much as you like. Make the code you’re touching as clean as you know how, without worrying about making it perfect. Be sure to keep the design focused on the software’s current needs, not what might happen in the future.

While you refactor, don’t add any functionality. Refactoring isn’t supposed to change behavior. New behavior requires a failing test.

Step 5: Repeat

When you’re ready to add new behavior, start the cycle over again.

If things are going smoothly, with every hypothesis matching reality, you can “upshift” and take bigger steps. (But generally not more than five lines of code at a time.) If you’re running into problems, “downshift” and take smaller steps.

The key to TDD is small increments and fast feedback.

The key to success with TDD is small increments and fast feedback. Every minute or two, you should get a confirmation that you’re on the right track and your changes did what you expected them to do. Typically, you’ll run through several cycles very quickly, then spend more time thinking and refactoring for a few cycles, then speed up again.

Eat the Onion From the Inside Out

The hardest part about TDD is figuring out how to take small steps. Luckily, coding problems are like ogres, and onions: they have layers. The trick with TDD is to start with the sweet, juicy core, and then work your way out from there. You can use any strategy you like, but this is the approach I typically use:

  1. Core interface. Start by defining the core interface that you want to call, and write a test that calls that interface in the simplest possible way. Use this as an opportunity to see how the interface works in practice. Is it comfortable? Does it make sense? To make the test pass, you can just hard-code the answer.

  2. Calculations and branches. Your hard-coded answer isn’t enough. What calculations and logic are at the core of your new code? Start adding them, one branch and calculation at a time. Focus on the happy path: how the code will be used when everything’s working properly.

  3. Loops and generalization. Your code will often involve loops or alternative ways of being used. Once the core logic has been implemented, add support for those alternatives, one at a time. You’ll often need to refactor the logic you’ve built into a more generic form to keep the code clean.

  4. Special cases and error handling. After you’ve handled all the happy-path cases, think about everything that can go wrong. Do you call any code that could throw an exception? Do you make any assumptions that need to be validated? Write tests for each one.

  5. Runtime assertions. As you work, you might identify situations that can only arise as the result of a programming error, such as an array index that’s out of bounds, or a variable that should never be null. Add run-time assertions for these cases so they fail fast. (See “Fail Fast” on p.XX.) They don’t need to be tested, since they’re just an added safety net.

James Grenning uses the mnemonic “ZOMBIES” to express the same idea: Test Zero, then One, then Many. While you test, pay attention to Boundaries, Interfaces, and Exceptions, all while keeping the code Simple. [Grenning 2016]
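For instance, a run-time assertion for step 5 might look like this hypothetical guard (my sketch, not an example from the book). It fails fast on a programming error rather than letting a bad index produce nonsense downstream:

```javascript
// Hypothetical fail-fast guard: an out-of-bounds index is a programming
// error, so throw immediately rather than returning undefined.
function letterAt(word, index) {
  if (index < 0 || index >= word.length) {
    throw new Error(`letterAt(): index ${index} is out of bounds for "${word}"`);
  }
  return word[index];
}
```

As the list above says, guards like this don’t need their own tests; they’re a safety net behind the tested behavior.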

A TDD Example

TDD is best understood by watching somebody do it. I have several video series online demonstrating real-world TDD. At the time of this writing, my free “TDD Lunch & Learn” series is the most recent. It has 21 episodes covering everything from TDD basics all the way up to thorny problems such as networking and timeouts. [Shore 2020b]

I’ll describe the first of these examples here. It uses TDD to create a ROT-13 encoding function. (ROT-13 is a simple Caesar cipher where “abc” becomes “nop”, and vice versa.) It’s a very simple problem, but it’s a good example of how even small problems can be broken down into very small steps.

In this example, notice the techniques I use to work in small increments. They may even seem ridiculously small, but this makes finding mistakes easy, and that helps me go faster. As I’ve said, the more experience you have with TDD, the smaller the steps you’re able to take, and the faster that allows you to go.

Start with the core interface

Think. First, I needed to decide how to start. As usual, the core interface is a good starting point. What did I want it to look like?

This example was written in JavaScript—specifically, Node.js—so I had the choice between creating a class or just exporting a function from a module. There didn’t seem to be much value in making a full-blown class, so I decided to just make a rot13 module that exported a transform function.

Red bar. Now that I knew what I wanted to do, I was able to write a test that exercised that interface in the simplest possible way:

it("runs tests", function() {            ①
  assert.equal(rot13.transform(""), ""); ②
});

Line 1 defines the test, and line 2 asserts that the actual value, rot13.transform(""), matches the expected value, "". (Some assertion libraries put the expected value first, but this example uses Chai, which puts the actual value first.)

Before running the test, I made a hypothesis. Specifically, I predicted that the test would fail because rot13 didn’t exist, and that’s what happened.

Green bar. To make the test pass, I created the interface and hardcoded just enough to satisfy the test.

export function transform() {
  return "";
}

Hardcoding the return value is kind of a party trick, and I’ll often write a bit of real code during this first step, but in this case, there wasn’t anything else the code needed to do.

Good test names give you an overview of how the code is intended to work.

Refactor. Check for opportunities to refactor every time through the loop. In this case, I renamed the test from “runs tests,” which was leftover from my initial setup, to “does nothing when input is empty.” That’s obviously more helpful for future readers. Good tests document how the code is intended to work, and good test names allow the reader to get a high-level understanding by skimming through the names. Note how the name talks about what the production code does, not what the test does.

it("does nothing when input is empty", function() {
  assert.equal(rot13.transform(""), "");
});
Calculations and branches

Think. Now I needed to code the core logic of the ROT-13 transform. Eventually, I knew I wanted to loop through the string and convert one character at a time, but that was too big of a step. I needed to think of something smaller.

A smaller step is obviously to “convert one character,” but even that was too big. Remember, the smaller the steps, the faster you’re able to go. I needed to break it down smaller. Ultimately, I decided to just transform one lowercase letter forward thirteen letters. Uppercase letters and looping around after “z” would wait for later.

Red bar. With such a small step, the test was easy to write:

it("transforms lower-case letters", function() {
  assert.equal(rot13.transform("a"), "n");
});

My hypothesis was that the test would fail, expecting "n" but getting "", and that’s what happened.

Green bar. Making the test pass was just as easy:

export function transform(input) {
  if (input === "") return "";

  let charCode = input.charCodeAt(0);
  charCode += 13;
  return String.fromCharCode(charCode);
}

Even though this was a small step, it forced me to work out the critical question of converting letters to character codes and back, something that I had to look up. Taking a small step allowed me to solve this problem in isolation, which made it easier to tell when I got it right.

Refactor. I didn’t see any opportunities to refactor, so it was time to go around the loop again.

Repeat. I continued in this way, step by small step, until the core letter transformation algorithm was complete.

  1. Lower-case letter forward: a → n (as I just showed)

  2. Lower-case letter backward: n → a

  3. First character before a doesn’t rotate: ` → `

  4. First character after z doesn’t rotate: { → {

  5. Upper-case letters forward: A → N

  6. Upper-case letters backward: N → A

  7. More boundary cases: @ → @ and [ → [

After each step, I considered the code and refactored when appropriate. Here are the resulting tests. The numbers correspond to each step. Note how some steps resulted in new tests, and others just enhanced an existing test.

it("does nothing when input is empty", function() {
  assert.equal(rot13.transform(""), "");
});

it("transforms lower-case letters", function() {
  assert.equal(rot13.transform("a"), "n"); ①
  assert.equal(rot13.transform("n"), "a"); ②
});

it("transforms upper-case letters", function() {
  assert.equal(rot13.transform("A"), "N");  ⑤
  assert.equal(rot13.transform("N"), "A");  ⑥
});

it("doesn't transform symbols", function() {
  assert.equal(rot13.transform("`"), "`"); ③
  assert.equal(rot13.transform("{"), "{"); ④
  assert.equal(rot13.transform("@"), "@");  ⑦
  assert.equal(rot13.transform("["), "[");  ⑦
});

Here’s the production code. It’s harder to match each step to the code because there was so much refactoring (see the video for details), but you can see how TDD is an iterative process that gradually causes the code to grow:

export function transform(input) {
  if (input === "") return "";

  let charCode = input.charCodeAt(0);                                    ①
  if (isBetween(charCode, "a", "m") || isBetween(charCode, "A", "M")) {    ③④⑤
    charCode += 13;                                                      ①
  } else if (isBetween(charCode, "n", "z") || isBetween(charCode, "N", "Z")) {   ② ④ ⑥
    charCode -= 13;                                                       ②
  }
  return String.fromCharCode(charCode);                                  ①
}

function isBetween(charCode, firstLetter, lastLetter) {                      ④
  return charCode >= codeFor(firstLetter) && charCode <= codeFor(lastLetter);④
}                                                                            ④

function codeFor(letter) {                                                 ③
  return letter.charCodeAt(0);                                             ③
}                                                                          ③

Step 7 (tests for more boundary cases) didn’t result in new production code, but I included it just to make sure I hadn’t made any mistakes.

Loops and generalization

Think. So far, the code only handled strings with one letter. Now it was time to generalize it to support full strings.

Refactor. I realized that this would be easier if I factored out the core logic, so I jumped back to the “Refactoring” step to do so.

export function transform(input) {
  if (input === "") return "";

  let charCode = input.charCodeAt(0);
  return transformLetter(charCode);
}

function transformLetter(charCode) {
  if (isBetween(charCode, "a", "m") || isBetween(charCode, "A", "M")) {
    charCode += 13;
  } else if (isBetween(charCode, "n", "z") || isBetween(charCode, "N", "Z")) {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

function isBetween...
function codeFor...

Refactoring to make the next step easier is a technique I use all the time. Sometimes, during the “Red bar” step, I realize that I should have refactored first. When that happens, I comment out the test temporarily so I can refactor while my tests are passing. This makes it faster and easier for me to detect refactoring errors.

Red bar. Now I was ready to generalize the code. I updated one of my tests to prove a loop was needed:

it("transforms lower-case letters", function() {
  assert.equal(rot13.transform("abc"), "nop");
  assert.equal(rot13.transform("n"), "a");
});

I expected it to fail, expecting "nop" and getting "n", because it was only looking at the first letter, and that’s exactly what happened.

Green bar. I modified the production code to add the loop:

export function transform(input) {
  let result = "";
  for (let i = 0; i < input.length; i++) {
    let charCode = input.charCodeAt(i);
    result += transformLetter(charCode);
  }
  return result;
}

function transformLetter...
function isBetween...
function codeFor...
Ally
Zero Friction

Refactor. I decided to flesh out the tests so they’d work better as documentation for future readers of this code. This wasn’t strictly necessary, but I thought it would make the ROT-13 logic more obvious. I changed one assertion at a time, of course. The feedback was so fast and frictionless, executing automatically every time I saved, there was no reason not to.

In this case, everything worked as expected, but if something had failed, changing one assertion at a time would have made debugging it just a little bit easier. Those benefits add up.

it("does nothing when input is empty", function() {
  assert.equal(rot13.transform(""), "");
});

it("transforms lower-case letters", function() {
  assert.equal(
    rot13.transform("abcdefghijklmnopqrstuvwxyz"), "nopqrstuvwxyzabcdefghijklm" ①
  );
  assert.equal(rot13.transform("n"), "a");                                      ②
});

it("transforms upper-case letters", function() {
  assert.equal(
    rot13.transform("ABCDEFGHIJKLMNOPQRSTUVWXYZ"), "NOPQRSTUVWXYZABCDEFGHIJKLM" ③
  );
  assert.equal(rot13.transform("N"), "A");                                      ④
});

it("doesn't transform symbols", function() {
  assert.equal(rot13.transform("`{@["), "`{@[");                                ⑤
  assert.equal(rot13.transform("{"), "{");                                      ⑥
  assert.equal(rot13.transform("@"), "@");                                      ⑥
  assert.equal(rot13.transform("["), "[");                                      ⑥
});
Special cases, error handling, and runtime assertions

Finally, I wanted to look at everything that could go wrong. I started with the runtime assertions. How could the code be used incorrectly? Usually, I don’t test my runtime assertions, but for the purpose of demonstration, I did so this time:

it("fails fast when no parameter provided", function() {         ①
  assert.throws(                                                 ①
    () => rot13.transform(),                                     ①
    "Expected string parameter"                                  ①
  );                                                             ①
});                                                              ①

it("fails fast when wrong parameter type provided", function() { ②
  assert.throws(                                                 ②
    () => rot13.transform(123),                                  ②
    "Expected string parameter"                                  ②
  );                                                             ②
});                                                              ②

Of course, I followed the TDD loop and added the tests one at a time. Implementing them meant adding a guard clause, which I also implemented incrementally:

export function transform(input) {
  if (input === undefined ①  || typeof input !== "string" ②  ) {
    throw new Error("Expected string parameter");                 ①
  }                                                               ①
  ...

Good tests also act as documentation, so my last step is always to review the tests and think about how well they communicate to future readers. Typically, I’ll start with the general, “happy path” case, then go into specifics and special cases. Sometimes I’ll add a few tests just to clarify behavior, even if I don’t have to change the production code. That was the case with this code. These are the tests I ended up with:

it("does nothing when input is empty", ...);
it("transforms lower-case letters", ...);
it("transforms upper-case letters", ...);
it("doesn’t transform symbols", ...);
it("doesn’t transform numbers", ...);
it("doesn’t transform non-English letters", ...);
it("doesn’t break when given emojis", ...);
it("fails fast when no parameter provided", ...);
it("fails fast when wrong parameter type provided", ...);

And the final production code:

export function transform(input) {
  if (input === undefined || typeof input !== "string") {
    throw new Error("Expected string parameter");
  }

  let result = "";
  for (let i = 0; i < input.length; i++) {
    let charCode = input.charCodeAt(i);
    result += transformLetter(charCode);
  }
  return result;
}

function transformLetter(charCode) {
  if (isBetween(charCode, "a", "m") || isBetween(charCode, "A", "M")) {
    charCode += 13;
  } else if (isBetween(charCode, "n", "z") || isBetween(charCode, "N", "Z")) {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

function isBetween(charCode, firstLetter, lastLetter) {
  return charCode >= codeFor(firstLetter) && charCode <= codeFor(lastLetter);
}

function codeFor(letter) {
  return letter.charCodeAt(0);
}

At this point, the code did everything it needed to. Readers familiar with JavaScript, however, will notice that the code can be further refactored and improved. I continue the example in “Refactoring in Action” on p.XX.

Fast and Reliable Tests

Teams that embrace TDD accumulate thousands of tests. The more tests you have, the more important speed and reliability become.

Your tests must be fast, and they must produce the same answer every time.

In TDD, you run the tests as often as one or two times every minute. They must be fast, and they must produce the same answer every time. If they don’t, you won’t be able to get feedback within 1-5 seconds, and that’s crucial for the TDD loop to work effectively. You’ll stop running the tests as frequently, which means you won’t catch errors as quickly, which will slow you down.

Ally
Zero Friction

You can work around the problem by programming your watch script to run only a subset of tests, but eventually, slow tests will start causing problems during integration, too. Instead of your deployment providing feedback within five minutes, it will take tens of minutes, or even hours. To add insult to injury, the tests will often fail randomly, requiring you to start the long process all over again, adding friction and causing people to ignore genuine failures.

It is possible to write fast and reliable tests. It takes practice and good design, but once you know how, writing fast, reliable tests is faster and easier than writing slow, flaky tests. Here’s how:

Rely on narrow unit tests

Broad tests are written to cover large parts of the software: for example, they might launch a web browser, navigate to a URL, click buttons and enter data, then check that the browser shows the expected result. They’re sometimes called “end-to-end tests,” although technically, end-to-end tests are just one type of broad test.

Although broad tests seem like a good way to get test coverage, they’re a trap. Broad tests are slow and unreliable. You need your build to run hundreds or thousands of tests per second, with perfect reliability, and the way to get that is with narrow tests.

A narrow test is focused on a small amount of code: usually a method or function, or several, in a particular class or module. Sometimes, a narrow test will focus on a small cross-cutting behavior that involves several modules.

The best types of narrow tests are called unit tests in the Agile community, although there’s some disagreement over the exact definition, as “Other Unit Test Definitions” on p.XX discusses. The important part is that unit tests are fast and deterministic. This usually requires that the test run entirely in memory.

The vast majority of your tests should be unit tests. They’re fast and reliable. The size of your unit test code should be proportional to the size of your production code. The ratios vary, but it will often be close to 1:1.

Creating unit tests requires good design. If you have trouble writing them, it could be a sign of problems in your design. Look for ways to decouple your code so that each class or module can be tested independently.

Test outside interactions with narrow integration tests

Unit tests usually test code that’s in memory, but your software doesn’t operate entirely in memory. It also has to talk to the outside world. To test code that does so, use narrow integration tests, also known as focused integration tests.

Conceptually, narrow integration tests are just like unit tests. You create them in the same way, using TDD. In practice, because they involve the outside world, narrow integration tests tend to involve a lot of complicated setup and teardown. They’re also much slower than unit tests. Where unit tests can run at a rate of hundreds or thousands per second, narrow integration tests typically run at a rate of dozens per second.

Design your code to minimize the number of narrow integration tests you need. For example, if your code depends on a third-party service, don’t call the service directly from the code that needs it. Instead, create an infrastructure wrapper: a class or module that encapsulates the service and its network calls. Use the infrastructure wrapper in the rest of your code. “Third-Party Components” on p.XX has more about wrappers and the “Application Infrastructure” episode of [Shore 2020b] has an example.

You should end up with a relatively small number of narrow integration tests, proportional to the number of external systems your code interacts with.

Simulate non-local dependencies

Some dependencies are too difficult or expensive to run locally on your development machine. You still need to be able to run your tests locally, though, for both reproducibility and speed.

To solve this problem, start by creating an infrastructure wrapper for the dependency, as normal. Then write your narrow integration test to simulate the dependency rather than having the infrastructure wrapper call it for real. For example, if your code uses a billing service with a REST API, you would write a small HTTP server to stand in for the billing service in your tests. See the “Spy Server” pattern in [Shore 2018] for details, and the “Microservice Clients Without Mocks” episodes of [Shore 2020b] for an example.
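To make that concrete, here’s a minimal sketch of the idea. Everything in it is invented for illustration (the billing-service example above doesn’t come with code), and it simulates the service in memory rather than with the HTTP-level stub server the Spy Server pattern describes:

```javascript
// Hypothetical sketch of an infrastructure wrapper for a billing service.
// Every name here (BillingClient, chargeCustomer, and so on) is invented;
// see [Shore 2018] and [Shore 2020b] for the real patterns.
class BillingClient {
  // Production factory: wraps a real HTTP client (not shown here).
  static create(httpClient) {
    return new BillingClient(httpClient);
  }

  // Test factory: substitutes an in-memory simulation of the service.
  // (The Spy Server pattern stubs at the HTTP level instead; this
  // in-memory stand-in is a simplification.)
  static createSimulated({ responseStatus = 201 } = {}) {
    return new BillingClient({
      async post(url, body) {
        return { status: responseStatus };
      },
    });
  }

  constructor(httpClient) {
    this._http = httpClient;
  }

  // The rest of the codebase calls this method and never sees HTTP details.
  async chargeCustomer(customerId, amountInCents) {
    const response = await this._http.post("/charges", { customerId, amountInCents });
    return response.status === 201;
  }
}
```

A narrow integration test can then exercise `chargeCustomer()` against the simulation, including failure responses, without any network access.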

Ally
Build for Operation

This raises the question: if you don’t test your software against its real dependencies, how do you know that it works? Because external systems can change or fail at any time, the real answer is “monitoring.” (See “Paranoic Telemetry” on p.XX.) But some teams also use contract tests [Fowler 2011] to detect changes to providers’ services. These work best when the provider commits to running the tests themselves.

Control global state

Any tests that deal with global state need careful thought. That includes global variables, such as static (class) variables and singletons; external data stores and systems, such as file systems, databases, and services; and machine-specific state and functions, such as the system clock, locale, time zone, and random number generator.

Tests are often written to assume that global state will be set in a certain way. Most of the time, it will be. But once in a while, it isn’t, often due to a race condition, and the test fails for no apparent reason. When you run it again, the test passes. The result is a flaky test: a test that works most of the time, but occasionally fails randomly.

Flaky tests are insidious. Because re-running the test “fixes” the problem, people learn that the right way to deal with flaky tests is to just run them again. Once you’ve accumulated hundreds of flaky tests, your test suite requires multiple runs before it succeeds. By that time, fixing the problem takes a lot of work.

When you encounter a flaky test, fix it the same day.

When you encounter a flaky test, fix it the same day. Flaky tests are the result of poor design. The sooner you fix them, the fewer problems you’ll have in the future.

The design flaw at the root of flaky tests is allowing global state to pollute your code. Some global state, such as static variables and singletons, can be removed through careful design. Other sorts of global state, such as the system clock and external data, can’t be avoided, but they can be carefully controlled. Use infrastructure wrappers to abstract them away from the rest of your codebase, and test-drive the wrappers with narrow integration tests.

For example, if your code needs to interact with the system clock—for example, to time out a request, or to get the current date—create a wrapper for the system clock and use that, rather than the actual system clock, in the rest of your code. The “No More Flaky Clock Tests” episode of [Shore 2020b] has an example.
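A minimal sketch of what such a wrapper might look like (the names here are invented; the episode shows a more complete treatment). The production factory delegates to the real clock, while the test factory returns a clock frozen at a configurable time:

```javascript
// Hypothetical sketch of a system-clock wrapper. All names are invented
// for illustration.
class Clock {
  // Production factory: reads the real system clock.
  static create() {
    return new Clock({ now: () => Date.now() });
  }

  // Test factory: returns a clock frozen at a predetermined time,
  // so tests that depend on it are deterministic.
  static createNull({ now = 0 } = {}) {
    return new Clock({ now: () => now });
  }

  constructor(source) {
    this._source = source;
  }

  now() {
    return this._source.now();
  }

  millisecondsSince(startTime) {
    return this.now() - startTime;
  }
}
```

Code that needs timeouts or timestamps takes a `Clock` instance instead of calling `Date.now()` directly; its tests pass in `Clock.createNull(...)` and never flake.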

Write sociable tests

Tests can be solitary or sociable.1 A solitary test is programmed so that all dependencies of the code under test are replaced with special test code called a “test double,” also known as a “mock.” (Technically, a “mock” is a specific type of test double, but the terms are often used interchangeably.)

1The terms “sociable” and “solitary” come from Jay Fields. [Fields 2015]

Solitary tests allow you to test that your code under test calls its dependencies, but they don’t allow you to test that the dependencies work the way your code expects them to. The test doesn’t actually run the dependencies; it runs the test double instead. So if you ever make a change to a dependency that breaks the expectations of any code that uses it, your tests will continue to pass, and you’ll have accidentally introduced a bug.

To prevent this problem, people who write solitary tests also write broad tests to make sure that everything works together correctly. This is duplicated effort, and those broad tests are often slow and flaky.

A better approach, in my opinion—although the community is divided on this point—is to use sociable tests rather than solitary tests. A sociable test runs the code under test without replacing its dependencies. The code uses its actual dependencies when it runs, which means that the tests fail if the dependencies don’t work the way the code under test expects. Figure “Solitary and Sociable Tests” illustrates the difference.

A figure in two parts. Part A is labelled “Solitary tests.” It shows a series of relationships: “A” relies on “B,” which relies on “C.” Each of A, B, and C have a test, and each has a mock that the test uses. Circles show that A, B, and C are each tested, but X’s show that the relationship between A and B, and between B and C, is not tested. Part B of the figure is labelled “Sociable tests.” It shows the same tests and relationships as part A, but it doesn’t have any mocks. The figure uses circles to show that the test of A also tests A’s relationship with B, and the test of B also tests B’s relationship with C. As a result, there are no gaps that aren’t tested.

Figure 2. Solitary and sociable tests

The best unit tests—again, in my opinion—are narrow, sociable tests. They’re narrow in that the test is only testing the class or module under test. They’re sociable in that the code under test still calls its real dependencies. The result is fast tests that provide full confidence that your code works as expected, without requiring the overhead and waste of additional broad tests.

This does raise the question: how do you prevent sociable tests from talking to the outside world? A big part of the answer is to design your code to separate infrastructure and logic, as I’ll explain in a moment. The other part is to program your infrastructure wrappers to be able to isolate themselves from the outside world. My “Testing Without Mocks” article [Shore 2018] catalogs design patterns for doing so, and [Shore 2020b] has extensive examples.

Separate infrastructure and logic

Code that is pure logic, with no dependencies on anything involving the outside world, is by far the easiest code to test. So, to make your tests faster and more reliable, separate your logic from your infrastructure. As it turns out, this is a good way to keep your design clean, too.

There are a variety of ways to keep infrastructure and logic separate. Alistair Cockburn’s “Hexagonal Architecture” [Cockburn 2008], Gary Bernhardt’s “Functional Core, Imperative Shell” [Bernhardt 2012], and my “A-Frame Architecture” [Shore 2018] are all similar ways of tackling the problem. Generally speaking, they involve modifying your code so that logic code is “pure” and doesn’t depend on infrastructure code.

In the case of A-Frame Architecture, this involves a top-level “application” layer that coordinates “logic” and “infrastructure” layers that have no awareness of each other. This is a simplified example of code you might find in the application layer:

let input = infrastructure.readData();
let output = logic.processInput(input);
infrastructure.writeData(output);
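The payoff is that the logic layer needs no infrastructure to test. Here’s a sketch of what a `logic.processInput()` might look like (the behavior is invented for illustration):

```javascript
// Hypothetical logic-layer module: pure functions only, no infrastructure.
// The behavior of processInput() is invented for this sketch.
const logic = {
  processInput(lineItems) {
    // Summarize a list of line items into a count and a total.
    const total = lineItems.reduce((sum, item) => sum + item.amount, 0);
    return { count: lineItems.length, total };
  },
};
```

Because `processInput()` never touches files, networks, or clocks, its tests run entirely in memory and are trivially deterministic.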

For a full example, see [Shore 2020b]. It uses A-Frame Architecture starting with episode 2 (“Application Infrastructure”), although it’s mostly in the background.

Use broad tests only as a safety net
If you use TDD correctly, broad tests shouldn’t be needed.

If you use TDD, unit tests, narrow integration tests, and sociable tests correctly, your code should be thoroughly covered. Broad tests shouldn’t be needed.

For safety, though, it’s okay to augment your test suite with additional broad tests. I typically write a small number of smoke tests. Smoke tests are broad tests that confirm that your software doesn’t go up in flames when you run it. They’re not comprehensive—they only test your most common scenarios. Your narrow tests are for comprehensive testing.

Broad tests tend to be very slow, often requiring seconds per test, and are difficult to make reliable. You should only need a handful of them.

Ally
Root-Cause Analysis
No Bugs

If you didn’t build your software with TDD from the beginning, or if you’re not confident in your ability to use TDD correctly, it’s okay to have more broad tests for safety. But do treat them only as a safety net. If they ever catch an error that your narrow tests don’t, that’s a sign of a problem with your testing strategy. Figure out what went wrong, fix the missing test, and change your testing approach to prevent further gaps.

Eventually, you’ll have confidence in your test suite and can reduce the number of broad tests to a minimum.

Adding Tests to Existing Code

Sometimes you have to add tests to existing code. Either the code won’t have any tests at all, or it will have broad, flaky tests that need to be replaced.

There’s a chicken-and-egg problem with adding tests to code. Good tests—that is, narrow tests—need to poke into your code to set up dependencies and validate state. Unless your code was written with testability in mind—and non-TDD’d code almost never is—you won’t be able to write good tests.

So you need to refactor. The problem is, in a complex codebase, refactoring is dangerous. Side effects lurk behind every function. Twists of logic wait to trip you up. In short, if you refactor, you’re likely to break something without realizing it.

So you need tests. But to test, you need to refactor. But to refactor, you need tests. Etc., etc., argh.

To break the chicken-and-egg dilemma, you need confidence that your refactorings are safe: that they cannot change the behavior of the code. Luckily, modern IDEs have automated refactorings, and, depending on your language and IDE, they might be guaranteed to be safe. According to Arlo Belshee, the core six safe refactorings you need are Rename, Inline, Extract Method/Function, Introduce Local Variable, Introduce Parameter, and Introduce Field. His article, “The Core 6 Refactorings” [Belshee 2016b], is well worth reading.

If you don’t have guaranteed-safe refactorings, you can also use characterization tests, also known as pinning tests or approval tests. Characterization tests are temporary, broad tests that are designed to exhaustively test every behavior of the code you’re changing. Llewellyn Falco’s “Approvals” testing framework, available on GitHub at https://github.com/approvals, is a powerful tool for creating these tests. Emily Bache’s video demonstration of the “Gilded Rose” kata [Bache 2018] is an excellent example of how to use approval tests to refactor unfamiliar code.
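A characterization test can be as simple as this sketch (the “legacy” function is invented): run the code over a broad sweep of inputs, record the combined output once, and pin it with a single assertion. Any refactoring that changes behavior breaks the pin.

```javascript
// Invented "legacy" function whose logic we don't yet understand.
function legacyPriceCode(quantity) {
  if (quantity <= 0) return "N/A";
  if (quantity < 10) return "S" + quantity;
  return "B" + Math.floor(quantity / 10);
}

// Characterization test helper: exhaustively sample the behavior.
function characterize() {
  const inputs = [-1, 0, 1, 5, 9, 10, 25, 100];
  return inputs.map((q) => `${q}=>${legacyPriceCode(q)}`).join(" ");
}

// Recorded once from the unmodified code; this is the "approved" output
// that every refactoring step must preserve.
const approved = "-1=>N/A 0=>N/A 1=>S1 5=>S5 9=>S9 10=>B1 25=>B2 100=>B10";
```

Tools like Approvals automate the record-and-compare step; the principle is the same.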

Once you have the ability to refactor safely, you can change the code to make it cleaner. Work in very small steps, focusing on Belshee’s core six refactorings, and running your tests after each step. Simplify and refine the code until one part of it is testable, then add narrow tests to that part. You may need to write solitary tests rather than sociable tests, to begin with.

Continue refining, improving, and testing until all the code you’re working on is covered by high-quality narrow tests. Once it is, you can delete the characterization tests and any other broad tests of that code.

Questions

Isn’t TDD wasteful?

I go faster with TDD than without it. With enough practice, I think you will too.

TDD is faster because programming doesn’t just involve typing at the keyboard. It also involves debugging, manually running the code, checking that a change worked, and so forth. Michael “GeePaw” Hill calls this activity GAK, for “Geek At Keyboard.” With TDD, you spend much less time GAKking around and more time doing fun programming work. You also spend less time studying code, because the tests act as documentation and inform you when you make mistakes. Even though tests take time to write, the net result is that you have more time for development, not less. GeePaw Hill’s video, “TDD & The Lump of Coding Fallacy” [Hill 2018], is an excellent and entertaining explanation of this phenomenon.

TDD does take time to learn and apply, especially to new UI technologies and existing code. It’s worth it, but it can take a few months before it’s a net positive.

What do I need to test when using TDD?

The saying is, “Test everything that can possibly break.” To determine if something could possibly break, I think, “Do I have confidence that I’m doing this correctly, and that nobody in the future will inadvertently break this code?”

I’ve learned through painful experience that I can break nearly everything, so I test nearly everything. The only exception is code without any logic, such as simple getters and setters, or a function that only calls another function.

You don’t need to test third-party code unless you have some reason to distrust it. But it is a good idea to wrap third-party code in code that you control, and test that the wrapper works the way you want it to. “Third-Party Components” on p.XX has more about wrapping third-party code.

How do I test private methods?

Start by testing public methods. As you refactor, some of that code will move into private methods, but it will still be covered by existing tests.

If your code is so complex that you need to test a private method directly, this is a good indication that you should refactor. You can move the private function into a separate module or method object, where it will be public, and test it directly.
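For example (with invented code): suppose a complicated scoring calculation is a private helper inside a report module. Extracting it into its own module makes it public there, where it can be test-driven directly:

```javascript
// All names here are invented for illustration. The two "modules" are
// shown inline for brevity; in a real codebase they'd be separate files.

// scoring.js -- the extracted helper, now public and directly testable.
const scoring = {
  weightedScore(values, weights) {
    let total = 0;
    for (let i = 0; i < values.length; i++) {
      total += values[i] * (weights[i] ?? 1); // missing weights default to 1
    }
    return total;
  },
};

// report.js -- the original module now delegates to scoring, and its own
// tests only need to cover the delegation.
const report = {
  score(values) {
    return scoring.weightedScore(values, [2, 1, 1]);
  },
};
```

The complex logic gets thorough narrow tests of its own, while the original module’s public interface stays unchanged.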

How can I use TDD when developing a user interface?

TDD is particularly difficult with user interfaces because most UI frameworks weren’t designed with testability in mind. Many people compromise by writing a very thin, untested translation layer that only forwards UI calls to a presentation layer. They keep all their UI logic in the presentation layer and use TDD on that layer as normal.

There are tools that allow you to test a UI directly, by making HTTP calls (for web-based software) or by pressing buttons and simulating window events (for client-side software). Although they’re usually used for broad tests, my preference is to use them to write narrow integration tests of my UI translation layer.

Should we refactor our test code?

Absolutely. Tests have to be maintained, too. I’ve seen otherwise-fine codebases go off the rails because of brittle and fragile test suites.

That said, tests are a form of documentation and should generally read like a step-by-step recipe. Loops and logic should be moved into helper functions that make the underlying intent of the test easier to understand. Between tests, though, it’s okay to have some duplication if it makes the intent of each test more clear. Unlike production code, tests are read much more often than they’re modified.

Arlo Belshee uses the acronym “WET,” for “Write Explicit Tests,” as a guiding principle for test design. It’s in contrast with the DRY (Don’t Repeat Yourself) principle used for production code. His article on test design, “WET: When DRY Doesn’t Apply,” is superb. [Belshee 2016a]

How much code coverage should we have?

Measuring code coverage is often a mistake. Rather than focusing on code coverage, focus on taking small steps and using your tests to drive your code. If you do this, everything you want to test should be tested.

Although code coverage can be useful to identify gaps in tests, and it’s particularly useful for approval testing, as [Bache 2018] demonstrates, the coverage percentage isn’t important. Too often, code coverage distracts people from the core issues they’re facing. If folks are having trouble with the TDD techniques, code quality, test adoption, or something else, the solution is to address those issues head on, not to measure coverage. “Code Coverage” on p.XX discusses how to identify the underlying issue and what to do instead.

Prerequisites

Although TDD is a very valuable tool, it does have a two- or three-month learning curve. It’s easy to apply to toy problems such as the ROT-13 example, but translating that experience to larger systems takes time. Legacy code, proper test isolation, and narrow integration tests are particularly difficult to master. On the other hand, the sooner you start using TDD, the sooner you’ll figure it out, so don’t let these challenges stop you.

Because TDD has a learning curve, be careful about adopting it without permission. Your organization could see the initial slowdown and reject TDD without proper consideration. Similarly, be cautious about being the only one to use TDD on your team. It’s best if everyone agrees to use it together, otherwise you’re likely to end up with other members of the team inadvertently breaking your tests and creating test-unfriendly code.

Once you do adopt TDD, don’t continue to ask permission to write tests. They’re a normal part of development. When sizing stories, include the time required for testing in your size considerations.

Ally
Zero Friction

Fast feedback is crucial for TDD to be successful. Make sure you can get feedback within 1-5 seconds, at least for the subset of tests you’re currently working on.

Finally, don’t let your tests become a straitjacket. If you can’t refactor your code without breaking a lot of tests, something is wrong. Often, it’s a result of overzealous use of test doubles. Ask a mentor for help.

Indicators

When you use TDD well:

  • You spend little time debugging.

  • You continue to make programming mistakes, but you find them in a matter of minutes and can fix them easily.

  • You have total confidence that the whole codebase does what programmers intended it to do.

  • You aggressively refactor at every opportunity, confident in the knowledge that the tests will catch any mistakes.

Alternatives and Experiments

TDD is at the heart of the Delivering practices. Without it, Delivering fluency will be difficult or even impossible to achieve.

A common misinterpretation of TDD, as “Test-Driven Debaclement” on p.XX illustrates, is to design your code first, write all the tests, and then write the production code. This approach is frustrating and slow, and it doesn’t allow you to learn as you go.

Another approach is to write tests after writing the production code. This is very difficult to do well: the code has to be designed for testability, and it’s hard to do so unless you write the tests first. It’s also tedious, with a constant temptation to wrap up and move on. In practice, I’ve yet to see after-the-fact tests come close to the detail and quality of tests created with TDD.

Even if these approaches do work for you, TDD isn’t just about testing. It’s really about using very small, continuously-validated hypotheses to confirm that you’re on the right track and produce high-quality code. With the exception of Kent Beck’s TCR, which I’ll discuss in a moment, I’m not aware of any alternatives to TDD that allow you to do so while also providing the documentation and safety of a good test suite.

Under the TDD banner, though, there are many, many experiments that you can conduct. TDD is one of those “moments to learn, lifetime to master” skills. One of the biggest opportunities for experimentation is the choice between the “classicist” and “mockist” approaches. In this book, I’ve shown how to apply the classicist approach. The mockist approach, spearheaded by Steve Freeman and Nat Pryce, is also worth investigating. Their book, Growing Object-Oriented Software, Guided by Tests, is well worth reading. [Freeman and Pryce 2010]

More recently, Kent Beck has been experimenting with an idea he calls TCR: test && commit || revert. [Beck 2018] It refers to a small script that automatically commits your code if the tests pass, and reverts it if the tests fail. Although TCR sacrifices the “red bar” step of TDD, which a lot of people like, it forces you to take very small steps. This gives you the same series of validated hypotheses that TDD does, and arguably makes them even smaller and more frequent. That’s one of the hardest and most important things to learn about TDD. TCR is worth trying as an exercise, if nothing else.

Further Reading

This book only scratches the surface of TDD. For more detail about the approach I recommend here, see my “Testing Without Mocks” article [Shore 2018] and accompanying “TDD Lunch and Learn” video series [Shore 2020b].

Test-Driven Development: By Example [Beck 2002] is an excellent introduction to TDD by the person who invented it. If you liked the ROT-13 example, you’ll like the extended examples in this book. The TDD patterns in Part III are particularly good.

Working Effectively with Legacy Code [Feathers 2004] is a must-have for anybody working with legacy code.

XXX Consider Astels, Rainsberger, 1ed p302

XXX Consider Jay Fields Working Effectively With Unit Tests https://leanpub.com/wewut

XXX Avi Kessner recommends: Art of Unit Testing, Third Edition (Roy Osherove)

XXX Bill Wake recommends: Gerard Meszaros xUnit Test Patterns

XXX Reuven Yagel recommends: "Test-Driven Development: Extensive Tutorial" by Grzegorz Gałęzowski. https://leanpub.com/tdd-ebook https://github.com/grzesiek-galezowski/tdd-ebook

XXX Bas Vodde recommends:

  • Freeman/Pryce - GOOSE

  • Koskela - Test-Driven

  • Madeyski - Test-Driven Development: An Empirical Evaluation of Agile Practice (though, I wouldn’t recommend this for people learning TDD)


AoAD2 Practice: Zero Friction


Zero Friction

Audience
Programmers, Operations

When we’re ready to code, nothing gets in our way.

Imagine you’ve just started working with a new team. One of your new teammates, Pedro, walks you over to a development workstation.

“Since you’re new, we’ll start by deploying a small change,” he says, sitting down next to you. “This machine is brand-new, so we’ll have to set it up from scratch. First, clone the repo.” He tells you the command. “Now, run the build script.”

Commands start scrolling up the screen. Pedro explains. “We use a tool for reproducible builds. It uses a configuration file in the repo to make sure we all have the same tooling installed. Right now, it’s detected that you don’t have anything installed, so it’s installing the IDE, development tools, and images needed to develop and run the system locally.”

“This will take a while,” he continues. “After the first run, though, it’s instantaneous. It only updates again when we commit changes to the config. Come on, I’ll show you around the office.”

When you come back, the build is done. “Okay, let me show you the app,” Pedro says. “Type rundev to start it up.” Once again, information starts scrolling by. “This is all running locally,” Pedro explains proudly. “We used to have a shared test environment, and we were constantly stepping on each other’s toes. Now that’s all in the past. It even knows which services to restart depending on which files you change.”

Pedro walks you through the application. “Now let’s make a change,” he says. “Open up the IDE and run the watch script with the quick command. It will run the build when files change. The quick command tells it to only build and test the files that have changed.”

You follow his instructions and the script starts up, then immediately reports BUILD OK in green. “Nothing’s changed since we last ran the build,” Pedro explains, “so the script didn’t do anything. Now, let’s make a small change.” He directs you to a test file and has you add a test. When you save the changes, the watch script runs again and reports a test failure. It takes less than a second.

“We’ve put a lot of work into our build and test speed,” Pedro tells you. He’s clearly proud of it. “It wasn’t easy, but it’s totally worth it. We get feedback on most changes in a second or two. It’s done wonders for our ability to iterate and be productive. I’m not lying when I say this is the best development environment I’ve ever been in.”

“Now let’s finish up this change and deploy.” He shows you the production change needed to get the new test to pass. Once again, when you save, the watch script runs the tests in about a second. This time, it reports success.

“Okay, we’re ready to deploy,” he says. “This is going into production, but don’t worry. The deploy script will run the full test suite, and we also have a canary server that checks to see if anything goes wrong. Type deploy to kick things off.”

You run the script and watch it go through its paces. A few minutes later, it says INTEGRATION OK, then starts deploying the code. “That’s it!” Pedro beams. “Once the integration succeeds, you can assume the deploy will too. If something goes wrong, we’ll get paged. Welcome to the team!”

It’s been less than an hour, and you’ve already deployed to production. This is zero-friction development: when you’re ready to code, nothing gets in your way.

One-Second Feedback

When you make a change, you need to get feedback in less than five seconds.

Development speed is the most important area for eliminating friction. When you make a change, you need to get feedback about that change in less than five seconds. Less than one second is best. Ten seconds at the very most.

This type of fast feedback is a game changer. You’re able to experiment and iterate so easily. Rather than making big changes, you can work in very small steps. Each change can be a line or two of code, which means that you always know where your mistakes are. Debugging becomes a thing of the past.

If feedback takes less than a second, it’s functionally instantaneous. You’ll make a change, see the feedback, and keep working. If it takes between one and five seconds, it won’t feel instantaneous, but it’s still acceptable. If it takes between five and ten seconds, it will feel slow. You’ll start being tempted to batch up changes. And if it’s more than ten seconds, you won’t be able to take small steps, and that will slow you down.

Ally
Test-Driven Development

To achieve one-second feedback, set up a watch script that automatically checks your code when you make a change. Inside the script, use a compiler or linter to tell you when you make syntax errors, and tests to tell you when you make semantic errors.
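A bare-bones watch script can simply poll the file system and rerun the build whenever something changes. The sketch below is illustrative, not the book’s implementation: the `*.py` glob and the build command are placeholders for whatever your project actually uses.

```python
import subprocess
import time
from pathlib import Path

def snapshot(root):
    """Map each source file to its last-modified time."""
    return {str(p): p.stat().st_mtime for p in Path(root).rglob("*.py")}

def watch(root, build_command, poll_seconds=0.5):
    """Rerun the build whenever a file is added, removed, or changed."""
    last = snapshot(root)
    while True:
        time.sleep(poll_seconds)
        current = snapshot(root)
        if current != last:
            last = current
            result = subprocess.run(build_command)
            print("OK" if result.returncode == 0 else "FAILED")
```

A polling loop like this is easy to understand and debug; a production-quality watch script would more likely use a file-notification library for your platform, as discussed later in this practice.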

Alternatively, you can configure your IDE to check syntax and run tests, rather than writing a script. This can be an easy way to get started, although you’ll have to migrate to a script eventually. If you do start with an IDE-based approach, make sure its configuration can be committed to your repository and used by everyone on the team. You need the ability to share improvements easily.

When you save your changes, the script (or IDE) should give you immediate, unambiguous feedback. If everything worked, it should say OK. If anything failed, it should say FAILED, and provide information to help you troubleshoot the error. Most people make their tools display a green bar for success and a red bar for failure. I also program mine to play a sound—one for compile/lint failure, another for test failure, and a third for success—but that’s just me.

As your codebase gets larger, one-second feedback will become harder to achieve. The first culprit is usually test speed. Instead of writing broad tests that check the whole system, write narrow tests that focus on the behavior of a small amount of code. Stub out slow and brittle parts of the system, such as file system, network, and database access. “Fast and Reliable Tests” on p.XX describes how.

As your system continues to grow, build speeds (compiling or linting) will become a problem. The solution will depend on your language. A web search for “speed up <language> build” will get you started. Typically, it will involve incremental builds: caching parts of the build so that only code that has changed gets rebuilt. The larger your system gets, the more creative you’ll have to be.
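The core idea behind incremental builds can be illustrated with a simple content-hash cache. This sketch (my illustration, not a specific tool’s behavior) only marks a file for rebuilding when its contents have actually changed:

```python
import hashlib
from pathlib import Path

def needs_rebuild(path, cache):
    """Return True when a file's contents changed since the last build.

    `cache` maps file paths to content hashes from the previous run;
    persist it between builds to get incremental behavior.
    """
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    if cache.get(str(path)) == digest:
        return False  # unchanged: reuse the previous build output
    cache[str(path)] = digest
    return True
```

Real incremental build tools go further: they track dependencies between files, so a change to a shared module rebuilds everything that depends on it, not just the file itself.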

Ally
Continuous Integration

Eventually, you’ll probably need to set up two builds: one for fast feedback, and one for production deployment. Although it’s preferable for your local build to be the same as your production build, fast feedback is more important. Your deploy script can run your tests against the production build. As long as you have a good test suite and practice continuous integration, you’ll learn about discrepancies between the two builds before they’ve had a chance to get out of control.

Although good tests run at a rate of hundreds or thousands per second, you’ll eventually have too many tests to run them all in less than a second. When you do, you’ll need to revise your script to only run a subset of the tests. The easiest way is to group your tests into clusters, and run specific clusters based on the files that have changed.

Eventually, you may want to do a more sophisticated dependency analysis that detects exactly which tests to run for any given change. Some test runners can do this for you. It’s also not as hard to implement as you might think. The trick is to focus on what your team needs rather than making a generic solution that handles all possible edge cases.

Know Your Editor

Don’t let your code editor get in the way of your thoughts. This is particularly important when pairing or mobbing; when you’re navigating, there are few things more frustrating than watching a driver struggle with the editor.

Take the time to get to know your editor really, really well. If the editor provides automated refactorings, learn how to use them. (If it doesn’t, look for a better editor.) Take advantage of auto-formatting, and commit the formatting configuration file to your repository so your whole team is in sync. Learn how to use code completion, automatic fixes, function and method lookup, and reference navigation. Learn the keyboard shortcuts.

For an example of how much of a difference editor proficiency can make, see Emily Bache’s virtuoso performance in her Gilded Rose videos, particularly part 2. [Bache 2018]

Reproducible Builds

It worked on my machine!

Overheard

What happens when you check out an arbitrary commit from your repository? Say, from a year ago. (Go on, try it!) Does it still run? Do the tests still pass? Or does it require some esoteric combination of tooling and external services that have long since passed into oblivion?

You should be able to check out any commit and expect it to work the same for every developer.

A reproducible build is a build that continues to work and pass its tests no matter which development machine you use to build it, and no matter how old the code you’re building is. You should be able to check out any commit and expect it to work the same way for every developer. Generally speaking, this requires two things:

1. Dependency Management

Dependencies are the libraries and tools your code requires to run. This includes your compiler or interpreter, run-time environment, packages downloaded from your language’s package repository, code created by other teams in your organization, and so forth. For your build to be reproducible, everybody needs to have the exact same dependencies.

Program your build to ensure that you have the correct version of every dependency. It can either exit with an error when the wrong version is installed, or (preferably) automatically install the correct version. Tools to do so include Nix, Bazel, and Docker. Check the version of your dependency management tool, too.

An easy way to ensure your software has the correct dependencies is to check them into your repository. This is called vendoring. You can mix the two approaches: for example, a team with a Node.js codebase vendored its node_modules directory, but didn’t vendor the Node executable. Instead, they programmed the build to fail if the wrong version of Node was running.
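A version check like the Node example can be as simple as comparing the output of `node --version` against a pinned constant. A sketch, with a hypothetical pinned version:

```python
import subprocess
import sys

PINNED_NODE = "18.17.0"  # assumption: whatever version your team has agreed on

def version_matches(version_output, expected):
    """Compare `node --version` output (such as "v18.17.0") to the pinned version."""
    return version_output.strip().lstrip("v") == expected

def check_node():
    """Fail the build loudly when the wrong version of Node is installed."""
    output = subprocess.run(
        ["node", "--version"], capture_output=True, text=True
    ).stdout
    if not version_matches(output, PINNED_NODE):
        print(f"BUILD FAILED: expected Node v{PINNED_NODE}, found {output.strip()}")
        sys.exit(1)
```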

2. Local Builds

Dependency management will ensure that your code runs the same way on every machine, but it won’t ensure that your tests pass. Your tests need to run entirely locally, without communicating over the network. Otherwise, you’re likely to get inconsistent results when two people run the tests at the same time, and you won’t be able to build old versions. The services and data they depend on will have changed, and tests that used to pass will fail.

The same is true for when you run the code manually. To get consistent results and to be able to run old versions, everything the code depends on needs to be installed locally.

There may be some dependencies you can’t run locally. If so, you need to program your tests to run independently of those dependencies, or you won’t be able to reproduce your test results in the future. “Simulate Non-Local Dependencies” on p.XX describes how.

Five-Minute Integration and Deploy

Allies
Continuous Integration
Continuous Deployment

If you use continuous integration or continuous deployment, you’ll integrate or deploy several times per day. This process needs to be bulletproof, and fast. That means scripting it. Your script should report success or failure within five minutes—ten at most.

Five-minute results are surprisingly important. Five minutes is enough for a stretch break and a new cup of coffee while you keep an eye on the results. Ten minutes is tolerable, but gets tedious. More than that, and people will start working on other tasks before the results are in. Then, when the integration or deploy fails, the code will be left in limbo until somebody gets back to it. In practice, this leads to systemic integration and build problems.

The script doesn’t need to literally complete within five minutes, although that’s preferable. Instead, it needs to validate the code, and report success or failure, before performing longer-running checks. “Multistage Integration Builds” on p.XX explains how it works.

For most teams, the thing standing between them and a five-minute integration or deploy is the speed and reliability of their test suite. “Fast and Reliable Tests” on p.XX explains how to fix both problems.

Control Complexity

Ally
Simple Design

An oft-overlooked source of friction for development teams is the complexity of their development environment. In their rush to get work done quickly, teams pull in popular tools, libraries, and frameworks to solve common development problems.

There’s nothing wrong with these tools, in isolation. But any long-lived software development effort is going to have specialized needs, and that’s where the quick and easy approach starts to break down. All those tools, libraries, and frameworks add up to an enormous cognitive burden, especially when you have to start diving into their internals to make them work together nicely. That ends up causing a lot of friction.

It’s more important to optimize maintenance costs than initial development, as “Key Idea: Optimize for Maintenance” on p.XX explains. Be thoughtful about the third-party dependencies you use. When you choose one, don’t just think about the problem it’s solving; think about the maintenance burden the dependency will add, and how well it will play with your existing systems. A simple tool or library your scripts can call is a great choice. A complex black box that wants to own the world probably isn’t.

In most cases, it’s best to wrap the third-party tool or library in code you control. The job of your code is to hide the underlying complexity and present a simple interface customized for your needs. “Third-Party Components” on p.XX explains further.

Automate Everything

Automate every activity that your team performs repeatedly. Not only will this decrease friction, it will decrease errors, too. To begin with, this means five scripts:

  • build: compile and/or lint, run tests, and report success or failure

  • watch: automatically run build when files change

  • integrate: run build in a production-like environment and integrate your code

  • deploy: run integrate, then deploy the integration branch

  • rundev: run the software locally for manual review and testing

You’re free to use whichever names you prefer, of course.

Use a real programming language for your scripts. Your scripts can call out to tools, and some of those tools might have their own proprietary configuration languages, but orchestrate them all with real code that you control. As your automation becomes more sophisticated, you’ll appreciate the power a real programming language provides.

Treat your scripts with the same respect as real production code. You don’t have to write tests for them—scripts can be very hard to test—but do pay attention to making your scripts well-written, well-factored, and easy to understand. You’ll thank yourself later.

Consider instrumenting your scripts and tools to collect performance data. At a minimum, program your scripts to report how long they take to run, so you can see when it’s time to optimize. More detail, while optional, will help you understand where to focus your efforts.

For example, I had a watch script that had crept past the five-second barrier. I assumed the problem was test speed, but when I instrumented it, it turned out to be due to build startup costs—loading libraries, scanning the file system, and checking dependencies. I changed watch to run the build in-process, incurring the startup costs only once, and that saved several seconds.
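Instrumentation can be as little as a timing wrapper around each build step. This is a minimal sketch; the `lint` step is a stand-in for a real tool:

```python
import time
from functools import wraps

def timed(step_name):
    """Report how long a build step takes, so you can see when to optimize."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                print(f"{step_name}: {time.perf_counter() - start:.2f}s")
        return wrapper
    return decorator

@timed("lint")
def lint():
    time.sleep(0.01)  # stand-in for running a real linter
    return True
```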

Speaking of watch, it should run the build when files in your source tree are added, removed, or changed. Make sure it handles changes to the watch and build scripts themselves, too. The best way to detect file changes depends on your scripting language, but somebody’s probably written a library you can use. Try searching the web for “<language> watch for file changes.”

You may be tempted to use your IDE instead of a watch script. That’s okay, to start with, but you’ll still need to automate your build for the integrate script, so you could end up maintaining two separate builds. Beware of lock-in, too: eventually, the IDE won’t be able to provide one-second feedback. When that happens, rather than fighting the IDE, switch to a proper script-based approach. It’s more flexible.

Automate Incrementally

Improve your automation continuously and incrementally, starting with your very first story. In a brand-new codebase, that means that your first development tasks are to set up your scripts.

Keep your automation simple. In the beginning, you don’t need sophisticated incremental builds or dependency graph analysis. Before you write any code, start by writing a build script that simply says BUILD OK. Nothing else! It’s like a “hello world” for your build. Then write a watch script that does nothing but run build when files change.

When build and watch are working, create a similarly bare-bones integrate script. At first, it just needs to run build in a pristine environment and integrate your code. There are many tools that will do this for you, typically under the name “continuous integration server” or “build server.” Be sure to get one that integrates after the build succeeds, not before. It’s also surprisingly easy to roll your own; “Continuous Integration Without a CI Server” on p.XX describes how.

When integrate is working, you’re ready to flesh out build. Write a do-nothing entry point for your application. Maybe it just says “Hello world.” Make build compile or lint it, then add dependency management for the compiler or linter. It can just check the version against a constant, to start with, or you can install a dependency management tool. Alternatively, you can vendor your dependencies.

Next, add a unit testing tool and a failing test. Be sure to add dependency management for the testing tool too. Make the build run the test, fail appropriately, and exit with an error code. Next, check that watch and integrate both handle failures correctly, then make the test pass.
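Whatever testing tool you choose, the build’s job is to translate the test runner’s result into an unambiguous verdict and exit code. A sketch of that translation, assuming pytest as the runner:

```python
import subprocess
import sys

def report(returncode):
    """Translate the test runner's exit code into the build's verdict."""
    if returncode == 0:
        print("BUILD OK")
        return 0
    print("BUILD FAILED")
    return 1

def build():
    """Run the tests (pytest is an assumption here) and report the result."""
    result = subprocess.run([sys.executable, "-m", "pytest", "-q"])
    return report(result.returncode)
```

The important detail is the nonzero exit code on failure: it’s what lets watch and integrate detect the failure and handle it correctly.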

Now you can add the rundev script. Make rundev compile (if needed) and run your do-nothing application, then make it recompile and rerun when the source files change. Refactor so that build, watch, and rundev don’t have duplicated file-watching or compilation code.

Ally
Continuous Deployment

Finally, create deploy. Have it run integrate—don’t forget to handle failures—and then deploy the integration branch. Start by deploying to a staging server. The right way to do so depends on your system architecture, but at this point, you only have one production file, so you don’t need to do anything complicated. Just deploy that one file and its runtime environment to one server. It can be as simple as using scp or rsync. Anything more complicated—crash handling, monitoring, provisioning—needs a story. (For example, “Site keeps working after crash.”) As your system grows, your automation will grow with it.

If you don’t deploy to a server, but instead distribute installation packages, make deploy build a simple distribution package. Start with a bare-bones package, such as a .zip file, that just contains your one production file and its runtime. Fancier and more user-friendly installation can be scheduled with user stories.

You should be able to pull any commit and expect it to work the same for every developer.

From this point forward, update your automation with every story. When you add dependencies, don’t install them manually (unless you vendor them); add them to your dependency manager’s configuration and let it install them. That way, you know it will work for other people too. When a story first involves a database, update build, rundev, and deploy to automatically install, configure, and deploy it. Same for stories that involve additional services, servers, and so forth.

When written out in this way, automation sounds like a lot of work. But when you build your automation incrementally, you start simple and grow your automation along with the rest of your code. Each improvement is only a day or two of work, at most, and most of your time is focused on your production code.

Automating Legacy Code

You may not have the luxury of growing your automation alongside your code. Often, you’ll add automation to an existing codebase instead.

Start by creating empty build, rundev, integrate, and deploy scripts. Don’t automate anything yet; just find the documentation for each of these tasks and copy it into the corresponding script. For example, the rundev script might say “1. Run `esoteric_command` 2. Load `https://obscure_web_page`,” and so forth. Have the script wait for a keypress after each step.
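A checklist script like that takes only a few lines. This sketch takes the wait function as a parameter, so individual steps can be swapped out for real automation later; the step text is from the documentation example above:

```python
def run_checklist(steps, wait=input):
    """Print each manual step and wait for a keypress before the next.

    As steps get automated, replace their checklist text with a
    function call—the script gradually becomes fully automated.
    """
    for step in steps:
        print(step)
        wait("Press Enter when done... ")

RUNDEV_STEPS = [
    "1. Run `esoteric_command`",
    "2. Load `https://obscure_web_page`",
]
```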

Ally
Slack

Such simple automation shouldn’t take long, so you can create each script as part of your slack. When you create each one, the script becomes your new, version-controlled source of truth. Either remove the old documentation or change it to describe how to run the script.

Next, use your slack to gradually automate each step. Start with the low-hanging fruit and automate the easiest steps first, then focus on the steps that introduce the most friction. For a while, your scripts will have a mix of automation and step-by-step instructions. Keep going until the scripts are fully automated, then start looking for opportunities to further improve and simplify.

When build is fully automated, you’ll probably find that it’s too slow for one-second feedback (or even ten-second feedback). Eventually, you’ll want to have a sophisticated incremental approach, but you can start by identifying small chunks of your codebase. Provide build targets that allow you to build and test each one in isolation. The more finely you chop up the chunks, the easier it will be to get below the ten-second threshold.

Once a commonly-used build target is below ten seconds, it’s fast enough to be worth creating a watch script. Continue optimizing, using your slack to improve a bit at a time, until you get all the targets below five seconds. At some point, modify the build to automatically choose targets based on what’s changed.

Next, improve your deployment speed and reliability. This will probably require improving the tests, so it will take a while. As before, use your slack to improve a piece at a time. When a test fails randomly, make it deterministic. When you’re slowed down by a broad test, replace it with narrow tests. “Adding Tests to Existing Code” on p.XX explains what to do.

The code will never be perfect, but eventually, the parts you work with most frequently will be polished smooth. Continue using your slack to make improvements whenever you encounter friction.

Questions

How do we find time to automate?

Allies
The Planning Game
Done Done

The same way you find time for coding and testing: it’s simply part of the work to be done. During the planning game, when you size each story, include any automation changes the story needs.

Similarly, use your slack to make improvements when you encounter friction. But remember that slack is for extra improvement. If a story requires automation changes, building the automation—and leaving the scripts you touched at least a bit better than you found them—is part of developing the story, not part of your slack. The story’s not done until the automation is too.

Who’s responsible for writing and maintaining the scripts?

Ally
Collective Code Ownership

They’re collectively owned by the whole team. In practice, team members with programming and operations skills take responsibility for them.

We have another team that’s responsible for build and deployment automation. What should we do?

Treat their automation in the same way you treat any third-party dependency. Encapsulate their tools behind scripts you control. That will give you the ability to customize as needed.

When does database migration happen?

It’s part of your deployment, but it may happen after the deployment is complete. See “Continuous Deployment” on p.XX for details.1

1XXX update with specific reference when CD practice done.

Prerequisites

Every team can work on reducing friction. Some languages make fast feedback more difficult, but you can usually get meaningful feedback about the specific part of the system you’re currently working on, even if that means running a small subset of your tests. Fast feedback is so valuable, it’s worth taking the time to figure it out.

Your ability to run the software locally may depend on your organization’s priorities. In a multi-team environment, it’s easy to accidentally create a system that can’t be run locally. If that’s the case for you, you can still program your tests to run locally, but running the whole system manually might be out of your control.

Indicators

When your team has zero-friction development:

  • You spend your time developing, not struggling with tools, checklists, and dependency documentation.

  • You’re able to work in very small steps, which allows you to catch errors earlier and spend less time debugging.

  • Setting up a new development workstation is a simple matter of cloning the repository and running a script.

  • You’re able to integrate and deploy multiple times per day.

Alternatives and Experiments

Zero-friction development is an ideal that every team should strive for. The best way to do it depends on your situation, so feel free to experiment.

Some teams rely on their IDE, rather than scripting, to provide the automation they need. Others use large “kitchen-sink” tools with complicated configuration languages. I find that these approaches tend to break down as the needs of the team grow. They can be a convenient way to get started, but when you outgrow them, switching tends to be painful and difficult to do incrementally. Be skeptical when evaluating complicated tools that promise to solve all your automation needs.


AoAD2 Chapter: Collaboration (Introduction)


Collaboration

In addition to the teamwork expected of any Agile team (see chapter “Teamwork”), Delivering teams also have high standards of technical excellence and collaboration. They’re expected to work together, as a team, to keep internal quality high and deliver their most important business priority.

These practices will help your team collaborate:

  • “Collective Code Ownership” on p.XX allows team members to improve each other's code.

  • “Pair Programming” on p.XX cross-pollinates ideas and helps the team maintain awareness of how everything fits together.

  • “Mob Programming” on p.XX gets the whole team working together.

  • “Ubiquitous Language” on p.XX helps team members understand each other.


AoAD2 Practice: Ubiquitous Language


Ubiquitous Language

Audience
Programmers

Our whole team understands each other.

Try describing the business logic in your current system to a domain expert. Are you able to explain how the system works in terms they understand? Can you avoid programming jargon, such as the names of design patterns, frameworks, or coding styles? Is your domain expert able to identify potential problems in your business logic?

If not, you need a ubiquitous language. It’s a way of unifying the terms your team uses in conversation and code so that everybody can collaborate effectively.

The Domain Expertise Conundrum

One of the challenges of professional software development is that programmers usually aren’t experts in the software’s problem domain. For example, I’ve helped write software that controls factory robots; directs complex financial transactions; analyzes data from scientific instruments; and performs actuarial calculations. When I started working with those teams, I knew nothing about those things.

It’s a conundrum. The people who understand the problem domain—the domain experts—are rarely qualified to write software. The people who are qualified to write software—the programmers—don’t always understand the problem domain.

The challenge is communicating clearly and accurately.

Overcoming this challenge is, fundamentally, an issue of communication. Domain experts communicate their expertise to programmers, who in turn encode that knowledge in software. The challenge is communicating that information clearly and accurately.

Speak the Same Language

Programmers should speak the language of their domain experts, not the other way around. In turn, domain experts should tell programmers when the language they’re using is incorrect or confusing.

Imagine you’re creating a piece of software for typesetting musical scores. The publishing house you’re working for provides an XML description of the music, and you need to render it properly. This is a difficult task, filled with seemingly minor stylistic choices that are vitally important to your customers.

In this situation, you could focus on XML elements, parents, children, and attributes. You could talk about device contexts, bitmaps, and glyphs. If you did, your conversation might sound something like this:

Programmer: “We were wondering how we should render this clef element. For example, if the element’s first child is “G” and the second child is “2,” but the octave-change element is “-1,” which glyph should we use? Is it a treble clef?”

Domain expert (thinking, “I have no idea what they’re talking about. But if I admit it, they’ll respond with something even more confusing. I’d better fake it.”) “Um... sure, G, that’s treble. Good work.”

Instead, focus on domain terms rather than technical terms.

Programmer: “We were wondering how we should print this “G” clef. It’s on the second line of the staff but one octave lower. Is that a treble clef?”

Domain expert (thinking, “An easy one. Good.”) “That’s often used for tenor parts in choral music. It’s a treble clef, yes, but because it’s an octave lower we use two symbols rather than one. Here, I’ll show you an example.”

The domain expert’s answer is different in the second example because they understand the question. The conversation in the first example would have led to a bug.

How to Create a Ubiquitous Language

Ally
Customer Examples

Ubiquitous language doesn’t come automatically. You have to work at it. When you talk to domain experts, listen for the terms they use. Ask questions about their domain, sketch diagrams that model what you hear, and ask for feedback. When you get into tricky details, ask for examples.

For example, imagine you’re having your first conversation with a domain expert about the music typesetting software:

Programmer: I took piano lessons as a kid, so I know the basics of reading music. But it’s been a while. Can you walk me through it from the beginning?

Domain expert: We typeset music for ensembles and orchestras here, so it’s not exactly the same as a piano score, but your background will help. To start with the basics, every score is divided into staves, each staff is divided into measures, and notes go into the measures.

Programmer: So the score is the fundamental thing we’re typesetting?

Domain expert: That’s right.

Programmer: Got it. (Draws a box and labels it “score.”) And then each score has staves. (Adds a box labelled “staff” and draws a line connecting it to “score.”) And each staff has measures. (Adds another box labelled “measure” and connects it to “staff.”) How many staffs can the score have?

Domain expert: It depends on the arrangement. Four, for a string quartet. A dozen or more for an orchestra.

Programmer: But at least one?

Domain expert: Well, I guess so. It wouldn’t make sense for a score to have zero staves. Each instrument gets a staff, or multiple, in the case of instruments with a lot of range, like pianos and organs.

Programmer: Okay, I’m starting to get lost. Do you have an example I can look at?

Domain expert: Sure. (Pulls out example.1) Here at the top, you can see the choir. There’s a staff for each part, which you can think of as being the same as an instrument: soprano, alto, tenor, and bass. And then a grand staff for the harp, a grand staff and a regular staff for the organ, and so forth.

1See http://stevensametz.com/wordpress/wp-content/pdfs/sample/thumbs/Amo%201%20Munus%20-%20SATB,%20harp,%20organ,%20pc,%20orchestra%20Score-800-0.jpg for an example of orchestral sheet music.

Programmer: (Revising sketch on whiteboard.) So we start with the score, and the score has multiple instruments, and each instrument has one or more staffs, and the staff can either be a regular staff or a grand staff. And it looks like the instruments can be grouped together too.

Domain expert: Right, I should have mentioned that. The instruments can be grouped into sections. You know, string section, horn section?

Programmer: (Revising sketch again.) Got it. Score has sections, sections have instruments, and then the rest.

Domain expert: (Looks at diagram.) This is a start, but there’s still a lot missing. We need a clef, key, and time signature...

The result of this conversation is more than just a whiteboard sketch. It can also form the basis for a domain model in your code. Not every program needs a domain model, but if your team’s software involves a complicated domain, a domain model is a powerful way to develop using your ubiquitous language.

You’re not going to literally program in the domain experts’ language, of course. You’ll still use a programming language. But you’ll create your modules, functions, classes, and methods so that they model the way your domain experts think. By reflecting in code how users think and speak about their work, you refine your knowledge, expose gaps that would otherwise result in bugs, and create a malleable system that is responsive to the changes your users will want.

To continue the example, a program to typeset a musical score based on XML input could be designed around XML concepts. A better approach, though, might be to design it around domain concepts, as shown in figure “XML and Domain-Centric Design”.

Two class diagrams. The one on the left is labelled “XML-centric design (simplified),” and it shows the relationships between an “Entity” and an “Attribute” class. The one on the right is labelled “Domain-centric design (simplified),” and it shows the relationships between domain-oriented classes, such as “Score,” “Measure,” “Staff,” and “Note.”

Figure 1. XML and Domain-Centric Design
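To make the domain-centric side of the figure concrete, here’s a minimal sketch of such a domain model in Python. It’s an illustration of the idea, not a prescribed design; the class names come from the whiteboard conversation, but the attributes and details are our assumptions.

```python
# A minimal sketch of a domain-centric model for music typesetting.
# It mirrors the whiteboard conversation: a score has sections, sections
# have instruments, instruments have staves, staves have measures, and
# measures have notes. Details beyond the conversation are illustrative.

class Note:
    def __init__(self, pitch, duration):
        self.pitch = pitch            # e.g., "G4"
        self.duration = duration      # e.g., 0.25 for a quarter note

class Measure:
    def __init__(self, notes=None):
        self.notes = notes or []

class Staff:
    """A single five-line staff; a grand staff would pair two of these."""
    def __init__(self, clef, measures=None):
        self.clef = clef
        self.measures = measures or []

class Instrument:
    def __init__(self, name, staves):
        # The domain expert said every instrument has at least one staff.
        if not staves:
            raise ValueError("every instrument has at least one staff")
        self.name = name
        self.staves = staves

class Section:
    """A group of instruments, such as the string or horn section."""
    def __init__(self, name, instruments):
        self.name = name
        self.instruments = instruments

class Score:
    """The fundamental thing we're typesetting."""
    def __init__(self, sections):
        self.sections = sections
```

Compare this to an XML-centric design built from generic Entity and Attribute classes: when a domain expert asks for a change to how tenor clefs are printed, the model above has an obvious place to put it.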

Code doesn’t leave room for ambiguity. This need for rigorous formalization results in more conversations and clarifies obscure details. I often see situations in which programmers run into a sticky design problem and ask their domain expert a question, which in turn leads the domain expert to question some of their own assumptions.

Your ubiquitous language, therefore, is a living language. It’s only as good as its ability to reflect reality. As you clarify points with your domain experts, encode what you’ve learned in your domain model. As the domain model reveals ambiguities, bring them back to your domain experts for clarification.

Ally
Refactoring

As you go, be sure that your design and the language you and your domain experts share remain in sync. Refactor the code when your understanding of the domain changes. If you don’t, you’ll end up with a mismatch between your design and reality, which will lead to ugly kludges and bugs.

Questions

Should we avoid the use of technical terms altogether? Our business domain doesn’t say anything about GUI widgets or databases.

It’s okay to use technical language in areas that are unrelated to the domain. For example, it’s probably best to call a database connection a “connection” and a UI button a “button.” However, you should typically encapsulate these technical details behind a domain-centric face.

How do we document our ubiquitous language?

Ideally, you encode your ubiquitous language in the actual design of your software using a domain model. If that’s not appropriate, you can document your model on a whiteboard (possibly a virtual whiteboard), shared document, or wiki page. Be careful, though: this sort of documentation requires a lot of attention to keep up to date.

Ally
Simple Design

The advantage of using code for documentation is that code can’t help but reflect what your software really does. With care, you can design your code to be self-documenting.

Different stakeholders use different terms for the same things. How can we reconcile this?

Your ubiquitous language doesn’t need to be literally ubiquitous. The important thing is to unify the language that your programmers, domain experts, and code use. Use the same terms as the domain experts that you work with directly. If you work with multiple domain experts, and they don’t agree—which happens more often than you might expect—ask them to work together to decide which approach you should use.

We program in English, but it’s not our first language, and our domain experts don’t use English. Should we translate their terms to English for consistency with the rest of our code?

It’s up to you. Words don’t always translate directly, so using your domain expert’s literal language is likely to result in fewer errors, especially if domain experts are able to overhear and contribute to programmers’ conversations. On the other hand, consistency might make it easier for others to work with your code in the future.

If you do decide to translate your domain experts’ terms to English (or another language), create a translation dictionary for the words you use, especially for words that don’t translate perfectly.

Prerequisites

Ally
Whole Team

If you don’t have any domain experts as part of your team, you may have trouble understanding the domain deeply enough to create a ubiquitous language. Attempting to do so is even more important in this situation, though. When you do have the opportunity to speak with a domain expert, the ubiquitous language will help you to discover misunderstandings more quickly.

On the other hand, some problems are so technical that they don’t involve non-programmer domain knowledge at all. Compilers and web servers are examples of this category. If you’re building this sort of software, the language of programming is the language of the domain.

Some teams have no experience creating domain models and domain-centric designs. If this is true of your team, proceed with caution. Domain-centric designs require a shift in thinking that can be difficult. See the “Further Reading” section to get started, and consider hiring a coach to help you learn.

Indicators

When you have a ubiquitous language that works:

  • You reduce miscommunication between customers and programmers.

  • You produce code that’s easier to understand, discuss, and modify.

  • When sharing a physical team room, domain experts overhear domain and implementation discussions. They join in to resolve questions and expose hidden assumptions.

Alternatives and Experiments

It’s always a good idea to speak the language of your domain experts, but domain-centric design isn’t always the best choice. Sometimes a technology-centric design is simpler and easier. This is most often the case when your domain rules aren’t very complicated. Be careful, though: domain rules are often more complicated than they first appear, and technology-centric designs tend to have defects and high maintenance costs when that’s true. See [Fowler 2002] for further discussion of this trade-off.

Further Reading

Domain-Driven Design: Tackling Complexity in the Heart of Software [Evans 2003] is the definitive guide to creating domain-centric designs. Chapter two, “Communication and the Use of Language,” was the inspiration for this practice.

Patterns of Enterprise Application Architecture [Fowler 2002] has a good discussion of the trade-offs between domain models and other architectural approaches.

XXX Consider Object Design (Wirfs-Brock and McKean)

Share your feedback about this excerpt on the AoAD2 mailing list! Sign up here.

For more excerpts from the book, or to get a copy of the Early Release, see the Second Edition home page.

AoAD2 Practice: Mob Programming

Book cover for “The Art of Agile Development, Second Edition.”

Second Edition cover

This is a pre-release excerpt of The Art of Agile Development, Second Edition, to be published by O’Reilly in 2021. Visit the Second Edition home page for information about the open development process, additional excerpts, and more.

Your feedback is appreciated! To share your thoughts, join the AoAD2 open review mailing list.

This excerpt is copyright 2007, 2020, 2021 by James Shore and Shane Warden. Although you are welcome to share this link, do not distribute or republish the content without James Shore’s express written permission.

Mob Programming

Audience
Whole Team

We bring the insights of the whole team to bear.

In the early days of Extreme Programming, when pair programming first became popular, people used to mock it. “If pairing is good, why not triple!” they laughed. “Or just put the whole team in front of one computer!”

They were trying to put down XP, but the Agile way is to experiment, learn, and improve. Rather than assume something won’t work, we try an experiment. Some experiments work; some don’t. Either way, we share what we learn.

That’s what happened with mob programming. Woody Zuill had a group teaching technique he used for coding dojos. His team at Hunter Industries was in a bind. They decided to try Woody’s group technique on real world work, and put the whole team in front of one computer.

It worked, and worked well. Woody and the team shared what they learned. And now mob programming is used around the world.

In some parts of the world, the term “mob programming” has unpleasant connotations, so people call it “ensemble programming” instead. Woody’s original name for it was “Whole Team programming.” But, he says, “I have always said, I don’t care what it’s called. Learning to work well as a team is worthwhile and I invite people to call it what they will.”1

1Quoted from a conversation with Woody Zuill on Twitter: https://twitter.com/WoodyZuill/status/1365473397665193984

How to Mob

Ally
Pair Programming

Mob programming is a variant of pair programming. Like pairing, it has a driver, who codes, and navigators, who provide direction. Unlike pairing, the whole team is present. While one person drives, the rest of the team navigates.

To be clear, #MobProgramming is merely a tiny evolutionary step beyond pair programming. There are no rules except the general guideline of “Let’s figure out how to turn up our ability to collaborate well”.2

2Another excerpt of the Twitter conversation with Woody Zuill: https://twitter.com/WoodyZuill/status/1365475181213347848

Woody Zuill

All the brilliant minds, in the same place, at the same time, working on the same thing.

You’re welcome to try any approach to mobbing that you like. Experiment and find what works for you. The central idea, as Woody says, is “All the brilliant minds, in the same place, at the same time, working on the same thing.”

Ally
Whole Team

To get started, try Woody Zuill’s approach. It starts with the whole team: everybody is present and ready to participate. Some people, such as on-site customers, may not be focused on the programming specifically, but they’re available to answer questions and they’re working on the same stories the programmers are.

On top of that base, layer on Llewellyn Falco’s strong-style pairing: all ideas must pass through somebody else’s fingers. [Falco 2014] When it’s your turn to drive, your job is to act as a very clever input device. How clever, exactly, depends on your familiarity with the code and editor. In some cases, a navigator might say, “now handle error cases,” and the driver will test-drive four tests and the accompanying production code without further prompting. In other cases, a navigator might say, “now extract the method,” and the driver will have to ask what to type. Customize the level of detail to each driver’s experience with the code and tools.

Finally, add a timer. Seven minutes is a good starting point. When the timer goes off, the driver stops. Another person takes over and work continues right where the previous driver left off. Rotate through everybody who’s interested in programming.

Why Mobbing Works

Mob programming is “easy mode” for collaboration.

Mob programming works because it’s “easy mode” for collaboration.

So much of Agile centers around communication and collaboration. It’s the secret sauce that makes Agile more effective than other approaches. And mobbing makes a lot of the Agile collaboration practices irrelevant. They’re simply not needed when you mob.

Stand-up meetings? Gone. Collective code ownership? Automatic. Team room? A no-brainer. Task planning? Still useful, but kind of unnecessary.

All the brilliant minds, in the same place, at the same time, working on the same thing. That’s the Agile ideal. Mobbing makes it easy.

When I first heard about mobbing, I pooh-poohed it. “I get the same benefits from having a cross-functional team, a team room, pairing, frequent pair switching, and good collaboration,” I said. And I was right. Mobbing doesn’t get you anything you don’t already get on a good team. But it’s so easy. Getting people to pair and collaborate well is hard. Mobbing? It’s practically automatic.

The Mobbing Station

If you have a physical team room, it’s pretty easy to set up a place for mobbing. You need a projector or big-screen TV (or several), tables for people to sit at, and a development workstation. Make sure everybody can sit comfortably, has access to laptops and whiteboards (for looking stuff up and discussing ideas), and has enough room to switch drivers easily. Some teams provide a mobbing station as well as pairing stations so people can switch back and forth as desired.

If your team is remote, set up a videoconference and have the driver share their screen. When it’s time to switch drivers, the previous driver pushes their code to a temporary branch and the next driver pulls it. A script such as the one found at https://mob.sh/ can help with this process. You might find that you need to set a longer timer—perhaps ten minutes instead of seven—to reduce the amount of switching needed.
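A minimal version of such a handoff script might look like the sketch below. The branch name and commit message are our assumptions, and a dedicated tool like the one at https://mob.sh/ handles many edge cases this sketch ignores.

```shell
#!/bin/sh
# Sketch of a remote-mob driver handoff via a temporary branch.
# Branch name and commit message are illustrative assumptions.
set -e

WIP_BRANCH="mob-session"   # temporary branch for work in progress

mob_next() {
    # Outgoing driver: park everything on the temporary branch.
    git add --all
    git commit --message "mob handoff (work in progress)" --no-verify
    git push --force origin "HEAD:$WIP_BRANCH"
}

mob_take() {
    # Incoming driver: pick up exactly where the last driver stopped.
    git fetch origin "$WIP_BRANCH"
    git checkout -B "$WIP_BRANCH" "origin/$WIP_BRANCH"
}
```

When the session ends, the team squashes the work-in-progress commits into a clean commit on the main branch and deletes the temporary branch.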

Making Mobbing Work

Mobbing is fun and easy, but it can still be tiring to work with the whole team day-in and day-out. Here are some things to consider:

Team dynamics
Allies
Alignment
Safety
Team Dynamics

Pay attention to the interactions between team members and make sure everybody’s voices are being heard. Establish working agreements, make it safe for people to express disagreement and concerns, and pay attention to team dynamics.3 If there’s someone who tends to dominate, remind them to let others speak; if there’s someone who has trouble speaking up, ask for their opinion.

3XXX update after Safety and Team Dynamics written.

When you first start mobbing, it’s worth spending a few minutes at the end of each day for a very short retrospective. Focus on what worked well and how to do more of it. Woody Zuill calls this “turn up the good.”

Energized work
Ally
Energized Work

Mobbing isn’t supposed to wear you out, but it can be overwhelming to be constantly surrounded by the whole team. Take care of yourself. You don’t need to be “on” at every moment.

One of the advantages of mobbing is that it’s not dependent on any one person. If you need a coffee break, or just want to clear your head, step away. Similarly, if you need to check your email or make a phone call, you can do that. The mob will continue on without you.

You don’t have to align your work schedules, either. People can drop in and out as needed.

Research
Ally
Spike Solutions

All changes to the production code go through the driver, but you can still use your computer when you aren’t driving. If you need to look up an API call, or have a side discussion about a design idea at the whiteboard, or create a spike solution, you can do that.

Strict navigator role

When you start mobbing, your team might have so many people shouting ideas that the driver has trouble understanding what to do. In this case, rather than having the whole team act as navigators, you can appoint one person to be the navigator. This role rotates just like the driver role does. (I like to have the driver become the next navigator.) Their job is to condense the ideas of the mob into specific directions for the driver. The driver only has to listen to the navigator, not the whole mob.

Non-programmers

Everybody in the mob can be a driver, even people who don’t know how to program. This can be an exciting opportunity for non-programmers to develop new skills. They may not become experts, but they’ll learn enough to contribute, and learning to drive could improve their ability to collaborate with programmers.

Remember to guide your driver at the level that they’re capable of following. For non-programmers, this may require providing direction at the level of specific keyboard shortcuts, menu items, and mouse clicks, at first.

But nobody is required to be a driver. Some people on the team may find that their time is better spent helping the mob in other ways. A tester and a domain expert might have a side conversation about customer examples related to the current story. A product manager may step out to conduct an interview with an important stakeholder. An interaction designer may work on user personas.

As with anything else, experiment with varying people’s level of involvement to find what works best for your team. But start by trying more involvement, rather than less. People often underestimate the power of working as a team. That conversation about customer examples, or stakeholder interview, or user persona work could be something that the mob learns from doing together.

Mini-mobs and part-time mobs

You don’t have to choose between pairing and mobbing. (Although I do recommend doing one or the other for all code you have to maintain.) You can mob part time and pair the rest of the time. Or you can form a “mini-mob” of three or four people while the rest of the team pairs.

Allies
Task Planning
Stand-Up Meetings

If you don’t mob full-time, be sure to keep other team coordination mechanisms, such as the task board and stand-up meetings, at least to start. The mobbing sessions may allow you to keep in sync without them, but make sure that’s true before removing them.

Questions

Is mobbing really more effective than working alone or in pairs?

There are too many variables to say for sure. In my experience, pairing is more effective than working alone. Is mobbing even more effective than pairing? For teams with a good team room and great collaboration, maybe not. For other teams, it probably is. Try it and find out.

We’re having trouble remembering to switch drivers. What should we do?

If people are ignoring your timer, try using a tool such as Mobster (available at http://mobster.cc/). When the time is up, it blanks the screen so the driver has to stop.

Prerequisites

Mobbing requires permission from the team and management. Other than that, the only requirement is a comfortable work environment and appropriate mobbing setup.

Indicators

When your team mobs well:

  • The whole team directs their entire effort towards one story at a time, finishing work with minimal delays and wait time.

  • The team collaborates well and enjoys working together.

  • Internal quality improves.

  • When a tough problem arises, the mob solves it while the driver continues moving forward.

  • Decisions are made quickly and effectively.

Alternatives and Experiments

“All the brilliant minds, in the same place, at the same time, working on the same thing.” That’s the core idea of mob programming. Beyond that, the details are up to you. Start with the basic structure described here, then think about something to improve every day.

Allies
Pair Programming
Task Planning
Stand-Up Meetings

If mobbing isn’t a good fit, the best alternative is pair programming. Pairing doesn’t have the same automatic collaboration that mobbing does, though, so you’ll need to put more effort into collective ownership, task planning, and stand-up meetings.

Further Reading

XXX To consider:

Share your feedback about this excerpt on the AoAD2 mailing list! Sign up here.

For more excerpts from the book, or to get a copy of the Early Release, see the Second Edition home page.

AoAD2 Practice: Pair Programming

Book cover for “The Art of Agile Development, Second Edition.”

Second Edition cover

This is a pre-release excerpt of The Art of Agile Development, Second Edition, to be published by O’Reilly in 2021. Visit the Second Edition home page for information about the open development process, additional excerpts, and more.

Your feedback is appreciated! To share your thoughts, join the AoAD2 open review mailing list.

This excerpt is copyright 2007, 2020, 2021 by James Shore and Shane Warden. Although you are welcome to share this link, do not distribute or republish the content without James Shore’s express written permission.

Pair Programming

Audience
Developers, Whole Team

We help each other succeed.

Do you want somebody to watch over your shoulder all day? Do you want to waste half your time sitting in sullen silence watching somebody else code?

Of course not. Nobody does—especially not people who pair program.

Pair programming is one of the most controversial Agile ideas. Two people working at the same computer? It’s weird. It’s also extremely powerful and, once you get used to it, tons of fun. Most programmers I know who tried pairing for a month found that they preferred it to programming alone.

Ally
Collective Code Ownership

More importantly, pair programming is one of the most effective ways to achieve collective code ownership and truly collaborate on code as a team.

Why Pair?

There’s more to pairing than sharing knowledge. Pairing also improves the quality of your results. That’s because pair programming doubles your brainpower.

When you pair, one person is the driver. Their job is to code. The other person is the navigator. Their job is to think. As navigator, sometimes you think about what the driver is typing. (Don’t rush to point out missing semicolons, though. That’s annoying.) Sometimes you think about what comes next. Sometimes you think about how your work best fits into the overall design.

This arrangement leaves the driver free to work on the tactical challenges of creating rigorous, syntactically correct code without worrying about the big picture, and it gives the navigator the opportunity to consider strategic issues without being distracted by the details of coding. Together, the driver and navigator produce higher-quality work, more quickly, than either could produce on their own.1

1One study found that pairing takes about 15 percent more effort than one individual working alone, but produces results more quickly and with 15 percent fewer defects. [Cockburn and Williams 2001] Every team is different, so take these results with a grain of salt.

Pairing also reinforces good programming skills. Delivering practices take a lot of self-discipline. When pairing, you’ll have positive peer pressure to do the things that need to be done. You’ll also spread coding knowledge and tips throughout the team.

Surprisingly, you’ll also spend more time in flow—that highly productive state in which you’re totally focused on the code. It’s a different kind of flow than when you’re working alone, but it’s far more resilient to interruptions. To start with, you’ll discover that your office mates are far less likely to interrupt you when you’re working with someone. When they do, one member of the pair will handle the interruption while the other continues working. Further, you’ll find that background noise is less distracting: your conversation with your pairing partner will keep you focused.

If that isn’t enough, pairing really is a lot of fun. The added brainpower will help you get past roadblocks more easily. For the most part, you’ll be collaborating with smart, like-minded people. Plus, if your wrists get sore from typing, you can hand off the keyboard to your partner and continue to be productive.

Pairing Stations

To enjoy pair programming, a good workspace is essential, whether your team is in-person or remote. For in-person teams, make sure you have plenty of room for both people to sit side by side. Typical cubicles, with a monitor located in a corner, won’t work. They’re uncomfortable and require one person to sit behind the other, adding psychological as well as physical barriers to what’s meant to be peer collaboration.

You don’t need fancy furniture to make a good in-person pairing station. A simple table will do. It should be six feet long, so that two people can sit comfortably side by side, and at least four feet deep. Each table needs a high-powered development workstation. Plug in two keyboards and mice so each person can have a set. If people have a preferred mouse and keyboard, they can bring it with them. Make sure the USB ports are easily accessible in this case.

Splurge on large monitors so both people can see clearly. Be sure to respect differences in people’s vision needs, particularly with regards to font sizes and colors. Some teams set up three monitors, with the two outer monitors mirrored, so each person can see the code on a monitor in front of them, while using the middle display for additional material. If you do this, try installing a utility that makes the mouse wrap around the edges of your desktop. It will let both programmers reach the center screen easily.

If your team is remote, you’ll need a collaborative code editor and videoconference. Make sure you have multiple screens, so you can see each other and the code at the same time.

There are a variety of IDE add-ins and standalone tools for collaborative editing, such as Code Together, Tuple, Floobits, and Visual Studio’s Live Share. You can also share your screen in your videoconferencing tool, but a collaborative code editor will work better because it allows you to switch drivers seamlessly. If you have to use screen-sharing, though, you can hand off control by pushing the code to a temporary work-in-progress branch. Write a little script to automate the process.

Jeff Langr has a good rundown of remote code collaboration options in [Langr 2020].

How to Pair

I recommend pairing on all production code. Teams who pair frequently, but not exclusively, say that they find more defects in solo code. That matches pair programming studies, such as [Cockburn and Williams 2001], that find that pairs produce higher quality code. A good rule of thumb is to pair on anything that you need to maintain, which includes tests and automation.

When you start working on a task, ask another programmer to work with you. If someone else asks for help, make yourself available. Managers should never assign partners: pairs are fluid, forming naturally and shifting throughout the day. Over the course of the week, pair with every developer on the team. This will improve team cohesion and spread skills and knowledge throughout the team.

Get a fresh perspective by switching partners.

When you need a fresh perspective, switch partners. I usually switch when I’m feeling frustrated or stuck. Have one person stay on task and bring the new partner up to speed. Often, even explaining a problem to someone new will help you resolve it.

It’s a good idea to switch partners several times per day even if you don’t feel stuck. This will help keep everyone informed and moving quickly. I switch whenever I finish a task. If I’m working on a big task, I switch within four hours.

Some teams switch partners at strictly defined intervals. [Belshee 2005] reports interesting results from switching every 90 minutes. While this could be a great way to get in the habit of switching pairs, make sure everybody is willing to try it.

When you sit down to pair, make sure you’re physically comfortable. If you’re colocated, position your chairs side by side, allowing for each other’s personal space, and make sure the monitor is clearly visible. When you’re driving, place the keyboard directly in front of you. Keep an eye out for this one—for some reason, people new to pairing tend to contort themselves to reach the keyboard and mouse rather than moving them closer.

Expect to feel clumsy and fumble-fingered, at first, when it’s your turn to drive. You may feel that your navigator sees ideas and problems much more quickly than you do. They do—navigators have more time to think than drivers do. The situation will be reversed when you navigate. Pairing will feel natural in time.

Ally
Test-Driven Development

Pairs produce code through conversation. As you drive or navigate, think out loud. Take small steps—test-driven development works well—and talk about your assumptions, short-term goals, general direction, and any relevant history of the feature or project. If you’re confused about something, ask questions. The discussion may enlighten your partner as much as you.

When a pair goes dark—talks less, lowers their voices, or doesn’t switch off with other pairs—it’s often a sign of technical difficulty.

As you pair, switch the driver and navigator roles frequently—at least every half hour, and possibly every few minutes. If you’re navigating and find yourself telling the driver which keys to press, ask for the keyboard. If you’re driving and need a break, pass the keyboard off to your navigator.

Ally
Energized Work

Expect to feel tired at the end of the day. Pairs typically feel that they have worked harder and accomplished more together than when working alone. Practice energized work to maintain your ability to pair every day.

Effective Navigating

When navigating, you may feel like you want to step in and take the keyboard away from your partner. Be patient; your driver will often communicate an idea with both words and code. They’ll make typos and little mistakes—give them time to correct themself. Use your extra time to think about the bigger picture. What other tests do you need to write? How does this code fit into the rest of the system? Is there duplication you want to remove? Can the code be more clear? Can the overall design be better? Is there friction that should be polished away?

Pay attention to your driver’s needs, too. Somebody who’s unfamiliar with the IDE or codebase may need specific guidance. But resist the urge to micromanage. Give them room to figure out things on their own.

As navigator, your role is to help your driver be more productive. Think about what’s going to happen next and be prepared with suggestions. When I’m navigating, I like to keep an index card in front of me. Rather than interrupting the driver when I think of something, I write my ideas on the index card and wait for a break in the action to bring them up. At the end of the pairing session, I tear up the card and throw it away.

Ally
Spike Solutions

Similarly, when a question arises, take a moment to look up the answer while the driver continues to work. Some teams keep spare laptops on hand for this purpose. If you need more than a few minutes, pause coding to research the solution together. Sometimes the best way to do this is to split up, pursue parallel lines of inquiry, and come back together to share what you’ve learned. Spike solutions are a particularly powerful approach.

Teaching Through Pairing

Pair programming works best when it’s a peer collaboration, but sometimes you’ll be in a situation where you know the code and your partner doesn’t.

The best developers help everyone work quickly and well.

When this happens, remember to be patient. Teaching your pair partner how the code works slows you down, but the goal isn’t to maximize your performance... it’s to maximize the team’s performance. A good developer works quickly and well, but the best developers help everyone do so.

When you use pairing to teach someone about the code, start by letting them drive. That will allow them to control the pace. As you guide them, refrain from telling them exactly what to do. Instead, provide the big-picture direction—maybe even start with a whiteboard diagram—and give them space to figure out the details.

For example, when making changes to a service, don’t say, “We need to change SuperMailClient. Click src... now click infrastructure... now click rest...” Instead, provide context and direction: “Our task is to replace our transactional mail vendor, SuperMail, with BetterMail. They both provide REST APIs, so all we need to do is change our SuperMail wrapper to use BetterMail instead. (Sketches the project structure on the whiteboard.) All our REST clients are in the infrastructure/rest folder and each service has its own wrapper.” Then let your partner navigate through the project files and find the file to work on themselves.
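
To make the wrapper idea concrete, here’s a minimal sketch of what such a vendor wrapper might look like. The names, endpoint, and payload shape are all hypothetical, invented for illustration; the point is that vendor-specific details live in one small class, so swapping SuperMail for BetterMail means changing only this file:

```python
# Hypothetical sketch: a thin wrapper isolates the mail vendor's REST API
# from the rest of the system. The endpoint URL and payload fields below
# are made up for illustration.

class MailClient:
    """Wrapper around a transactional mail vendor's REST API."""

    def __init__(self, api_key, post):
        self._api_key = api_key
        self._post = post  # injectable HTTP POST function, so tests can fake it

    def send(self, to, subject, body):
        # Before the change, this held SuperMail's endpoint and payload shape.
        # After, only these vendor-specific details change for BetterMail.
        payload = {"to": to, "subject": subject, "text": body}
        return self._post(
            "https://api.bettermail.example/v1/send",
            headers={"Authorization": f"Bearer {self._api_key}"},
            json=payload,
        )
```

Because the HTTP function is injected, a pair can test-drive the vendor swap without sending real mail.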

Once the person you’re teaching can find their way around, switch roles. Ask them to navigate and tell you what needs to be done next. Be careful, though: when you’re driving, it’s tempting to rush ahead and just do what you know needs to be done. For it to work as a teaching technique, you have to suppress that desire and let your partner set the pace.

Challenges

Pairing can feel awkward or unpleasant at first. These feelings are natural and typically go away after a month or two. Here are some common challenges and how to resolve them:

Comfort

It bears repeating: pairing is no fun if you’re uncomfortable. When you sit down to pair, adjust your position and equipment so you can sit comfortably. Clear debris off the desk and make sure there’s room for your legs, feet, and knees. Check in with your partner about font sizes and monitor position. If you’re pairing remotely, take time before you begin to make sure all your tooling is set up and frictionless.

Some people (like me) need a lot of personal space. Others like to get up close and personal. When you start to pair, discuss your personal space needs and ask about your partner’s.

Similarly, while it goes without saying that personal hygiene is essential, remember that strong flavors such as coffee, garlic, onions, and spicy foods can lead to foul breath.

Introversion and social anxiety

Introverts often worry that pairing won’t work for them, but—as an introvert myself—I haven’t found that to be true in practice. Although pairing can be tiring, it’s also very focused on ideas and results. There’s no need to engage in small talk, and you’re typically working with people who you know well and respect. It’s a very productive, very cerebral collaboration, and that can be a lot of fun. Most introverts I’ve met who have tried pairing have liked it, once they got past the initial learning curve.

Ally
Alignment

Of course, people don’t divide neatly into predefined personality trait boxes. Pairing—and Agile in general—can be difficult for people with social anxiety. If you think pairing might be difficult for you or someone on your team, talk about ways to make pairing more comfortable, or about other ways your team can achieve collective code ownership. The alignment session is a good time for this conversation.

Mismatched skill levels

Although pairing works best as a peer collaboration, sometimes people with different skill sets will work together. In this situation, it’s important to restore the peer balance. Highlight the skills that each person is bringing to the table. Even if one person needs to teach the other about the code, treat it as a lack of knowledge that’s easily rectified, not a lack of ability on the part of the learner, or a sign of superiority on the part of the teacher.

Communication style

New drivers sometimes have difficulty involving their partners; they can take over the keyboard and shut down communication. To practice communicating and switching roles while pairing, consider ping-pong pairing. In this exercise, one person writes a test. The other person makes it pass and writes a new test. Then the first person makes it pass and repeats the process by writing another test.
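
As an illustration, a ping-pong session might produce code like the following sketch. The function and tests are hypothetical, just to show the rhythm of alternating turns:

```python
# Illustrative ping-pong round (hypothetical task; any small feature works).

# Person A writes a failing test, then passes the keyboard:
def test_capitalizes_first_letter():
    assert capitalize_word("agile") == "Agile"

# Person B writes just enough code to make it pass...
def capitalize_word(word):
    return word[:1].upper() + word[1:]

# ...then writes the next test and passes the keyboard back:
def test_handles_empty_string():
    assert capitalize_word("") == ""

# Person A makes that test pass (here, the code already does),
# writes another test, and the cycle continues.
```

Each turn at the keyboard ends with either a new failing test or a passing test suite, which keeps the roles switching naturally.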

Another approach to try is strong-style pairing. In strong-style pairing, invented by Llewellyn Falco, all ideas must pass through the other person’s fingers. [Falco 2014] So if you come up with an idea, you have to pass the keyboard to the other person and tell them how to implement it. Then when they come up with an idea, they pass the keyboard back to you and tell you what to do. Even if this isn’t something you want to do all the time, it’s a great way to practice communicating with your partner.

Ally
Safety

The flip side of too little communication is too much communication—or rather, too much blunt communication. Frank criticism of code and design is valuable, but it may be difficult to appreciate at first. Different people have different thresholds, so pay attention to how your partner receives your comments. Try transforming declarations (such as “This method is too long”) into questions or suggestions (“Could we make this method shorter?” or “Should we extract this code block into a new method?”). Adopt an attitude of collaborative problem solving.2

2XXX Double-check this after Safety practice is done

Tools and keybindings
Ally
Alignment

Even if you don’t fall victim to the endless vi vs. emacs editor war, you may find your coworkers’ tool preferences annoying. Try to standardize on a particular toolset. Some teams even create a standard image and check it into version control. When you discuss working agreements during your alignment discussion, discuss these issues as well.

Keyboards and mice can be another source of contention. If they are, you don’t have to standardize. People with strong input device preferences can take their devices with them when they switch pairing stations. Just make sure they have easily accessible USB ports.

Questions

Isn’t it wasteful to have two people do the work of one?

In pair programming, two people aren’t really doing the work of one. Although only one keyboard is in use at a time, there’s more to programming than typing. While one person is programming, the other is thinking ahead, anticipating problems, and strategizing.

How can I convince my team or organization to try pair programming?

Ask permission to try it as an experiment. Set aside a month in which everyone pairs on all production code. Be sure to keep going for the entire month, as pair programming may be uncomfortable for the first few weeks.

Don’t just ask permission of management; get the consent of your fellow team members, too. They don’t have to love the idea, but do make sure they’re not opposed to it.

Do we really have to pair program all the time?

This is a decision that your whole team should make together. Before you decide, try pairing on all production code (and everything else you need to maintain) for a month. You may enjoy it more than you expect.

Even if you decide that all production code needs to be paired, you will still produce code that you don’t need to maintain. Spike solutions are one example. These often benefit from working independently.

If you’re bored while pairing, it’s an indication of a design flaw.

Some production tasks are so repetitive that they don’t require the extra brainpower a pair provides. Before abandoning pairing, however, consider why your design requires so much repetition. It’s a common indication of a design flaw. Use the navigator’s extra time to think about design improvements, and consider discussing them with your whole team.

How can I concentrate with someone talking to me?

When you navigate, you shouldn’t have too much trouble staying several steps ahead of your driver. If you do have trouble, ask your driver to think out loud so you can understand their thought process, or ask to drive so you can control the pace.

As driver, you may sometimes find that you’re having trouble solving a problem. Let your navigator know—they may have a suggestion that will help you through the roadblock. At other times, you may just need a few moments of silence to think through the problem. It’s okay to say so.

Allies
Test-Driven Development
Spike Solutions

If you find yourself in this situation a lot, you may be taking steps that are too large. Use test-driven development and take very small steps. Rely on your navigator to keep track of what you still need to do (tell them if you have an idea; they’ll write it down) and focus only on the few lines of code needed to make the next test pass.

If you are working with a technology you don’t completely understand, consider taking a few minutes to work on a spike solution. You and your partner can work on this together or separately.

What if we have an odd number of programmers?

A programmer flying solo can do productive tasks that don’t involve production code. They can research new technologies or learn more about a technology the team is using. They can pair with a customer or tester to review recent changes, polish the application, or do exploratory testing. They can take care of administrative tasks for the team, such as responding to team emails.

Ally
Zero Friction

Alternatively, a solo programmer may wish to improve the team’s capacity. They can research solutions to friction the team is experiencing, such as slow builds, flaky tests, or unreliable deployment pipelines. They can review the overall design—either to improve their own understanding or to come up with ideas for improving problem areas. If a large refactoring is partially complete, the team may wish to authorize a conscientious programmer to finish it.

If you run out of useful solo tasks, you can relax the “no production code” rule or use mob programming to form a “mini-mob” of three people.

Prerequisites

Pairing requires a comfortable work environment. Most offices and cubicles just aren’t set up that way. Before trying pairing full-time, adjust your physical space. If your team is remote, get your tooling in place.

Make sure everyone wants to participate before you try pairing. Pairing is a big change to programmers’ work styles and you may encounter resistance. I usually work around this by asking people to try it for a month or two, then decide. If that doesn’t work, you can try pairing part-time, or with just the people who are interested, although I find that pairing works best when the whole team does it full-time.

Ally
Mob Programming

Mob programming tends to be less intimidating than pairing. If people don’t want to try pairing, see if they’d like to try mobbing instead.

Indicators

When your team pairs well:

  • You’re focused and engaged throughout the day.

  • You enjoy the camaraderie of working with your teammates.

  • At the end of the day, you feel tired and satisfied.

  • For small interruptions, one person deals with the problem while the other continues working. Afterwards, they slide back into the flow of work immediately.

  • Internal quality improves.

  • Knowledge and coding tips travel quickly through the team, raising everyone’s level of competence.

  • New team members integrate into the team quickly and easily.

Alternatives and Experiments

Pairing is a very powerful tool. I’m not aware of any other technique, other than mobbing, that’s as effective. Give pairing (or mobbing) a real try before experimenting with alternatives.

When you look at alternatives, don’t make the mistake of thinking that pairing is just a fancy type of code review. To truly replace pairing, you need to replace all these benefits:

Ally
Collective Code Ownership

Code quality. Because pairing brings so many perspectives to the code, and results in so much conversation about the code, it reduces defects and improves design quality. The frequent pair switching shares knowledge amongst team members, which enhances collective code ownership. By having people work together, it helps people focus, supports self-discipline, and reduces distractions. It does all this without sacrificing productivity.

Formal code reviews can also reduce defects, improve quality, and support self-discipline. In a sense, pairing is just continuous code review. Code reviews don’t share knowledge as thoroughly as pairing, though, so if you’re using collective code ownership, you probably need to supplement code reviews with additional design discussions.

Flow. Pairing’s benefits to flow are more subtle. Because it focuses two people on the same problem, pairing is sort of like having a backup brain. If one person gets distracted, the other person can “reboot” their attention and get them back on track quickly. It’s also easier to ignore the ever-present distractions provided by smartphones, email, instant messaging, and the other demands on our attention. In an environment without pairing, you’ll need another way to help people stay focused.

Collaboration. Pairing’s resilience to distractions makes intra-team collaboration easier. Ideally, in a team, when one person gets stuck on a question that another team member can answer, you want them to ask for help rather than spinning their wheels. If you’re pairing, there’s very little cost to answering a question, because your pairing partner keeps working. It makes sense to ask for help any time you need help.

If you aren’t pairing, interruptions are much more costly. According to [DeMarco and Lister 2013],3 it takes a programmer 15 minutes or more to get back into flow after an interruption. The calculus of interruptions changes: do you ask a question and cost somebody on the team at least fifteen minutes of work? Or do you continue to struggle and hope you get the right answer? Personally, I'd keep the interruptions, because it benefits team cohesion to have a culture of helping each other. But that can be frustrating for developers.

3XXX check new edition for this reference; add page number

Noise cancellation with situational awareness. Pair programming has another benefit that’s even less obvious. In a physical team room, pairing creates a low buzz of conversation. You might expect this to be distracting, but it actually recedes into the background as your brain focuses on your interaction with your partner. But the background conversation still enhances your situational awareness. It’s the cocktail-party effect: When somebody says something important to you, your subconscious picks it out of the background and brings it to your conscious attention.

Allies
Team Room
Informative Workspace

In contrast, for teams that don’t pair, side conversations are distracting and can make it hard to concentrate. In that situation, independent offices or cubicles can be better. But now you won’t be able to take advantage of the team room: you won’t be able to see what others are doing as easily and you won’t have the situational awareness provided by an informative workspace.

You could keep the team room and have everyone wear noise-cancelling headphones instead, or encourage people to take side conversations to another room. This will bring back some of your situational awareness, but you won’t get the advantages of the cocktail-party effect.

In other words, pairing has a lot of unobvious benefits that reinforce other agile practices. Although it’s definitely weird, and can be a lot to ask, it’s worth putting in the effort to give it a real try. Don’t just dismiss it out of hand. If pairing isn’t a good fit, try mobbing instead.

Further Reading

Pair Programming Illuminated [Williams 2002] discusses pair programming in depth.

“The Costs and Benefits of Pair Programming” [Cockburn and Williams 2001] reports on Laurie Williams’ initial study of pair programming.

“Promiscuous Pairing and Beginner’s Mind: Embrace Inexperience” [Belshee 2005] is an intriguing look at the benefits of switching pairs at strict intervals.

“Adventures in Promiscuous Pairing: Seeking Beginner’s Mind” [Lacey 2006] explores the costs and challenges of promiscuous pairing. It’s a must-read if you plan to try Belshee’s approach.

XXX Peer Reviews in Software: A Practical Guide (Wiegers)?

XXX https://martinfowler.com/articles/on-pair-programming.html

Share your feedback about this excerpt on the AoAD2 mailing list! Sign up here.

For more excerpts from the book, or to get a copy of the Early Release, see the Second Edition home page.