AoAD2 Practice: Refactoring

Book cover for “The Art of Agile Development, Second Edition.”

Second Edition cover

This is a pre-release excerpt of The Art of Agile Development, Second Edition, to be published by O’Reilly in 2021. Visit the Second Edition home page for information about the open development process, additional excerpts, and more.

Your feedback is appreciated! To share your thoughts, join the AoAD2 open review mailing list.

This excerpt is copyright 2007, 2020, 2021 by James Shore and Shane Warden. Although you are welcome to share this link, do not distribute or republish the content without James Shore’s express written permission.

Refactoring

Audience
Programmers

We revise and improve the design of existing code.

Code rots. That’s what everybody says: entropy is inevitable, and chaos eventually turns your beautifully imagined, well-designed code into a big mess of spaghetti.

I used to think that, too, before I learned to refactor. Now I have a ten-year-old production codebase that’s better today than it was when I first created it. I’d hate to go back: every year, it’s so much better than it was the year before.

Refactoring isn’t rewriting.

Refactoring makes this possible. It’s the process of changing the design of your code without changing its behavior. What it does stays the same, but how it does it changes. Despite popular misuse of the term, refactoring isn’t rewriting. Nor is it any arbitrary change. Refactoring is a careful, step-by-step approach to incrementally improving the design of your code.

Refactorings are also reversible: there’s no one right answer, so sometimes you’ll refactor in one direction, and sometimes you’ll refactor in the other. Just as you can change the expression “x²-1” to “(x+1)(x-1)” and back, you can change the design of your code—and once you can do that, you can keep entropy at bay.

How to Refactor

Allies
Test-Driven Development
Reflective Design
Slack

Technically, you can refactor at any time, but unless your IDE has guaranteed-safe refactorings, it’s best to do it when you have a good suite of tests that are all passing. You’ll typically refactor during the “Refactor” step of the test-driven development loop, but you’ll also refactor to make a change easier or to clean up code.

When you refactor, you’ll proceed in a series of very small transformations. (Confusingly, each transformation is also called a refactoring.) Each refactoring is like making a turn on a Rubik’s cube. To achieve anything significant, you have to string together several individual refactorings, just as you have to string together several turns to solve the cube.

To refactor well, you need to work in a series of controlled steps.

The fact that refactoring is a sequence of small transformations is sometimes lost on people new to refactoring. You don’t just change the design of your code: to refactor well, you need to make that change in a series of controlled steps. Each step should only take a few moments, and your tests should pass after each one.

There are a wide variety of individual refactorings. The definitive guide is Martin Fowler’s eponymous book, Refactoring: Improving the Design of Existing Code. [Fowler 2018] It contains an in-depth catalog of refactorings, and is well worth studying. I learned more about good code and design from reading that book than from any other source.

That said, you don’t need to memorize all the individual refactorings. Instead, try to learn the mindset behind them. The automated refactorings in your IDE will help you get started, but there’s many more options available to you. The trick is to break down the change you want to make into small steps that only change the design of your code, not its behavior.

Refactoring in Action

To illustrate this point, I’ll continue the example started in “A TDD Example” on p.XX. This is a small example, for space reasons, but it still illustrates how a bigger change can be broken down into individual refactorings. Each refactoring is just a matter of seconds.

To follow along with this example, clone the git repository at https://github.com/jamesshore/livestream, check out the 2020-05-05-end tag, and modify the src/rot-13.js file. See README.md for instructions about how to run the build.)

At the end of the TDD example, we had a JavaScript module that performed ROT-13 encoding:

export function transform(input) {
  if (input === undefined || typeof input !== "string") {
    throw new Error("Expected string parameter");
  }

  let result = "";
  for (let i = 0; i < input.length; i++) {
    let charCode = input.charCodeAt(i);
    result += transformLetter(charCode);
  }
  return result;
}

function transformLetter(charCode) {
  if (isBetween(charCode, "a", "m") || isBetween(charCode, "A", "M")) {
    charCode += 13;
  } else if (isBetween(charCode, "n", "z") || isBetween(charCode, "N", "Z")) {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

function isBetween(charCode, firstLetter, lastLetter) {
  return charCode >= codeFor(firstLetter) && charCode <= codeFor(lastLetter);
}

function codeFor(letter) {
  return letter.charCodeAt(0);
}

The code worked, and was decent quality, but it was overly verbose. It used character codes for determining ranges, but JavaScript allows you to compare letters directly. We can simplify the code by removing codeFor() and having isBetween() do a direct comparison, like this:

function isBetween(letter, firstLetter, lastLetter) {
  return letter >= firstLetter && letter <= lastLetter;
}

Although that change could be made all at once, making big changes in a real-world application will introduce bugs and can get you into a state that’s hard to get out of. (Been there, done that. In a public demonstration of refactoring. Youch.) As with TDD, the better you understand how to refactor, the smaller steps you’re able to make, and the faster you’ll go. So I’ll demonstrate the refactoring step by safe step.

To start with, isBetween() takes charCode, not letter. I needed to modify its caller, transformLetter(), to pass in a letter. But transformLetter() didn’t have a letter either. Even transform() didn’t have a letter. So that was the first thing to introduce:

export function transform(input) {
  if (input === undefined || typeof input !== "string") {
    throw new Error("Expected string parameter");
  }

  let result = "";
  for (let i = 0; i < input.length; i++) {
    let letter = input[i];
    let charCode = input.charCodeAt(i);
    result += transformLetter(charCode);
  }
  return result;
}

function transformLetter(charCode) ...

This was a do-nothing statement: I introduced a variable, but nothing used it, so I expected the tests to pass. I ran them, and they did.

Although the letter variable wasn’t used, introducing it gave me the ability to pass letter into transformLetter. That was my next step.

Ally
Zero Friction

Notice how small these steps were. From experience, I knew that manually refactoring function signatures often goes wrong, so I wanted to take it slow. Such small steps require a zero-friction build.

exports.transform = function(input) {
  if (input === undefined || typeof input !== "string") {
    throw new Error("Expected string parameter");
  }

  let result = "";
  for (let i = 0; i < input.length; i++) {
    let letter = input[i];
    let charCode = input.charCodeAt(i);
    result += transformLetter(letter, charCode);
  }
  return result;
};

function transformLetter(letter, charCode) {
  if (isBetween(charCode, "a", "m") || isBetween(charCode, "A", "M")) {
    charCode += 13;
  } else if (isBetween(charCode, "n", "z") || isBetween(charCode, "N", "Z")) {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

The tests passed again. Now that I had letter in transformLetter(), I could pass it through to isBetween():

function transformLetter(letter, charCode) {
  if (isBetween(letter, charCode, "a", "m") ||
      isBetween(letter, charCode, "A", "M")) {
    charCode += 13;
  } else if (isBetween(letter, charCode, "n", "z") ||
             isBetween(letter, charCode, "N", "Z")) {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

function isBetween(letter, charCode, firstLetter, lastLetter) {
  return charCode >= codeFor(firstLetter) && charCode <= codeFor(lastLetter);
}

(Tests passed.) And now that isBetween() had letter, I could finally modify isBetween to use it:

function isBetween(letter, charCode, firstLetter, lastLetter) {
  return letter >= firstLetter && letter <= lastLetter;
}

(Tests passed.) The codeFor() method was no longer in use, so I deleted it.

Ally
Slack

(Tests passed.) I had accomplished what I originally set out to do, but now that I saw what the code looked like, I could see more opportunities to simplify. This is common when refactoring: cleaning up the code will make more cleanups visible. Deciding whether to pursue those additional cleanups is a question of judgment and how much slack you have.

This is what the code looked like:

exports.transform = function(input) {
  if (input === undefined || typeof input !== "string") {
    throw new Error("Expected string parameter");
  }

  let result = "";
  for (let i = 0; i < input.length; i++) {
    let letter = input[i];
    let charCode = input.charCodeAt(i);
    result += transformLetter(letter, charCode);
  }
  return result;
};

function transformLetter(letter, charCode) {
  if (isBetween(letter, charCode, "a", "m") ||
      isBetween(letter, charCode, "A", "M")) {
    charCode += 13;
  } else if (isBetween(letter, charCode, "n", "z") ||
             isBetween(letter, charCode, "N", "Z")) {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

function isBetween(letter, charCode, firstLetter, lastLetter) {
  return letter >= firstLetter && letter <= lastLetter;
}

In this case, I had plenty of slack, so I decided to keep refactoring. The isBetween() function didn’t seem like it was adding any value, so I inlined it. I was able to do this in a single, bigger step because I used my editor’s automatic “Inline Function” refactoring.

function transformLetter(letter, charCode) {
  if (letter >= "a" && letter <= "m" || letter >= "A" && letter <= "M")  {
    charCode += 13;
  } else if (letter >= "n" && letter <= "z" || letter >= "N" && letter <= "Z") {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

(Tests passed.) Passing in charCode seemed redundant, so I copied the charCode logic from transform into transformLetter():

function transformLetter(letter, charCode) {
  charCode = letter.charCodeAt(0);
  if (letter >= "a" && letter <= "m" || letter >= "A" && letter <= "M") {
    charCode += 13;
  } else if (letter >= "n" && letter <= "z" || letter >= "N" && letter <= "Z") {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

(Tests passed.) And then removed the unneeded charCode parameter:

export function transform(input) {
  if (input === undefined || typeof input !== "string") {
    throw new Error("Expected string parameter");
  }

  let result = "";
  for (let i = 0; i < input.length; i++) {
    let letter = input[i];
    let charCode = input.charCodeAt(i);
    result += transformLetter(letter, charCode);
  }
  return result;
};

function transformLetter(letter, charCode) {
  let charCode = letter.charCodeAt(0);
  if (letter >= "a" && letter <= "m" || letter >= "A" && letter <= "M") {
    charCode += 13;
  } else if (letter >= "n" && letter <= "z" || letter >= "N" && letter <= "Z") {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

(Tests passed.) That was a nice simplification, but I saw an opportunity to make it even better. Rather than manually looping over the string, I realized I could use a regular expression to call transformLetter() instead:

export function transform(input) {
  if (input === undefined || typeof input !== "string") {
    throw new Error("Expected string parameter");
  }

  return input.replace(/[A-Za-z]/g, transformLetter);
};

function transformLetter(letter) {
  let charCode = letter.charCodeAt(0);
  if (letter >= "a" && letter <= "m" || letter >= "A" && letter <= "M") {
    charCode += 13;
  } else if (letter >= "n" && letter <= "z" || letter >= "N" && letter <= "Z") {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

(Test passed.) I thought that was as good as it could get, at first. But the /[A-Za-z]/ in the regex bothered me. I had included it to make the code more readable, but matching every character with /./ would have worked just as well. The regex wasn’t really doing anything.

Then it hit me: with the regex ensuring that only letters were being passed to transformLetter(), I could simplify the if statements. I wasn’t 100% sure about this, so I started slow:

function transformLetter(letter) {
  let charCode = letter.charCodeAt(0);
  if (letter >= "a" && letter <= "m" || letter >= "A" && letter <= "M") {
    charCode += 13;
  } else if (letter >= "n" && letter <= "z" || letter >= "N" && letter <= "Z") {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

The tests failed! I had forgotten that, in ASCII, upper-case “Z” comes before lower-case “a”. I needed to normalize the letter first:

function transformLetter(letter) {
  let charCode = letter.charCodeAt(0);
  if (letter <= "m" || letter >= "A" && letter.toUpperCase() <= "M") {
    charCode += 13;
  } else if (letter >= "n" && letter <= "z" || letter >= "N" && letter <= "Z") {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

That fixed it. Now I felt safe removing the second half of the if statement:

function transformLetter(letter) {
  let charCode = letter.charCodeAt(0);
  if (letter.toUpperCase() <= "M") {
    charCode += 13;
  } else if (letter >= "n" && letter <= "z" || letter >= "N" && letter <= "Z") {
    charCode -= 13;
  }
  return String.fromCharCode(charCode);
}

(Test passed.) The code was good, but the mutable charCode variable was bothering me. I prefer a more functional style. Rather than modifying the charCode variable directly, I decided to try storing the rotation amount instead.

First I introduced the new variable:

function transformLetter(letter) {
  let charCode = letter.charCodeAt(0);
  let rotation;
  if (letter.toUpperCase() <= "M") {
    charCode += 13;
    rotation = 13;
  } else {
    charCode -= 13;
    rotation = -13;
  }
  return String.fromCharCode(charCode);
}

(Test passed.) Then used it in place of charCode:

function transformLetter(letter) {
  let charCode = letter.charCodeAt(0);
  let rotation;
  if (letter.toUpperCase() <= "M") {
    charCode += 13;
    rotation = 13;
  } else {
    charCode -= 13;
    rotation = -13;
  }
  return String.fromCharCode(charCode + rotation);
}

(Test passed.) And inlined charCode using my editor’s automated refactoring:

function transformLetter(letter) {
  let charCode = letter.charCodeAt(0);
  let rotation;
  if (letter.toUpperCase() <= "M") {
    rotation = 13;
  } else {
    rotation = -13;
  }
  return String.fromCharCode(letter.charCodeAt(0) + rotation);
}

(Test passed.) Finally, I converted the if statement to a constant expression. In my editor, this was two automated refactorings: an automated conversion of if to ?:, and an automated joining of declaration and assignment. Then I manually changed let to const. The tests passed after each step, and the completed code looked like this:

export function transform(input) {
  if (input === undefined || typeof input !== "string") {
    throw new Error("Expected string parameter");
  }

  return input.replace(/[A-Za-z]/g, transformLetter);
};

function transformLetter(letter) {
  const rotation = letter.toUpperCase() <= "M" ? 13 : -13;
  return String.fromCharCode(letter.charCodeAt(0) + rotation);
}

This is a nice improvement over the original code. I could have made it more compact, but that would have sacrificed readability, so I was happy with it as it was. Some people might argue that the ternary expression was a step too far already.

And that’s what it looks like to refactor, step by safe step. Although this is a small example, it’s an accurate reflection of real-world refactoring. In larger codebases, gradual changes like this are the basis for big improvements.

Small steps are important, too. This example is simple enough that you could convert it all in one or two big steps, but if you learn how to take small steps on small problems like this, you’ll be able to do so on large problems, too, and that’s how you successfully refactor a large codebase.

To see an example of incremental refactoring applied to a larger problem, see Emily Bache’s superb walkthrough of the Gilded Rose kata. [Bache 2018]

Allies
Incremental Design
Reflective Design

Breaking a big design change into a sequence of small refactorings enables you to make dramatic design changes without risk. You can even make big changes incrementally, fixing part of the design one day and another part of it another day. This is a necessary part of using your slack to make big changes, and the key to successful Agile design.

Questions

How often should we refactor?

Allies
Test-Driven Development
Slack

Constantly. Perform little refactorings as you use TDD and bigger refactorings as part of your slack. Every week, your design should be better than it was the week before.

Isn’t refactoring rework? Shouldn’t we design our code correctly from the beginning?

If it were possible to design your code perfectly from the beginning, then refactoring would be rework. However, as everybody who’s worked with large systems knows, mistakes always creep in. Even if they didn’t, the needs of your software change over time, and your design has to be updated to match. Refactoring gives you the ability to constantly improve.

What about our database? That’s what really needs improvement.

You can refactor databases, too. Just as with normal refactorings, the trick is to proceed in small, behavior-preserving steps. Refactoring Databases: Evolutionary Database Design [Ambler and Sadalage 2006] describes how. However, data migration can take a long time, which requires special deployment considerations, as described in “Continuous Deployment” on p.XX.1

1XXX replace with direct reference when Continuous Deployment is written.

How can we make large design changes without conflicting with other team members?

Ally
Continuous Integration

Communicate regularly and use continuous integration. Before taking on a refactoring that will touch a bunch of code, integrate your existing code and let people know what you’re about to do. Sometimes other people can reduce the chance of integration conflicts by mirroring any big rename refactorings you’re planning on doing.

I can’t refactor without breaking a lot of tests! What am I doing wrong?

Your tests should check the behavior of your code, not the implementation, and refactoring should change implementation, but not behavior. So if you’re doing everything correctly, the tests shouldn’t break when you refactor.

Some refactorings will change function or method signatures, but that only changes the interface, not the underlying behavior. Refactoring an interface requires changing all callers, which includes your tests, but your tests shouldn’t require any special changes.

If your tests often break when you refactor, or if your tests make interface changes difficult, it could be due to inappropriate use of test doubles (such as mock objects). Look at ways to improve your test design. One option is to use sociable tests instead of isolated tests, as “Write Sociable Tests” on p.XX discusses. If that doesn’t help, ask a mentor for guidance.

Prerequisites

Allies
Test-Driven Development
Zero Friction
Collective Code Ownership
Continuous Integration

Refactoring requires good tests and a zero-friction build. Without tests, refactoring is risky, because you can’t easily tell whether your changes have accidentally broken something. (Some IDEs provide a few guaranteed-safe refactorings, but other refactorings still require tests.) Without a zero-friction build, feedback is too slow to allow small steps. It’s still technically possible to refactor, but it’s slow and painful.

Refactoring also requires collective code ownership. Any significant design changes will require that you touch many parts of the code. Collective code ownership gives you the permission you need to do so. Similarly, refactoring requires continuous integration. Without it, each integration will be a nightmare of conflicting changes.

Published interfaces—interfaces used by code that’s outside of your team’s control—require careful management. You’ll need to coordinate with everyone who uses the published interface. For this reason, it’s often best to avoid refactoring published interfaces.

Some programming environments, particularly “low code” or “no code” environments, can make refactoring difficult. So can highly-dynamic programming styles, such as monkey-patching (code that redefines existing interfaces) or string-based reflection. Refactoring may not be worth the cost in these situations—but be sure to consider the increased costs of change and decreased longevity that comes with choosing not to refactor.

It’s possible, although not common, to spend too much time refactoring. You don’t need to refactor code that’s unrelated to your current work. Similarly, balance your need to finish stories with the need to have good code. As long as the code is better than it was when you started, you’re doing enough. In particular, if you think the code could be better, but you’re not sure how to improve it, it’s okay to leave it for someone else to improve later. That’s one of the great things about collective ownership: someone will improve it later.

Indicators

When you use refactoring as an everyday part of your toolkit:

  • The code constantly improves.

  • You make significant design changes safely and confidently.

  • Every week, the code is at least slightly better than it was the week before.

Alternatives and Experiments

There are no real alternatives to refactoring. No matter how carefully you design your code, it will eventually get out of sync with the needs of your application. Without refactoring, that disconnect will overwhelm you, leaving you to choose between rewriting the software, at great expense and risk, or abandoning it entirely.

However, there are always opportunities to learn how to refactor better. That typically involves figuring out how to take smaller, safer, more reliable steps. Keep practicing. I’ve been at it for twenty years and I’m still learning new tricks.

Further Reading

Refactoring: Improving the Design of Existing Code [Fowler 2018] is the definitive reference for refactoring. It’s also a great read. Buy it.

Refactoring to Patterns [Kerievsky 2004b] takes Fowler’s work one step further, showing how refactorings can string together to achieve significant design changes. It’s a good way to learn more about how to use individual refactorings to achieve big results.

Refactoring Databases: Evolutionary Database Design [Ambler and Sadalage 2006] shows how refactoring can apply to database schemas.

Share your feedback about this excerpt on the AoAD2 mailing list! Sign up here.

For more excerpts from the book, or to get a copy of the Early Release, see the Second Edition home page.

If you liked this entry, check out my best writing and presentations, and consider subscribing to updates by email or RSS.