AoAD2 Practice: Feature Flags


This is a pre-release excerpt of The Art of Agile Development, Second Edition, to be published by O’Reilly in 2021. Visit the Second Edition home page for information about the open development process, additional excerpts, and more.

Your feedback is appreciated! To share your thoughts, join the AoAD2 open review mailing list.

This excerpt is copyright 2007, 2020, 2021 by James Shore and Shane Warden. Although you are welcome to share this link, do not distribute or republish the content without James Shore’s express written permission.

Revised: July 18, 2021

Feature Flags

Audience
Programmers

We deploy and release independently.

For many teams, releasing their software is the same as deploying their software. They deploy a branch of their code repository into production, and everything in that branch is released. If there’s anything they don’t want to release, they store it in a separate branch.

Allies
Continuous Integration
Continuous Deployment

That doesn’t work for teams using continuous integration and deployment. Other than short-lived development branches, they only have one branch: their integration branch. There’s nowhere for them to hide unfinished work.

Feature flags, also known as feature toggles, solve this problem. They hide code programmatically, rather than using repository branches. This allows teams to deploy unfinished code without releasing it.

Feature flags can be programmed in a variety of ways. Some can be controlled at runtime, allowing people to release new features and capabilities without redeploying the software. This puts releases in the hands of business stakeholders, rather than programmers. Flags can even be set up to release the software in waves, or to limit releases to certain types of users.

Keystones

Strictly speaking, the simplest type of feature flag isn’t a feature flag at all. Kent Beck calls it a “Keystone.” [Beck 2004] (p. 69) It’s easy: when working on something new, wire up the UI last. That’s the keystone. Until the keystone is in place—until the UI is wired up—nobody will know the new code exists, because they won’t have any way to access it.

For example, when I migrated a website to use a different authentication service, I started by implementing an infrastructure wrapper for the new service. I was able to do most of that work without wiring it up to the login button. Until I did, users were unaware of the change, because the login button still used the old authentication infrastructure.

Allies
Test-Driven Development
Fast Reliable Tests

This does raise the question: if you can’t see your changes, how do you test them? The answer is test-driven development and narrow tests. Test-driven development allows you to check your work without seeing it run. Narrow tests target specific functions without requiring them to be hooked up to the rest of your application.

Eventually, of course, you’ll want to see the code run, either to fine-tune the user interface (which can be difficult to test-drive), for customer review, or just to double-check your work. TDD isn’t perfect, after all.

Design your new code to be “wired up” with a single line. When you want to see it run, add that line. If you need to integrate before you’re ready to release, comment that line out. When you’re ready to release, write the appropriate test and uncomment the line one final time.
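As a rough sketch of what “wired up with a single line” might look like, assume a hypothetical route table in which each page is registered by one line. The names createRoutes, oldLoginPage, and newLoginPage are illustrative assumptions, not code from the site described above.

```javascript
// Hypothetical page handlers. The new one can be finished and test-driven,
// but nothing refers to it until the keystone line is in place.
function oldLoginPage() { return "old login HTML"; }
function newLoginPage() { return "new login HTML"; }

function createRoutes() {
  const routes = new Map();
  routes.set("/login", oldLoginPage);
  // Keystone: uncomment to see the new page run. Comment it out again
  // before integrating if you're not ready to release.
  // routes.set("/login", newLoginPage);
  return routes;
}

console.log(createRoutes().get("/login")());   // prints "old login HTML"
```

Because the new handler is reachable from narrow tests even while the keystone line is commented out, it can be integrated and deployed without being released.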

Keystones don’t have to involve a user interface. Anything that hides your work from customers can be used as a keystone. For example, one team used continuous deployment for a rewrite of their website. They deployed the new site to a real production server, but the server didn’t receive any production traffic. Nobody outside the company could see the new site until they switched production traffic from the old server to the new one.

Keystones are my preferred approach to hiding incomplete work. They’re simple, straightforward, and don’t require any special maintenance or design work.

Feature Flags

Feature flags are just like keystones, except they use code to control visibility, not a comment. Usually, it’s a simple if statement.

To continue the authentication example, remember that I programmed my new authentication infrastructure without wiring it up to the login button. Before I could wire it up, I needed to test it in production, because there were complicated interactions between the third-party service and my systems. But I didn’t want my users to use the new login before I tested it.

I solved this dilemma with a feature flag. My users saw the old login; I saw the new one. The code worked like this (Node.js):

if (siteContext.useAuth0ForAuthentication()) {
  // new Auth0 HTML
}
else {
  // old Persona HTML
}

As part of the change, I had to implement a new email validation page. It wasn’t exposed to existing users, but it was still possible for people to manually type in the URL, so I also used the feature flag to redirect them away:

httpGet(siteContext) {
  if (!siteContext.useAuth0ForAuthentication()) return redirectToAccountPage();
  ⋮
}

Feature flags are real code. They need the same attention to quality as the rest of your code. For example, the email validation page had this test:

it("redirects to account page if Auth0 feature flag is off", function() {
  const siteContext = createContext({ auth0: false });
  const response = httpGet(siteContext);
  assertRedirects(response, "/v3/account");
});

Be sure to remove feature flags after they’re no longer needed. This can be easy to forget, which is one of the reasons I prefer keystones to feature flags. To help you remember, you can add a reminder to your team calendar or a “remove flag” story to your team’s plan. Some teams program their flag code to log an alert or fail tests after its expiration date has passed.
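One way the expiration idea could be sketched, assuming a hypothetical flag registry in which every flag carries an expiration date. The FLAGS object, the dates, and the function names are assumptions made for illustration only.

```javascript
// Hypothetical flag registry: each flag records when it should be gone.
const FLAGS = {
  auth0: { enabled: false, expires: new Date("2021-09-01") },
};

// Returns the names of flags whose expiration date has passed.
// A test that expects this list to be empty will fail once a flag lingers.
function expiredFlags(flags, now = new Date()) {
  return Object.entries(flags)
    .filter(([, flag]) => now > flag.expires)
    .map(([name]) => name);
}

// Checks a flag, logging an alert if it has outlived its expiration date.
function isEnabled(flags, name, now = new Date(), log = console.warn) {
  const flag = flags[name];
  if (flag === undefined) throw new Error(`Unknown feature flag: ${name}`);
  if (now > flag.expires) log(`Feature flag '${name}' has expired; remove it`);
  return flag.enabled;
}
```

Passing the current date and the logger as parameters keeps the check itself easy to test with narrow tests.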

How does your code know when the flag is enabled? In other words, where do you implement your equivalent of useAuth0ForAuthentication()? You have several options.

Application configuration

Application configuration is the most straightforward way to control your feature flags. Your configuration code can pull the state of the flag from a constant, an environment variable, a database, or whatever you like. A constant is simplest, so it’s my first choice, but an environment variable or database will allow you to enable or disable the flag on a machine-by-machine basis, which allows you to perform incremental releases.
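A minimal sketch of the first two options, assuming a hypothetical createSiteContext() factory and an environment variable named USE_AUTH0. Neither name comes from the original code; only useAuth0ForAuthentication() does.

```javascript
// Option 1: a constant. Simplest, but changing it means redeploying.
const USE_AUTH0 = false;

// Option 2: an environment variable, so the flag can differ per machine.
// The variable name USE_AUTH0 is an assumption.
function auth0FlagFromEnv(env = process.env) {
  return env.USE_AUTH0 === "true";
}

// The rest of the code only calls useAuth0ForAuthentication();
// where the value comes from is hidden behind the context object.
function createSiteContext(auth0 = USE_AUTH0) {
  return { useAuth0ForAuthentication: () => auth0 };
}

const fromConstant = createSiteContext();
const fromEnv = createSiteContext(auth0FlagFromEnv({ USE_AUTH0: "true" }));
console.log(fromConstant.useAuth0ForAuthentication());   // prints false
console.log(fromEnv.useAuth0ForAuthentication());        // prints true
```

Keeping the lookup behind one function means the source of the flag can change from a constant to an environment variable or database without touching the code that checks it.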

User configuration

If you want to enable your flag based on who’s logged in, make it a privilege attached to your user or account abstraction. For example, user.privileges.logsInWithAuth0(). You can use it to perform incremental releases based on subsets of users, and to selectively release features for the purpose of testing ideas.
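Here is one way such a privilege might be sketched. The createUser() factory and the "auth0" privilege name are assumptions for illustration; only the logsInWithAuth0() call mirrors the example above.

```javascript
// Hypothetical user abstraction. The flag is stored as a privilege on the
// user, so it can be granted to one subset of users at a time.
function createUser(email, privileges = []) {
  const granted = new Set(privileges);
  return {
    email,
    privileges: {
      logsInWithAuth0: () => granted.has("auth0"),
    },
  };
}

const earlyAdopter = createUser("early@example.com", ["auth0"]);
const everyoneElse = createUser("other@example.com");

console.log(earlyAdopter.privileges.logsInWithAuth0());   // prints true
console.log(everyoneElse.privileges.logsInWithAuth0());   // prints false
```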

Feature flags are easy to implement, but they can be complicated to manage. Once you start getting into incremental releases and user segmentation, it’s worth looking into the many tools and services for managing them.

Don’t confuse feature flags with user access control. Although feature flags can be used to hide a feature from a user, they’re a way of temporarily hiding new features users would otherwise have access to. User access control, in contrast, is for hiding features users should never have access to. They might both be implemented with user privileges, but they should be managed separately.

For example, if you create a new white-labelling feature for your enterprise customers, you might use a feature flag to gradually roll it out to those customers. However, you would also implement a user privilege that restricted access to enterprise customers. That way, when the feature flag code is removed, enterprise customers will continue to be the only people with access, and there’s no risk of accidentally enabling the feature flag for the wrong users.
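The white-labelling example might be sketched as follows, assuming a hypothetical account object with a plan field for access control and a flags list for the rollout. All of these names are assumptions.

```javascript
// Hypothetical check that keeps the two concerns separate.
function canUseWhiteLabelling(account) {
  // Access control: permanent. Only enterprise customers ever get this.
  const permitted = account.plan === "enterprise";
  // Feature flag: temporary. Delete this line once the rollout is complete.
  const released = account.flags.includes("whiteLabelling");
  return permitted && released;
}

const rolledOut = { plan: "enterprise", flags: ["whiteLabelling"] };
const waiting = { plan: "enterprise", flags: [] };
const flaggedByMistake = { plan: "basic", flags: ["whiteLabelling"] };

console.log(canUseWhiteLabelling(rolledOut));          // prints true
console.log(canUseWhiteLabelling(waiting));            // prints false
console.log(canUseWhiteLabelling(flaggedByMistake));   // prints false
```

Because the plan check stands on its own, removing the flag line later leaves enterprise customers as the only people with access.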

Secrets

In some cases, you’ll want to enable a flag on a case-by-case basis, but you won’t be able to attach that privilege to a user. For example, during my authentication transition, I needed to enable the new login button before I was actually logged in.

For these cases, you can use a secret to enable the flag. In client-based applications, the secret can take the form of a special file in the file system. For server-based applications, a cookie or other request header will work. That’s what I did for my authentication flag. I programmed the code to look for a secret cookie that could only be set by logging in as an administrator.
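A cookie-based check along these lines could look roughly like the sketch below. The cookie name auth0-preview, the request shape, and the helper names are assumptions, and the code that sets the cookie after an administrator logs in is not shown.

```javascript
// Parses a Cookie request header ("a=1; b=2") into a Map of name to value.
function parseCookies(header = "") {
  const cookies = new Map();
  for (const pair of header.split(";")) {
    const index = pair.indexOf("=");
    if (index === -1) continue;   // skip empty or malformed entries
    cookies.set(pair.slice(0, index).trim(), pair.slice(index + 1).trim());
  }
  return cookies;
}

// The flag is on only when the request carries the secret cookie.
// `secret` is whatever value the administrator-only code put in the cookie.
function useAuth0ForAuthentication(request, secret) {
  if (!secret) return false;      // never enable the flag if no secret is configured
  const cookies = parseCookies(request.headers.cookie);
  return cookies.get("auth0-preview") === secret;
}
```

A production version would likely rely on the web framework’s own cookie handling rather than a hand-rolled parser.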

Secret-based flags are riskier than configuration-based flags. If the secret gets out, anybody can enable the feature. They’re also harder to set up and control. I only use them as a last resort.

Prerequisites

Allies
Collective Code Ownership
Reflective Design

Anybody can use keystones. Feature flags run the risk of growing out of control, so your team needs to pay attention to their design and removal, especially as they multiply. Collective code ownership and reflective design help.

Despite their superficial similarity to privileges that control user access to features, feature flags are meant to be temporary. Don’t use feature flags as a replacement for proper user access control.

Indicators

When you use keystones and feature flags well:

  • Your team can deploy software that includes incomplete code.

  • Releasing is a business decision, not a technical decision.

  • Flag-related code is clean, well-designed, and well-tested.

  • Flags and their code are removed after the corresponding feature is released.

Alternatives and Experiments

Allies
Refactoring
Reflective Design

Feature branches are a common alternative to keystones and feature flags. When someone starts working on a new feature, they create a branch, and they don’t merge that branch back into the rest of the code until the feature is done. This is effective at keeping unfinished changes out of the hands of customers, but significant refactorings tend to cause merge conflicts. This makes it a poor choice for Delivering teams, which rely on refactoring and reflective design to keep costs low.

Keystones are so simple, they don’t leave a lot of room for experimentation. Feature flags, on the other hand, are ripe for exploration. Look for ways to keep your feature flags organized and the design clean. Consider how your flags can provide new business capabilities. For example, feature flags are often used for A/B testing, which involves showing different versions of your software to different users, then making decisions based on the results.
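As one possible starting point for that kind of experiment, the sketch below assigns each user to a stable variant by hashing their ID. This is a common bucketing approach rather than anything prescribed here, and the function names, the experiment label, and the percentage parameter are all assumptions.

```javascript
const nodeCrypto = require("node:crypto");

// Maps a user ID to a stable bucket from 0 to 99 for a given experiment.
// Hashing keeps the assignment consistent between visits without storing it.
function bucketFor(experiment, userId) {
  const digest = nodeCrypto
    .createHash("sha256")
    .update(`${experiment}:${userId}`)
    .digest();
  return digest.readUInt32BE(0) % 100;
}

// Users in buckets below `percent` see variant "B"; everyone else sees "A".
function variantFor(experiment, userId, percent) {
  return bucketFor(experiment, userId) < percent ? "B" : "A";
}
```

Raising percent over time also gives a simple way to release in waves, since users already in variant “B” stay there as the threshold grows.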

As you experiment, remember that simpler is better. Although keystones may seem like a cheap trick, they’re very effective, and they keep the code clean. It’s easy for feature flags to get out of control. Stick with simple solutions whenever you can.

Further Reading

Martin Fowler goes into more detail about keystones in [Fowler 2020a].

Pete Hodgson has a very thorough discussion of feature flags in [Hodgson 2017].
