James Shore: The Art of Agile Development: Continuous Integration

September 9, 2010

in 99 words

Keep code integrated and build release infrastructure with the rest of the application. The ultimate goal is to be able to deploy all but the last few hours of work at any time.

To do so, integrate every few hours and keep your build, test, and other release infrastructure up to date. Each integration should get as close to a real release as possible.

Prefer synchronous integration, in which you wait for the integration to succeed, to asynchronous integration, in which a tool tests the integration for you. Synchronous integration requires fast builds, but ensures that they never break.

Full Text

Continuous Integration

Audience: Programmers

We keep our code ready to ship.

Most software development efforts have a hidden delay between when the team says "we're done" and when the software is actually ready to ship. Sometimes that delay can stretch on for months. It's the little things: merging everyone's pieces together, creating an installer, prepopulating the database, building the manual, and so forth. Meanwhile, the team gets stressed out because they forgot how long these things take. They rush, leave out helpful build automation, and introduce more bugs and delays.

The ultimate goal is to be able to deploy at any time.

Continuous integration is a better approach. It keeps everybody's code integrated and builds release infrastructure along with the rest of the application. The ultimate goal of continuous integration is to be able to deploy all but the last few hours of work at any time.

Practically speaking, you won't actually release software in the middle of an iteration. Stories will be half-done and features will be incomplete. The point is to be technologically ready to release even if you're not functionally ready to release.

Why It Works

If you've ever experienced a painful multi-day (or multi-week) integration, integrating every few hours probably seems foolish. Why go through that hell so often?

Actually, short cycles make integration less painful. Shorter cycles lead to smaller changes, which means there are fewer chances for your changes to overlap with someone else's.

That's not to say that collisions don't happen. They do. They're just not very frequent because everybody's changes are so small.

Collisions are most likely when you're making wide-ranging changes. When you do, let the rest of the team know beforehand so they can integrate their changes and be ready to deal with yours.

How to Practice Continuous Integration

In order to be ready to deploy all but the last few hours of work, your team needs to do two things:

1. Integrate your code every few hours.
2. Keep your build, tests, and other release infrastructure up to date.

Ally: Test-Driven Development

To integrate, update your sandbox with the latest code from the repository, make sure everything builds, then commit your code back to the repository. You can integrate any time you have a successful build. With test-driven development, that should happen every few minutes. I integrate whenever I make a significant change to the code or create something I think the rest of the team will want right away.

Many teams have a rule that you have to integrate before you go home at the end of the day. If you can't integrate, they say, something has gone wrong and you should throw away your code and start fresh the next day. This rule seems harsh, but it's actually a very good rule. With test-driven development, if you can't integrate within a few minutes, you're likely stuck.

Toss out your recent changes and start over when you get badly stuck.

Each integration should get as close to a real release as possible. The goal is to make preparing for a release such an ordinary occurrence that, when you actually do ship, it's a nonevent¹. Some teams that use continuous integration automatically burn an installation CD every time they integrate. Others create a disk image or, for network-deployed products, automatically deploy to staging servers.

¹...except for the release party, of course.

Never Break the Build

When was the last time you spent hours chasing down a bug, only to find that it was a problem with your computer's configuration or in somebody else's code? Conversely, when was the last time you spent hours blaming your computer's configuration (or somebody else's code) only to find that the problem was in code you just wrote?

On typical projects, when we integrate, we don't have confidence in the quality of our code or in the quality of the code in the repository. The scope of possible errors is wide; if anything goes wrong, we're not sure where to look.

Reducing the scope of possible errors is the key to developing quickly. If you have total confidence that your software worked five minutes ago, then only the actions you've taken in the last five minutes could cause it to fail now. That reduces the scope of the problem so much that you can often figure it out just by looking at the error message—there's no debugging necessary.

Agree as a team never to break the build.

To achieve this, agree as a team never to break the build. This is easier than it sounds: you can actually guarantee that the build will never break (well, almost never) by following a little script.

The Continuous Integration Script

To guarantee an always-working build, you have to solve two problems. First, you need to make sure that what works on your computer will work on anybody's computer. (How often have you heard the phrase, "It worked on my machine!"?) Second, you need to make sure that nobody gets code that hasn't been proven to build successfully.

To do this, you need a spare development machine to act as a central integration machine. You also need some sort of physical object to act as an integration token. (I use a rubber chicken. Stuffed toys work well, too.)

With an integration machine and integration token, you can ensure a working build in several simple steps.

To Update From the Repository

1. Check that the integration token is available. If it isn't, another pair is checking in unproven code and you need to wait until they finish.
2. Get the latest changes from the repository. Others can get changes at the same time, but don't let anybody take the integration token until you finish.

Run a full build to make sure everything compiles and passes tests after you get the code. If it doesn't, something went wrong. The most common problem is a configuration issue on your machine. Try running a build on the integration machine. If it works, debug the problem on your machine. If it doesn't, find the previous integrators and beat them about the head and shoulders, if only figuratively.

To Integrate

1. Update from the repository (follow the previous script). Resolve any integration conflicts and run the build (including tests) to prove that the update worked.
2. Get the integration token and check in your code.
3. Go over to the integration machine, get the changes, and run the build (including tests).
4. Replace the integration token.

If the build fails on the integration machine, you have to fix the problem before you give up the integration token. The fastest way to do so is to roll back your changes. However, if nobody is waiting for the token, you can just fix the problem on your machine and check in again.

Avoid fixing problems manually on the integration machine. If the build worked on your machine, you probably forgot to add a file or to add a new configuration to the build script. In either case, if you correct the problem manually, the next people to get the code won't be able to build.
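
The integration steps above can be sketched in code. This is only a rough model under stated assumptions: `Repository`, `IntegrationToken`, `integrate`, and the `build` callback are hypothetical names invented for illustration, not part of any real tool, and the sketch assumes the previous repository state was already proven to build, so rolling back restores a known-good build.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Repository:
    """Hypothetical stand-in for a version control repository."""
    commits: List[str] = field(default_factory=list)

    def check_in(self, change: str) -> None:
        self.commits.append(change)

    def roll_back(self) -> str:
        return self.commits.pop()


@dataclass
class IntegrationToken:
    """Hypothetical stand-in for the physical token (the rubber chicken)."""
    holder: str = ""

    def take(self, who: str) -> bool:
        if self.holder:
            return False  # another pair is checking in unproven code
        self.holder = who
        return True

    def replace(self) -> None:
        self.holder = ""


def integrate(
    who: str,
    change: str,
    repo: Repository,
    token: IntegrationToken,
    build: Callable[[str, List[str]], bool],
) -> bool:
    """Follow the integration steps; return True if the change stays in."""
    # Only update while the token is available.
    if token.holder:
        return False
    # Update from the repository and prove the merged code locally.
    if not build("local", repo.commits + [change]):
        return False  # fix the problem locally before checking in
    # Get the token and check in.
    if not token.take(who):
        return False
    repo.check_in(change)
    # Build (including tests) on the integration machine.
    ok = build("integration", list(repo.commits))
    if not ok:
        repo.roll_back()  # fastest fix: return to the last proven state
    # Replace the token only once the repository is known-good again.
    token.replace()
    return ok
```

The `build` callback receives the machine name and the list of changes to build, so a test can simulate a build that passes locally but fails on the integration machine, which is exactly the "forgot to add a file" case.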

Continuous Integration Servers

There's a lively community of open-source continuous integration servers (also called CI servers). The granddaddy of them all is CruiseControl, pioneered by ThoughtWorks employees.

A continuous integration server starts the build automatically after check-in. If the build fails, it notifies the team. Some people try to use a continuous integration server instead of the continuous integration script discussed earlier. This doesn't quite work because without an integration token, team members can accidentally check out code that hasn't yet been proven to work.
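
To make that gap concrete, here is a minimal sketch of one polling cycle of such a server. The function and parameter names are hypothetical and are not taken from CruiseControl or any other real CI server; `build` and `notify` are callbacks the caller would supply.

```python
from typing import Callable, Optional


def poll_once(
    last_built: Optional[str],
    latest_revision: str,
    build: Callable[[str], bool],
    notify: Callable[[str], None],
) -> str:
    """Run one polling cycle and return the revision that has now been built."""
    if latest_revision == last_built:
        return latest_revision  # nothing new was checked in
    # Anyone who updates between the check-in and the end of this build
    # receives code that has not yet been proven to work.
    if not build(latest_revision):
        notify(f"Build failed for revision {latest_revision}")
    return latest_revision
```

Note that the server only reports the failure after the fact; nothing in the loop stops teammates from checking out the unproven revision in the meantime.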

Another common mistake is to use a continuous integration server to shame team members into improving their build practices. Although the "wow factor" of a CI server can sometimes inspire people to do so, it only works if people are really willing to make an effort to check in good code. I've heard many reports of people who tried to use a CI server to enforce compliance, only to end up fixing all of the build failures themselves while the rest of the team ignored their strong-arming.

If your team sits together and has a fast build, you don't need the added complexity of a CI server. Simply walk over to the integration machine and start the build when you check in. It only takes a few seconds—less time than it takes for a CI server to notice your check-in—and gives you an excuse to stretch your legs.

If you do install a CI server, don't let it distract you. Focus on mastering the practice of continuous integration, not the tool. Integrate frequently, never break the build, and keep your release infrastructure up to date.

Introducing Continuous Integration

Get the team to agree to continuous integration rather than imposing it on them.

The most important part of adopting continuous integration is getting people to agree to integrate frequently (every few hours) and never to break the build. Agreement is the key to adopting continuous integration because there's no way to force people not to break the build.

If you're starting with XP on a brand-new project, continuous integration is easy to do. In the first iteration, install a version control system. Introduce a ten-minute build with the first story, and grow your release infrastructure along with the rest of your application. If you are disciplined about continuing these good habits, you'll have no trouble using continuous integration throughout your project.

If you're introducing XP to an existing project, your tests and build may not yet be good enough for continuous integration. Start by automating your build (see Ten-Minute Build earlier in this chapter), then add tests. Slowly improve your release infrastructure until you can deploy at any time.

Dealing with Slow Builds

Ally: Ten-Minute Build

The most common problem facing teams practicing continuous integration is slow builds. Whenever possible, keep your build under ten minutes. On new projects, you should be able to keep your build under ten minutes all the time. On a legacy project, you may not achieve that goal right away. You can still practice continuous integration, but it comes at a cost.

When you use the integration script discussed earlier, you're using synchronous integration—you're confirming that the build and tests succeed before moving on to your next task. If the build is too slow, synchronous integration becomes untenable. (For me, 20 or 30 minutes is too slow.) In this case, you can use asynchronous integration instead. Rather than waiting for the build to complete, start your next task immediately after starting the build, without waiting for the build and tests to succeed.

The biggest problem with asynchronous integration is that it tends to result in broken builds. If you check in code that doesn't work, you have to interrupt what you're doing when the build breaks half an hour or an hour later. If anyone else checked out that code in the meantime, their build won't work either. If the pair that broke the build has gone home or to lunch, someone else has to clean up the mess. In practice, the desire to keep working on the task at hand often overrides the need to fix the build.

If you have a very slow build, asynchronous integration may be your only option. If you must use this, a continuous integration server is the best way to do so. It will keep track of what to build and automatically notify you when the build has finished.

Switch to synchronous integration when you can.

Over time, continue to improve your build script and tests. Once the build time gets down to a reasonable number (15 or 20 minutes), switch to synchronous integration. Continue improving the speed of the build and tests until synchronous integration feels like a pleasant break rather than a waste of time.
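
One way to keep an eye on this is to time each build and compare it with the thresholds discussed here. The sketch below is illustrative only: the function names are hypothetical, and the twenty-minute cutoff is an assumption picked from the "15 or 20 minutes" range above, so adjust it to your own team's tolerance.

```python
import time
from typing import Callable, Tuple

TEN_MINUTE_TARGET = 10 * 60  # seconds; the ten-minute build goal
SYNCHRONOUS_LIMIT = 20 * 60  # seconds; assumed cutoff, adjust to taste


def timed_build(build: Callable[[], bool]) -> Tuple[bool, float]:
    """Run the build and return (succeeded, elapsed seconds)."""
    start = time.monotonic()
    succeeded = build()
    return succeeded, time.monotonic() - start


def recommend_mode(build_seconds: float) -> str:
    """Suggest an integration style for a build of the given duration."""
    if build_seconds <= TEN_MINUTE_TARGET:
        return "synchronous"
    if build_seconds <= SYNCHRONOUS_LIMIT:
        return "synchronous (keep speeding up the build)"
    return "asynchronous (until the build is faster)"
```

For example, `recommend_mode(25 * 60)` suggests asynchronous integration, while `recommend_mode(15 * 60)` suggests switching to synchronous integration and continuing to improve the build.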

Multistage Integration Builds

Some teams have sophisticated tests, measuring such qualities as performance, load, or stability, that simply cannot finish in under ten minutes. For these teams, multistage integration is a good idea.

A multistage integration consists of two separate builds. The normal ten-minute build, or commit build, contains all the normal items necessary to prove that the software works: unit tests, integration tests, and a handful of end-to-end tests (see Test-Driven Development in Chapter 9 for more about these types of tests). This build runs synchronously as usual.

In addition to the regular build, a slower secondary build runs asynchronously. This build contains the additional tests that do not run in a normal build: performance tests, load tests, and stability tests.
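
The shape of a multistage integration can be sketched as follows. This is a minimal illustration, not a real build tool: `multistage_integration`, `commit_build`, and `secondary_build` are hypothetical names, and a background thread stands in for whatever machine would actually run the slower build.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Optional


def multistage_integration(
    commit_build: Callable[[], bool],
    secondary_build: Callable[[], Any],
) -> Optional[Future]:
    """Run the commit build synchronously, then start the secondary build.

    Returns None if the commit build fails; otherwise returns a Future
    for the secondary build, which runs in the background.
    """
    # Stage 1: the fast commit build. The caller waits for this result.
    if not commit_build():
        return None
    # Stage 2: the slow build (performance, load, stability tests).
    # It is started here but not waited for.
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(secondary_build)
    executor.shutdown(wait=False)  # let the worker finish on its own
    return future
```

The caller gets the commit build's answer immediately and can check the returned future later to learn whether the secondary build passed.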

Prefer improved tests to a multistage integration.

Although a multistage build is a good idea for a mature project with sophisticated testing, most teams that I encounter use multistage integration as a workaround for a slow test suite. I prefer to improve the test suite instead; it's more valuable to get better feedback more often.

If this is the case for you, a multistage integration might help you transition from asynchronous to synchronous integration. However, although a multistage build is better than completely asynchronous integration, don't let it stop you from continuing to improve your tests. Switch to fully synchronous integration when you can: only synchronous integration guarantees a known-good build.

Questions

I know we're supposed to integrate at least every four hours, but what if our current story or task takes longer than that?

You can integrate at any time, even when the task or story you're working on is only partially done. The only requirement is that the code builds and passes its tests.

What should we do while we're waiting for the integration build to complete?

Ally: Ten-Minute Build

Take a break. Get a cup of tea. Perform ergonomic stretches. Talk with your pair about design, refactoring opportunities, or next steps. If your build is under ten minutes, you should have time to clear your head and consider the big picture without feeling like you're wasting time.

Isn't asynchronous integration more efficient than synchronous integration?

Synchronous integration reduces integration problems.

Although asynchronous integration may seem like a more efficient use of time, in practice it tends to disrupt flow and leads to broken builds. If the build fails, you have to interrupt your new task to roll back and fix the old one. This means you must leave your new task half-done, switch context (and sometimes partners) to fix the problem, then switch back. It's wasteful and annoying.

Instead of switching gears in the middle of a task, many teams let the build remain broken for a few hours while they finish the new task. If other people integrate during this time, the existing failures hide any new failures in their integration. Problems compound and cause a vicious cycle of painful integrations leading to longer broken builds, which lead to more integration problems, which lead to more painful integrations. I've seen teams that practice asynchronous integration leave the build broken for days at a time.

Remember, too, that the build should run in under ten minutes. Given a fast build, the supposed inefficiency of synchronous integration is trivial, especially as you can use that time to reflect on your work and talk about the big picture.

Are you saying that asynchronous integration would never work?

You can make asynchronous integration work if you're disciplined about keeping the build running fast, checking in frequently, running the build locally before checking in, and fixing problems as soon as they're discovered. In other words, do all the good things you're supposed to do with continuous integration.

Synchronous integration makes you confront these issues head on, which is why it's so valuable. Asynchronous integration, unfortunately, makes it all too easy to ignore slow and broken builds. You don't have to ignore them, of course, but my experience is that teams using asynchronous integration have slow and broken builds much more often than teams using synchronous integration.

Ron Jeffries said it best:²

²Via the art-of-agile mailing list, http://tech.groups.yahoo.com/group/art-of-agile/message/365.

When I visit clients with asynchronous builds, I see these things happening, I think it's fair to say invariably:

The "overnight" build breaks at least once when I'm there;

The build lamp goes red at least once when I'm there, and stays that way for more than an hour.

With a synchronous build, once in a while you hear one pair say "Oh, shjt."

I'm all for more automation. But I think an asynch build is like shutting your eyes right when you drive through the intersection.

Our version control system doesn't allow us to roll back quickly. What should we do?

Ally: Version Control

The overriding rule of the known-good build is that you must know the build works when you put the integration token back. Usually, that means checking in, running the build on the integration machine, and seeing it pass. Sometimes—we hope not often—it means rolling back your check-in, running the old build, and seeing that pass instead.

If your version control system cannot support this, consider getting one that does. Not being able to revert easily to a known-good point in history is a big danger sign. You need to be able to revert a broken build with as much speed and as little pain as possible so you can get out of the way of other people waiting to integrate. If your version control can't do this for you, create an automated script that will.

One way to script this is to check out the older version to a temporary sandbox. Delete all of the files in the regular sandbox except for the version control system's metadata files, then copy all of the non-metadata files over from the older version. This will allow you to check in the old version on top of the new one.
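
A minimal sketch of that script follows. The function name is hypothetical, and the sketch assumes the version control metadata lives in a single top-level directory such as `.git` or `.svn`; a system that scatters metadata through every subdirectory would need a more careful walk. Checking out the older version and checking in the result are left to your version control system's own commands.

```python
import shutil
from pathlib import Path
from typing import Iterable


def restore_old_version(
    old_checkout: Path,
    sandbox: Path,
    metadata: Iterable[str] = (".git", ".svn"),
) -> None:
    """Make the sandbox's files match old_checkout, keeping sandbox metadata."""
    metadata = tuple(metadata)
    # Delete everything in the regular sandbox except the metadata.
    for entry in sandbox.iterdir():
        if entry.name in metadata:
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    # Copy the non-metadata files over from the older version.
    for entry in old_checkout.iterdir():
        if entry.name in metadata:
            continue
        target = sandbox / entry.name
        if entry.is_dir():
            shutil.copytree(entry, target, ignore=shutil.ignore_patterns(*metadata))
        else:
            shutil.copy2(entry, target)
```

After running it, the sandbox holds the old files under the current metadata, so an ordinary check-in places the old version on top of the new one.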

We rolled back our check-in, but the build is still failing on the integration machine. What do we do now?

Oops. You've almost certainly exposed some sort of configuration bug. It's possible that the bug was in your just-integrated build script, but it's equally possible that there was a latent bug in one of the previous scripts and you accidentally exposed it. (Lucky you.)

Either way, the build has to work before you give up the integration token. Now you debug the problem. Enlist the help of the rest of the team if you need to; a broken integration machine is a problem that affects everybody.

Why do we need an integration machine? Can't we just integrate locally and check in?

In theory, if the build works on your local machine, it should work on any machine. In practice, don't count on it. The integration machine is a nice, pristine environment that helps prove the build will work anywhere. For example, I occasionally forget to check in a file; watching the build fail on the integration machine when it passed on mine makes my mistake obvious.

Nothing's perfect, but building on the integration machine does eliminate the majority of cross-machine build problems.

I seem to always run into problems when I integrate. What am I doing wrong?

One cause of integration problems is infrequent integration. The less often you integrate, the more changes you have to merge. Try integrating more often.

Another possibility is that your code tends to overlap with someone else's. Try talking more about what you're working on and coordinating more closely with the pairs that are working on related code.

If you're getting a lot of failures on the integration machine, you probably need to do more local builds before checking in. Run a full build (with tests) before you integrate to make sure your code is okay, then another full build (with tests) afterwards to make sure the integrated code is okay. If that build succeeds, you shouldn't have any problems on the integration machine.

I'm constantly fixing the build when other people break it. How can I get them to take continuous integration seriously?

It's possible that your teammates haven't all bought into the idea of continuous integration. I often see teams in which only one or two people have any interest in continuous integration. Sometimes they try to force continuous integration on their teammates, usually by installing a continuous integration server without their consent. It's no surprise that the team reacts to this sort of behavior by ignoring broken builds. In fact, it may actually decrease their motivation to keep the build running clean.

Talk to the team about continuous integration before trying to adopt it. Discuss the trade-offs as a group, collaboratively, and make a group decision about whether to apply it.

If your team has agreed to use continuous integration but is constantly breaking the build anyway, perhaps you're using asynchronous integration. Try switching to synchronous integration, and follow the integration script exactly.

Results

When you integrate continuously, releases are a painless event. Your team experiences fewer integration conflicts and confusing integration bugs. The on-site customers see progress in the form of working code as the iteration progresses.

Contraindications

Don't try to force continuous integration on a group that hasn't agreed to it. This practice takes everyone's willing cooperation.

Allies: Version Control; Ten-Minute Build

Using continuous integration without a version control system and a ten-minute build is painful.

Synchronous integration becomes frustrating if the build is longer than ten minutes and too wasteful if the build is very slow. My threshold is twenty minutes. The best solution is to speed up the build.

A physical integration token only works if all of the developers sit together. You can use a continuous integration server or an electronic integration token instead, but be careful to find a token that's as easy to use and as obvious as a physical token.

Integration tokens don't work at all for very large teams: people spend too much time waiting to integrate. Use private branches in your version control system instead: Check your code into a private branch, build the branch on an integration machine—you can have several—then promote the branch to the mainline if the build succeeds.

Alternatives

If you can't perform synchronous continuous integration, try using a continuous integration server and asynchronous integration. It will lead to more problems than synchronous integration, but is the best of the alternatives.

If you don't have an automated build, you won't be able to practice asynchronous integration. Don't respond by delaying integration, which is a very high-risk activity. Instead, create an automated build as soon as possible, and start practicing one of the forms of continuous integration.

Some teams perform a daily build and smoke test. Continuous integration is a more advanced version of the same practice; if you have a daily build and smoke test, you can migrate to continuous integration. Start with asynchronous integration and steadily improve your build and tests until you can use synchronous integration.