AoAD2 Practice: Forecasting

This is a pre-release excerpt of The Art of Agile Development, Second Edition, to be published by O’Reilly in 2021. Visit the Second Edition home page for information about the open development process, additional excerpts, and more.

Your feedback is appreciated! To share your thoughts, join the AoAD2 open review mailing list.

This excerpt is copyright 2007, 2020, 2021 by James Shore and Shane Warden. Although you are welcome to share this link, do not distribute or republish the content without James Shore’s express written permission.

Forecasting

Audience
Product Managers

We can predict when we’ll release.

At first glance, Agile offers the perfect solution to forecasting: if you’re using iterations, just add up your stories (or estimates), divide by your capacity, and voila! The number of iterations left before you’re done.

It sounds like a good idea, but it isn’t reliable. For example, let’s say you’re working on a team that consistently finishes six stories every week. You have 30 stories to finish before release, so it will take five weeks, right? It’s January 1st, so you tell your business stakeholders you’ll release on February 5th. They’re enthusiastic about the new release and start telling customers. “Wait until you see what’s coming next! Expect it February 5th!”

That week, you finish six stories, as usual. Along the way, you discover a bug. It’s not a big deal, but it needs to be fixed prior to release. You add a story to fix it in the next iteration. On January 8th, you have 25 stories remaining. You tell your stakeholders that you might be a bit later than February 5th. They urge you to speed up a bit. “Just squeeze it in,” they say. “Our customers are counting on a February 5th release!”

On January 15th, during your stakeholder demo, your stakeholders realize that one of the features needs more audit controls. You add four new stories to address the need. Combined with the six stories you finished, there are 23 stories remaining, which means you definitely won’t be done on February 5th. You propose cutting a feature to bring the date back in, but stakeholders balk. “We’ve already told customers what to expect,” they say. “We’ll just have to tell them there’s been a week’s delay.”

The next week, everything goes smoothly. You finish six stories again, and on January 22nd, there are 17 stories remaining. You’re on track for releasing on February 12th.

The next few weeks don’t go as well. You’ve been waiting for another team to deliver a special UI component. They promised to have it to you in early January, but their date kept slipping. Now you’ve run out of other stories to work on. You pull in some extra “nice to have” stories to keep busy. You finish six stories, as usual, but most of them are new. On January 29th, you still have 15 stories remaining.

Then the team working on the UI component comes clean: they’ve run into an unexpected technical issue. The UI component you’ve been counting on isn’t going to be ready for another month, at minimum. You revise your plan, adding stories to work around the missing component. On February 5th, despite finishing six stories, you still have 13 stories to go. Your stakeholders are getting frustrated. “We’ll push the release out one more week, to February 19th,” they say. “You’ll just have to squeeze in that last story. And we can’t keep slipping the date! We’re getting eaten alive on Twitter.”

The next two weeks are Not Fun. You keep discovering new stories to make up for the missing UI component. Everybody works overtime to try to hit the release date, and you cut back on testing and slack. You ignore your capacity and sign up for more stories on the assumption that the extra time will work out.

It doesn’t work out. At first, everything seems fine. On February 12th, you finish nine stories! But the next week, you learn that four of them have to be reworked because of bugs and missed assumptions. Combined with all the new UI stories, there’s just too much to do. When the 19th rolls around, you still have four stories left.

Finally, the following week, you release. It’s February 26th. You never finished fewer than six stories per week. But somehow, it took eight weeks to release the 30 stories in your original plan.

No matter how careful you are about meeting your iteration plans, something always goes wrong. Sometimes, you need to make reliable predictions anyway. Forecasting is how you do so.

Predefined Release Dates

The best way to forecast is to define when you’ll release, but not what you’ll release.

The best way to forecast is to define when you’ll release, but not what you’ll release. In the example, if stakeholders hadn’t told customers exactly what to expect, they could have cut scope when problems arose and still released on time.

Sharing predictions of both what and when reduces your agility. Agility means seeking out new information and changing your plans in response. Every time your plans change, you have to update your forecasts. At best, this means the time and effort that went into the forecasts is wasted. More often, people treat your forecasts as commitments and get upset when you change them.

Instead, only define your release date in advance. Steer your plans so that you’re ready to release your most valuable increments on that date. A common variant of this idea is the release train, which is a predefined series of release dates. (See “Release Early Release Often” on p.XX.)

Ally
Adaptive Planning

The secret to steering your plans is to slice your work into the smallest valuable increments you can. Focus on getting to a releasable state as quickly as possible. To do so, set aside every story that isn’t strictly necessary to release.

That bare minimum is your first increment. Once you’ve identified it, take a look at the stories you set aside and decide which can be done on their own, as additional increments. Some of those increments might just be a single “just right”-sized story—in fact, that’s the ideal.

In a perfect world, you want every story to be something that can be released on its own, without having to wait for any additional stories. This gives you the maximum flexibility and ability to steer your plans. “Keep Your Options Open” on p.XX has more details.

At a minimum, your increments need to be small enough that you can easily finish at least one before your predefined release date. As you finish work, keep an eye on how much time is left and use it to decide what to do next. If there’s a lot of time, you can build a big new increment that takes the software in a new direction. If there isn’t much time left, focus on smaller increments that add polish and delight.

Depending on the size of your increments, you can often decide what to prioritize based on gut feel. If you need more rigor, use a temporary date and scope forecast (described below) to see what will fit. For best results, don’t share those forecasts. This will give your team the flexibility to change their plans later.

Feasibility Forecasts

Rigorous forecasts require detailed planning, but sometimes you just want to know if an idea is worth pursuing.

Any approach that doesn’t involve detailed planning will just be based on gut feel, but that’s okay. People with a lot of experience can make good gut decisions.

To make a feasibility forecast, gather the team’s sponsor, a seasoned product manager or project manager, and a senior programmer or two—preferably ones that will be on the team. Choose people with a lot of experience at your company.

Ask the sponsor to describe the development goals, when work would start, who would be on the team, and the latest release date that would still be worth the cost. Then ask the product manager and programmers if they think it’s possible.

Note that you aren’t asking how long it will take. That’s a harder question to answer. What you’re looking for here is the gut reaction. Phrasing the question in terms of a solid expectation makes the gut reaction more reliable.

If the answer is an unqualified “yes,” then it makes sense to invest in a month or two of development so you can make a real plan and forecast. If the experts waffle a bit, or say “no,” then there’s some risk. Whether or not that risk is worth investing in a better forecast is up to the sponsor.

Date and Scope Forecasts

Sometimes you’ll need to forecast both date and scope, as the team in the example was trying to do. Despite their delays, they were on the right track. They just forgot to account for schedule risk. Schedule risk is the possibility that something will go wrong and delay the work... and something always goes wrong. To make accurate forecasts, you have to include a risk adjustment to account for those problems. Here’s the formula:

number of weeks remaining = number of stories (or estimate total) remaining ÷ capacity or throughput per week × risk adjustment

You can also forecast how many of the stories in your current plan will be done by a predefined release date:

number of stories (or estimate total) = number of weeks remaining × capacity or throughput per week ÷ risk adjustment

Here’s what each of those terms means:

  • Number of weeks remaining: The amount of time between now and your release date.

  • Number of stories (or estimate total) remaining: The number of “just right” stories that need to be completed before release. If you use estimates, it’s the sum of those stories’ estimates.

  • Capacity or throughput per week: If you use iterations, it’s the number of stories finished last iteration (or the sum of their estimates, if you use estimates) divided by the number of weeks per iteration. If you use continuous flow, it’s the number of stories finished last week (or the sum of their estimates). Don’t average multiple iterations or weeks; that’s taken care of by the risk adjustment.

  • Risk adjustment: See table “Risk Adjustment Rules of Thumb”.1 Use the “high-risk team” column unless your team has both Focusing and Delivering fluency. Choose the row corresponding to your desired likelihood of meeting or beating the predicted date. For example, forecasts made using the “90%” row will meet or beat the predicted date nine out of ten times.

1These risk numbers are an educated guess. The “high risk” numbers are based on [Little 2003]. The “low risk” numbers are based on DeMarco and Lister’s RISKOLOGY simulator, version 4a, available at https://systemsguild.eu/riskology. I used the standard settings but turned off productivity variance, as capacity automatically adjusts for that risk.

Ally
The Planning Game

Date and scope forecasts depend on stories that are sized “just right” using the planning game. If you haven’t broken all the stories for a release down to that level of detail, you won’t be able to forecast the release. You’ll need to use the planning game to size all your stories first.

Similarly, if the release includes any spike stories, you’ll have to finish all of them before you can make a forecast. This is why spike stories are separate stories in your plan; sometimes it’s valuable to schedule them early so you can resolve risks and make forecasts.

Table 1. Risk adjustment rules of thumb

Likelihood                Low-risk team   High-risk team
10% (almost impossible)   1               1
50% (coin toss)           1.4             2
90% (very likely)         1.8             4

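If it helps to see the arithmetic spelled out, here’s a minimal sketch of both formulas in Python. It’s only an illustration: the function and variable names are invented for this example, and the risk adjustments are the rule-of-thumb values from the table above.

    # A minimal sketch of the date and scope forecast formulas. The
    # rule-of-thumb risk adjustments come from table "Risk Adjustment
    # Rules of Thumb"; function and variable names are illustrative only.
    import math

    RISK_ADJUSTMENTS = {
        # likelihood: (low-risk team, high-risk team)
        0.10: (1.0, 1.0),
        0.50: (1.4, 2.0),
        0.90: (1.8, 4.0),
    }

    def weeks_remaining(stories_remaining, throughput_per_week,
                        likelihood=0.90, low_risk=False):
        """Forecast how many weeks remain before release."""
        low, high = RISK_ADJUSTMENTS[likelihood]
        risk_adjustment = low if low_risk else high
        return stories_remaining / throughput_per_week * risk_adjustment

    def stories_that_fit(weeks_until_release, throughput_per_week,
                         likelihood=0.90, low_risk=False):
        """Forecast how many stories will be done by a predefined date."""
        low, high = RISK_ADJUSTMENTS[likelihood]
        risk_adjustment = low if low_risk else high
        return math.floor(weeks_until_release * throughput_per_week / risk_adjustment)

    # The opening example: 30 stories, 6 stories per week, low-risk team.
    print(weeks_remaining(30, 6, likelihood=0.50, low_risk=True))  # 7.0
    print(weeks_remaining(30, 6, likelihood=0.90, low_risk=True))  # 9.0
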
Update your forecast at the end of every iteration, or once per week if you use continuous flow. As your release date approaches, the forecast will “narrow in” on the actual release date. Graphing the forecasted release dates over time will help you see trends, especially if your throughput isn’t stable.

A Date and Scope Example

Let’s revisit the example from the introduction. To calculate the number of weeks remaining, we start with the number of stories remaining before we release. In this case, it’s thirty.

Next, we determine the team’s throughput. They finish six stories every week, so their throughput is six.

Finally, we determine the risk adjustment. I usually forecast a range of dates that’s 50-90% likely. It gives a relatively narrow range that I’ll beat about half the time. If I don’t think my audience can handle a range-based forecast, I’ll use the 90% number alone.

The specific risk adjustment depends on whether the team is high risk or low risk, which depends on their fluency. In this case, let’s say that the team is fluent in both the Focusing and Delivering zones, so they’re low risk. That gives us risk adjustments of 1.4 and 1.8 for 50% and 90% likelihoods, yielding the following forecasts:

  • 50% Likely: 30 stories ÷ 6 stories per week × 1.4 risk adjustment = 7.0 weeks

  • 90% Likely: 30 stories ÷ 6 stories per week × 1.8 risk adjustment = 9.0 weeks

If we make this forecast on January 1st, as we did in the opening example, we’ll tell our stakeholders that we’ll be ready to release “between February 19th and March 5th.”

Let’s continue the example:

  • Jan 1: 30 stories remain; forecast: Feb 19 - Mar 5. (7.0 - 9.0 weeks.)

  • Jan 8: 25 stories remain; forecast: Feb 19 - Mar 5. (5.8 - 7.5 weeks. I always round up.)

  • Jan 15: 23 stories remain; forecast: Feb 26 - Mar 5. (5.4 - 6.9 weeks.)

  • Jan 22: 17 stories remain; forecast: Feb 19 - Mar 5. (4.0 - 5.1 weeks.)

  • Jan 29: 15 stories remain; forecast: Feb 26 - Mar 5. (3.5 - 4.5 weeks.)

  • Feb 5: 13 stories remain; forecast: Feb 26 - Mar 5. (3.0 - 3.9 weeks.)

  • Feb 12: 9 stories remain; forecast: Mar 5. (2.1 - 2.7 weeks.)

  • Feb 19: 4 stories remain; forecast: Feb 26 - Mar 5. (0.9 - 1.2 weeks.)

  • Actual release: February 26th.

Figure “Example Iterated Forecast” shows what the forecast looks like in graph form.

A two-axis line chart. The x-axis is labelled “Date forecast made” and shows dates, in one week intervals, from January 1st to February 26th. The y-axis is labelled “Forecasted release date” and shows dates, in one week intervals, from January 29th to March 12th. The body of the graph shows three lines, labelled “10%,” “50%,” and “90%.” On January 1st, they show a forecast ranging from February 5th, to February 19th, to March 5th. Moving from left to right, they gradually converge on a release date of February 26th.

Figure 1. Example iterated forecast
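
If you want to reproduce these numbers yourself, the following sketch (an illustration, not a tool from the book) recomputes the 50% and 90% forecasts each week. It rounds each figure to one decimal place, as in the list above, then rounds partial weeks up when converting to dates; the year is arbitrary.

    # An illustrative sketch that recomputes the weekly forecasts from the
    # example. Weeks are shown to one decimal place and rounded up to whole
    # weeks when converted to dates, as in the list above.
    import math
    from datetime import date, timedelta

    THROUGHPUT = 6                 # stories finished per week
    RISK_50, RISK_90 = 1.4, 1.8    # low-risk rule-of-thumb adjustments

    plan = [                       # (date forecast was made, stories remaining)
        (date(2021, 1, 1), 30), (date(2021, 1, 8), 25), (date(2021, 1, 15), 23),
        (date(2021, 1, 22), 17), (date(2021, 1, 29), 15), (date(2021, 2, 5), 13),
        (date(2021, 2, 12), 9), (date(2021, 2, 19), 4),
    ]

    for made_on, stories in plan:
        weeks_50 = round(stories / THROUGHPUT * RISK_50, 1)
        weeks_90 = round(stories / THROUGHPUT * RISK_90, 1)
        early = made_on + timedelta(weeks=math.ceil(weeks_50))
        late = made_on + timedelta(weeks=math.ceil(weeks_90))
        print(f"{made_on}: {stories} stories; {early} - {late} "
              f"({weeks_50} - {weeks_90} weeks)")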

How Date and Scope Forecasts Work

Accurate estimates aren’t a requirement for reliable forecasts.

Software estimates are notoriously inaccurate. So much so that companies waste immense amounts of time trying to improve estimate accuracy. But they don’t have to. Accurate estimates aren’t a requirement for reliable forecasts.

Instead of chasing perfect estimates (a fool’s errand), you can correct estimate error by using real-world data and a dollop of statistics. That’s how the date and scope forecast in this book works. It starts with an estimate, corrects it for accuracy, and then uses statistics to account for uncertainty.

Correcting for accuracy

When you make an estimate—let’s say you’re estimating how many jellybeans are in a jar—your estimate isn’t going to be right on target. It’s going to be high or low. You can describe its accuracy as a single number: the “actual/estimate” ratio. If your actual/estimate ratio is 2, then there were twice as many jellybeans as you estimated, and if your actual/estimate ratio is 0.5, then there were half as many jellybeans as you estimated.

If you make a lot of estimates, you can look at all your estimates together to see how accurate they are, then correct the estimates for accuracy.

Let’s say you estimate 1,000 jellybean jars. After swearing never to look at another jellybean again, you calculate your actual/estimate ratio and discover that it was always exactly 2. There were always twice as many jellybeans as you estimated.

Now you can estimate jellybean jars with perfect accuracy. Make your estimate, multiply by 2, and done.

Ally
Capacity

The forecast formula is weeks = stories ÷ throughput × risk. The first part, stories ÷ throughput, is correcting for estimate accuracy. It doesn’t matter how accurate your story estimates are, or even if they’re estimated at all. By dividing by throughput, we correct the estimate with data about what you actually finish. It’s the same trick capacity uses.

Accounting for uncertainty

You’ve probably noticed a problem with this approach: Your actual/estimate ratio isn’t going to be perfectly consistent. Your throughput can vary and there can be more stories remaining than you thought. One week, your actual/estimate ratio could be 1.3; another week it could be 3.7, and there’s no way to know which it was until after you release.

This is the uncertainty in your forecast. Your estimates aren’t accurate, which is fine, because you can correct for accuracy. But the inaccuracy isn’t consistent, either, so you can’t say which actual/estimate ratio to use to correct the error.

If you graph the actual/estimate ratio of all your estimates, you’ll end up with a log-normal distribution—a bell curve that’s stretched to the right. Graphing it cumulatively, so each data point shows the percentage of estimates at or below a particular ratio, gives you a graph that looks like figure “Little Estimate Data”. [Little 2003]

A two-axis scatter chart. The x-axis is labelled “Actual/Estimate,” with a log scale from 0.1 to 10.0. The halfway point is marked 1.0 and has a dashed vertical line. The y-axis is labelled “Cumulative distribution,” with a linear scale from 0% to 100%. Two types of data are plotted: “Landmark” data and “DeMarco” data. Both show a clear cumulative log-normal distribution, although there are many more data points for the Landmark data, and it forms a more even curve.

Figure 2. Little estimate data

Little’s data isn’t a fluke. Figure “Star Citizen Estimate Data” shows the actual/estimate ratios of estimates made during the development of the 3.0 release of Star Citizen, a crowdfunded video game that shares a lot of behind-the-scenes information. It’s a huge project with hundreds of developers, but they had a very similar result.2

2The “stair steps” in the Star Citizen graph are a sign that the developers padded their estimates. When that happens, work commonly grows to meet the estimate, which results in parts of the curve being shoved to the right to form vertical lines.

A scatter chart similar to the “Little estimate data” figure. It also shows a cumulative log-normal distribution, although it’s not a perfectly smooth curve. In particular, the data from 1-15% is all perfectly aligned with the 1.0 actual/estimate ratio.

Figure 3. Star Citizen estimate data

So how do you make a forecast? If the actual/estimate ratio can be anywhere from 0.55 to 11.5, as it is for the Star Citizen data, how do you know which to choose?

This is where the risk adjustment comes in.

The risk adjustment

The actual/estimate ratio you choose depends on how much risk you’re willing to accept. If you wanted to be perfectly accurate, with virtually no chance of missing your forecast, you would choose the worst actual/estimate ratio you had ever seen. For Star Citizen, that would be 11.5 (excluding an outlier). If you wanted to be perfectly optimistic, with virtually no chance of meeting your forecast, you would choose the best actual/estimate ratio you had ever seen—0.55, in the case of Star Citizen.

That yields a ridiculously large range of possibilities, though. For Star Citizen, two weeks of stories could release in a week... or in more than five months.

So, more realistically, you’ll choose an actual/estimate ratio that has some risk, but not too much. How much risk you want to accept is up to you. For example, if you wanted to make a Star Citizen forecast that was 90% likely to be achieved, you’d look in the historical Star Citizen data. It shows that 90% of actual/estimate ratios were 5.14 or less, so your 90% likely risk adjustment is 5.14.

Most teams don’t have historical actual/estimate ratio data. That’s where the rules of thumb in table “Risk Adjustment Rules of Thumb” come in. They’re based on publicly-available data. They won’t apply perfectly to your situation, but they’re close enough for most teams.

Custom Risk Adjustments

If you want to improve the accuracy of your forecasts, you can create your own risk adjustment table by tracking your historical release estimates. Starting with your next release—or past releases, if you have the data—keep a copy of every baseline release estimate you make. The baseline release estimate is weeks remaining = stories ÷ throughput. Track the date you made each estimate and the number of weeks you estimated were remaining.

Then, when the release actually happens, go back and calculate how long, in weeks, the release actually took from the date of each estimate. If you were pressured to release early, or had a lot of bugs or hotfixes, choose the date that represents your real release—the final release where the software was actually done—so your forecasts will represent the time your team really needs to create a viable release.

Calculate time to release, not time spent working.

Note that you’re determining the actual time to release, not the actual time spent working. This is an important distinction. Not only does it save you from painstakingly tracking time spent on each story, it accounts for situations where stories are stalled for some reason. This gives you the ability to forecast release dates, not story completion dates. Release dates often lag behind story completion dates on high-risk teams.

You should now have several pairs of numbers: an estimate, in weeks, and the actual time required, also in weeks. Divide the actual by the estimate to get an actual/estimate ratio for each pair.

Finally, sort the ratios from smallest to largest. Calculate the position of each row as a percentage of the total number of rows. (I have a spreadsheet at https://www.jamesshore.com/s/forecasts that will do this for you.) This is your cumulative distribution. Use the actual/estimate column for your risk adjustments. Table “Example Estimate Distribution” shows an example with ten ratios.

Table 2. Example estimate distribution

Likelihood   Actual/Estimate
10%          1.000
20%          1.222
30%          1.484
40%          1.833
50%          2.190
60%          2.652
70%          3.000
80%          3.500
90%          5.143
100%         11.500
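
If you’d rather do this calculation in code than in a spreadsheet, here’s a rough sketch of the same procedure. The estimate/actual pairs are invented so that the output matches the example table; your own historical data goes in their place.

    # An illustrative sketch of building a custom risk adjustment table.
    # "history" holds (estimated weeks, actual weeks to release) pairs;
    # the values below are invented so the output matches the example table.

    def risk_adjustment_table(history):
        ratios = sorted(actual / estimate for estimate, actual in history)
        total = len(ratios)
        # Cumulative distribution: the Nth-smallest ratio covers N/total
        # of your past releases.
        return [(i / total, ratio) for i, ratio in enumerate(ratios, start=1)]

    history = [
        (2, 2.0), (9, 11.0), (5, 7.42), (6, 11.0), (10, 21.9),
        (4, 10.608), (3, 9.0), (2, 7.0), (7, 36.0), (2, 23.0),
    ]

    for likelihood, ratio in risk_adjustment_table(history):
        print(f"{likelihood:.0%}  {ratio:.3f}")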

Continue adding to your data set every time you release. For best accuracy, every team should track their data independently, but you can combine data from several similar teams to get started. More data results in better forecasts.

Improving Forecast Ranges

The hard part about forecasts isn’t accuracy; it’s precision. Accuracy without precision is no problem. Here you go: Your next release will come out sometime in the next century... or be cancelled.

100% accurate. 100% useless. To be useful, you need more precision, and to have more precision, you need less uncertainty.

Uncertainty doesn’t come from estimate accuracy (the actual/estimate ratio). You can adjust for any actual/estimate ratio if it’s consistent. Uncertainty comes from the variability in your actual/estimate ratios. These are some common reasons for this variability:

  • Lack of internal quality

  • Changes in team members’ availability

  • Changes in team interruptions and overhead

  • Changes in reliability of suppliers and other dependencies

  • Changes in scope (the work to be done)

Allies
Whole Team
Team Room
Capacity
Slack

Focusing zone practices such as whole team, team room, capacity, and slack help reduce this variability. In general, though, internal quality is one of the biggest factors, and it depends on Delivering fluency. That’s why teams need both Focusing and Delivering fluency to use the “low risk” column of table “Risk Adjustment Rules of Thumb”.

There are two ways to make your forecast narrower. The easy way is to make your increments smaller. A high-risk forecast with two weeks of stories and a 50-90% risk range results in a forecast of 4-8 weeks. That’s not too bad. On the other hand, six months of stories yields a forecast of 1-2 years. That’s hard to accept.

The harder, but more effective, way to narrow your forecasts is to reduce your risks. Technically, you don’t have to have Focusing and Delivering fluency to use the “low risk” column of table “Risk Adjustment Rules of Thumb”. You just need to be able to answer each of the following questions in the affirmative:

  • Did you have the same capacity in each of the last four iterations? (Or, if you’re using continuous flow, did you finish the same number of stories in each of the last four weeks?)

  • If you use iterations, were all of your stories in the last four iterations “done done?”

  • Did you add no new stories to fix bugs in the last four iterations (or weeks)?

  • For your most recent release, when your stories were done, were you able to release to production immediately, without additional work, waiting for QA, or other delays?

Allies
Slack
No Bugs
Continuous Deployment

Using slack to stabilize your capacity, as described in “Stabilizing Capacity” on p.XX, will help you address the first two questions. Reducing bugs and preventing release delays will help you address the second two. That requires Delivering zone practices, but you don’t necessarily need all of them, and you don’t need full fluency to see their benefits.

Creating a custom risk adjustment table, as described in the previous section, might also help narrow your forecast ranges, especially if your team has below-average variability. Be sure to create it from actual measured data, not guesses.

Questions

Our forecast shows us releasing way too late. What should we do?

You have to cut scope. See “When Your Roadmap Isn’t Good Enough” on p.XX for details.

Your rule-of-thumb risk adjustments are too large. Can we use a lower ratio?

When your forecast gives you bad news, it’s tempting to play with the numbers until you feel happier. Speaking as somebody who’s been there and has the spreadsheets to prove it: this is a waste of time. It won’t change when your software actually releases.

You can use whatever risk adjustment numbers you like, but unless you’re basing your numbers on actual historical data, you’re probably just fooling yourself.

Prerequisites

Predefined release dates and feasibility forecasts are appropriate for any team.

Ally
The Planning Game

To make date and scope forecasts, you need to have a team that’s working on the actual software being forecasted. You should have at least four weeks of development history, and you can only forecast increments with stories that have been sized “just right” with the planning game.

More importantly, though, make sure you really need to forecast. Too many companies ask for forecasts out of habit. Forecasting takes time away from development. Not just the time required to make the forecast itself, but the time required to manage the many emotional responses that surround forecasts, both from team members and stakeholders. It also adds resistance to adapting your plans.

Be clear about who forecasts benefit, why, and how much.

As with everything the team does, you should be clear about who date and scope forecasts benefit, why, and how much. Then compare that value against the other ways your team could spend their time. Predefined release dates are often a better choice.

Indicators

When your team forecasts well:

  • You can coordinate with external events, such as marketing campaigns, that have long lead times.

  • You’re able to coordinate with business stakeholders about upcoming delivery dates.

  • You understand when your team’s costs will exceed its value.

  • You have data to counter unrealistic expectations and deadlines.

Alternatives and Experiments

There are many approaches to date and scope forecasting. The one I’ve described has the benefit of being both accurate and easy. However, its dependency on real development stories that are sized “just right” makes it labor-intensive for pre-development forecasts. It also depends on a lot of historical data for best results, though the rules of thumb are often good enough.

An alternative is to use Monte Carlo simulations to amplify small amounts of data. Troy Magennis has a popular set of spreadsheets to do so at https://www.focusedobjective.com/w/support/. (Look for the “Throughput Forecaster.”)
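
To give a rough idea of what a Monte Carlo throughput simulation does, here’s a deliberately simplified sketch; it isn’t Magennis’ spreadsheet. It resamples weekly throughput from a small amount of history until the remaining stories are finished, then reports percentiles of the simulated completion times.

    # An illustrative Monte Carlo throughput simulation, greatly simplified.
    # It resamples historical weekly throughput until the remaining stories
    # are finished, then reports the 50th and 90th percentiles of the results.
    import random

    def simulate_weeks(stories_remaining, weekly_history, trials=10_000):
        outcomes = []
        for _ in range(trials):
            remaining, weeks = stories_remaining, 0
            while remaining > 0:
                remaining -= random.choice(weekly_history)  # resample a past week
                weeks += 1
            outcomes.append(weeks)
        outcomes.sort()
        return {p: outcomes[int(p / 100 * trials) - 1] for p in (50, 90)}

    # Hypothetical history: stories finished in each of the last eight weeks.
    print(simulate_weeks(30, [6, 4, 7, 5, 6, 3, 8, 6]))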

The downside of Magennis’ spreadsheet, and similar estimating tools, is that it asks you to estimate sources of uncertainty rather than using historical data. For example, Magennis’ spreadsheet asks the user to guess the range of stories remaining, as well as a range of how many stories will be added (or “split,” to use its terminology). These guesses have a profound impact on the forecast, but they’re just guesses.

In contrast, the approach I’ve described doesn’t require guesses. It starts with your actual plan and adjusts it using historical data. Anything that has gone wrong in the past is accounted for. Although your upcoming releases won’t run into exactly the same problems as past releases, they will be similar enough that you can create reliable forecasts.

Before you experiment with other forecasting approaches, make sure you understand the fundamentals described in “How Date and Scope Forecasts Work” on p.XX. A good forecast has two characteristics: first, it accounts for uncertainty by speaking in terms of ranges of probabilities, not absolutes; and second, it incorporates as much empirical data as possible—measurements of reality—not just estimates. Otherwise, it’s a house of cards.

Before you go too far down the rabbit hole, though, remember that the best way to forecast is to pick a predefined release date and steer your plans to meet that date exactly.

Further Reading

XXX Further reading to consider:

  • Agile Estimating and Planning (Cohn)

  • Software Estimation: Demystifying the Black Art (McConnell)

  • Software Estimation Without Guessing (Dinwiddie)

  • When Will It Be Done? (Vacanti)
