AoAD2 Chapter: Scaling Agility

This is a pre-release excerpt of The Art of Agile Development, Second Edition, to be published by O’Reilly in 2021. Visit the Second Edition home page for information about the open development process, additional excerpts, and more.

This excerpt is copyright 2007, 2020, 2021 by James Shore and Shane Warden. Although you are welcome to share this link, do not distribute or republish the content without James Shore’s express written permission.

Revised: July 16, 2021

Scaling Agility

In a perfect world, every Agile team would be perfectly isolated, completely owning their product or portfolio of products. Cross-team coordination is a common source of delays and errors. If every team were isolated, that wouldn’t be a problem.

It’s also not at all realistic. A typical Agile team has 4-10 people. That’s often not enough.

So, then, how do you scale? Although this book is focused on individual Agile teams, the question is important enough to deserve a chapter of its own.

Scaling Fluency

Far too often, organizations try to scale Agile without actually having the ability to be Agile in the first place. They invest a lot of time and money in the large-scale Agile flavor of the day, without investing in teams’ fluency or organizational capability. It never works.

In order to scale Agile, you’ll need to scale your organization’s ability to be Agile. This involves three parts: organizational capability, coaching capability, and team capability.

Organizational Capability

One of the biggest mistakes organizations make in trying to introduce Agile is to fail to make the investments described in chapter “Invest in Agility”. But even if your organization takes those investments seriously, there are likely to be some hidden trouble spots.

Before you spend a lot of money on scaling Agile, work out the kinks in your organizational capability. If you’re working with an expert advisor, they’ll have specific suggestions. If you’re going it alone, start with a pilot team, or a small group of teams—no more than five.

As your pilot teams develop their fluency, they’ll identify organizational roadblocks and problems that prevent them from achieving fluency. Make note of those problems. They’re likely to recur. You don’t need to solve them organization-wide, but you do need to solve them for each team you’d like to be Agile.

Once you’ve established your organization’s ability to support fluent teams, then you can scale out further. Until then, tempting though it may be to push Agile aggressively, stick with your existing approach for everyone but your pilot teams.

Coaching Capability

You’ll need a coach or coaches who can help with the big picture of scaling Agile: cross-team coordination and organizational capability, product/portfolio management, and change management. Although you can use books and training to develop these coaches internally, it’s best to hire someone who’s done it before.

You’ll also need skilled team-level coaches, and this is likely to be the main limit on your ability to scale. Team-level coaches are the people who help each team become fluent, and they’re vital. Every team will need at least one, as I discuss in “Coaching Skills” on page XX.

You can either hire experienced team-level coaches or develop your own coaches in-house. If you’re taking a home-grown approach, each coach will need resources, such as this book, to help them learn.

You can scale out more quickly by encouraging experienced team-level coaches to move to another team when their current team approaches fluency. (The checklists at the beginnings of parts 2-4 will help you gauge fluency.) By that point, some team members are likely to be qualified to act as team-level coaches themselves, and can start developing their coaching skills on your more-experienced teams. Be sure this sort of lateral movement enhances, rather than harms, your coaches’ careers, or your supply of coaches will dry up.1

1Thanks to Andrew Stellman for pointing out the dangers of lateral movement on Twitter (https://twitter.com/AndrewStellman/status/1316114014322274304).

Coaching skills are different than development skills. Even your best team members could struggle with learning how to be a good coach. You might be able to scale out your team-level coaching capability faster by hiring people to coach the coaches.

Experienced team-level coaches may be able to work with two teams simultaneously, although it’s not always a good idea for teams pursuing Delivering fluency. Less-experienced coaches should be dedicated to a single team.

Team Capability

Your coaches will help your teams gain fluency. The more experienced your coaches, the faster this will go, but it will still take time. “Make Time for Learning” on page XX gives some ballpark figures.

You can brute-force the problem by hiring a big consultancy to staff your teams with experienced Agile developers at a ratio of 50% or more. With the right people and a high enough ratio, this can result in instant fluency, if you’ve already made the effort to establish organizational capability.

Be cautious of this approach. The strategy is sound, if costly, but execution tends to falter. The people who augment your teams play a huge role in the success of this approach, and there’s a very real risk of hiring the wrong firm. Everybody says their developers have Agile skills, but even for big-name firms, there’s a lot more bandwagon-riding than actual capability. With a few notable exceptions, when the added staff have any Agile skills at all, they’re usually limited to the Focusing zone.

The other risk of the staff augmentation approach is coaching skills. Even if the added staff have the skills needed to create instant fluency—which is far from certain—they aren’t likely to have the skills to coach, too. The Agile changes could fail to stick when the consultancy pulls out.

The staff augmentation approach can work if you hire the right firm. If you go that route, be sure to supplement it with a focus on growing your own coaches. Don’t expect your staff-augmentation firm to do it for you; it’s a very different skill-set. Look to smaller “boutique” and independent consultancies that specialize in Agile change and coaching-the-coaches. The people you hire matter more than the vendor, especially for these specialized skills, and small consultancies do a better job of recognizing this.

Scaling Products and Portfolios

Fluency is the basis for successfully scaling Agile, but it isn’t enough on its own. Unless every team works completely independently, you also need a way of coordinating their work. This is harder than it sounds, because teams have dependencies on each other that tend to result in bottlenecks, delays, and communication errors. Successfully scaling Agile is a matter of figuring out how to manage those dependencies.

There are two basic strategies for scaling Agile: vertical scaling, which attempts to increase the number of teams who can work together without bottlenecks, and horizontal scaling, which attempts to remove bottlenecks by isolating teams’ responsibilities. The two strategies can be used together.

Scaling Vertically

Vertical scaling is about increasing the number of teams who can share ownership of a product or portfolio. By “sharing ownership,” I mean that they don’t have a defined area to work on. Every team can work on every part of the product and can touch any code.

I’ll discuss two approaches to doing so: LeSS and FAST. For clarity, I’ll use the terminology in this book rather than the terms they use, but I’ll put their terms in parentheses.

LeSS

LeSS, which stands for “Large-Scale Scrum,” is one of the original Agile scaling approaches. It was created by Craig Larman and Bas Vodde in 2005.2

2Many thanks to Bas Vodde for providing feedback on my discussion of LeSS.

Basic LeSS is suitable for 2-8 teams of up to eight people each. All teams work from the same visual plan (which LeSS calls the “product backlog”) and they share ownership of all their code. There’s also LeSS Huge, which scales to even more teams. I’ll discuss it later.

A group of LeSS teams is guided by a product manager (LeSS calls them a “product owner”) who is responsible for deciding product direction. The teams work in fixed-length iterations which are typically two weeks long. At the beginning of every iteration, the teams come together to look at the visual plan and decide which customer-centric stories (“backlog items” or “features”) each team will work on. The teams only work on the highest-priority stories.

Every so often, the teams come together to play the planning game (“refine the backlog”). This typically happens in the middle of each iteration. Teams are welcome to add stories to the visual plan and suggest priorities to the product manager.

Each LeSS team is a feature team, which means they work on complete stories, from beginning to end, regardless of which code that involves. Once a team takes responsibility for a story, they own it. They’re expected to work with customers and other stakeholders to clarify details, and they’re expected to modify and improve whichever parts of the codebase necessary to finish each story. There’s no concept of team-level code ownership in LeSS.

Because multiple LeSS teams could end up touching the same code, they’re expected to coordinate with each other to prevent problems. The coordination is typically ad-hoc and peer-to-peer. Team members know when they need to coordinate because they worked together to choose stories, and part of the discussion involves considering how and when to coordinate.

Collective code ownership is made possible through the use of continuous integration, which involves every programmer merging their latest code to a shared branch at least every few hours. LeSS also includes a variety of other mechanisms for coordinating, mentoring, and learning.

Adopting LeSS

The material in this book is completely compatible with LeSS, except that most things related to team ownership are owned by the LeSS teams together, rather than a specific team. This is particularly true of product management and code ownership. Additionally, some of LeSS’s terms are different than this book’s, but they can be found in the index.

Continuous integration is particularly important for LeSS, and the commit build needs to be fast. You might need to use multi-stage builds (see “Multistage Integration Builds” on page XX) more aggressively than this book recommends. Specifically, you may need to move some or all of your tests to the secondary build, despite the increased risk of breaking the build.
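As a rough illustration of moving tests to the secondary build, the sketch below splits a test suite by a time budget for the commit build. The budget, the test names, the durations, and the greedy fastest-first rule are all assumptions made for this example. Neither this book nor LeSS prescribes a particular selection rule.

```python
def split_builds(test_seconds: dict[str, float], budget_seconds: float):
    """Greedily keep the fastest tests in the commit build until the budget is spent.

    Everything that doesn't fit is deferred to the secondary build.
    (Hypothetical rule for illustration only.)
    """
    commit, secondary, used = [], [], 0.0
    for name, seconds in sorted(test_seconds.items(), key=lambda item: item[1]):
        if used + seconds <= budget_seconds:
            commit.append(name)
            used += seconds
        else:
            secondary.append(name)
    return commit, secondary


# Hypothetical test groups and their run times in seconds.
durations = {
    "unit": 20,
    "narrow_integration": 45,
    "end_to_end": 600,
    "smoke": 10,
}

commit, secondary = split_builds(durations, budget_seconds=120)
print("commit build:", commit)
print("secondary build:", secondary)
```

The trade-off is the one described above: anything deferred to the secondary build reports failures later, which increases the risk of a broken build going unnoticed for a while.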

Allies
Collective Code Ownership
Test-Driven Development
Continuous Integration

If you’re looking for an established, well-tested approach to scaling Agile, start with LeSS. You’ll need to develop fluency in the Focusing and Delivering zones. The Focusing zone is fundamental and the Delivering zone is necessary for teams to share ownership of their code. At a minimum, you’ll need collective code ownership, test-driven development, and continuous integration.

For more about LeSS, see the LeSS website at less.works, or the LeSS book, Large-Scale Scrum: More with LeSS. [Larman and Vodde 2015]

FAST

FAST stands for Fluid Scaling Technology. It’s the brainchild of Ron Quartel, and it’s one of the most promising approaches to scaling I’ve seen. Unfortunately, at the time of this writing, it’s also the least proven. I’m including it because I think it deserves your attention.3

3Many thanks to Ron Quartel for providing feedback on my discussion of FAST.

Ron Quartel created FAST at a health insurance provider in Washington. At its peak, he had 65 people operating as a single team. He started with Extreme Programming (XP) as the base, then layered on Open Space Technology, a technique for helping large groups self-organize around topics.

In comparison to LeSS, FAST is much more, well, fluid. LeSS is based on iterations and long-lived teams that own specific stories. FAST uses a continuous flow of work and forms new teams every few days. There’s no team-level ownership in FAST.

A FAST group is called a “tribe.” Each tribe consists of developers and one or more product managers (which FAST calls “product directors”) who are responsible for setting direction. The whole tribe can consist of up to 150 people, in theory, although that hadn’t been tested at the time of this writing.

Every two days—although this is flexible—the tribe gets together for a “FAST Meeting,” where they decide what to work on. It’s a short, quick meeting. The product managers explain their priorities, and then people volunteer to lead a team to work on something. These leaders are called “team stewards.” Anybody can volunteer to be a steward. It’s a temporary role that only lasts until the next FAST Meeting.

Product managers’ priorities are a guide, not a dictate. Team stewards can choose to work on whatever they like, although they’re expected to act in good faith. That sometimes involves doing something the product managers didn’t explicitly ask for, such as cleaning up crufty code or reducing development friction.

Once the stewards have volunteered, and explained what their team will work on, the rest of the tribe self-selects onto the teams, according to who they want to work with and what they want to work on.

Rather than creating detailed story breakdowns, FAST teams create a “discovery tree” for each valuable increment. (A valuable increment is something that can be released on its own; see “Valuable Increments” on page XX.) A discovery tree is a hierarchical, just-in-time breakdown of the work required to release the increment. It’s represented with sticky notes on a wall, or virtual stickies on a virtual whiteboard.
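To make the shape of a discovery tree concrete, here is a minimal sketch that models it as a tree of sticky notes. The `Node` class, its fields, and the example increment are hypothetical. FAST describes a wall of stickies, not a data model, so treat this as one possible reading rather than a definition.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One sticky note in a discovery tree (hypothetical model)."""
    title: str
    done: bool = False
    children: list["Node"] = field(default_factory=list)

    def add(self, title: str) -> "Node":
        """Break this note down further, just in time, and return the new child."""
        child = Node(title)
        self.children.append(child)
        return child

    def leaves(self) -> list["Node"]:
        """Notes with no further breakdown: the work known so far."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]

    def progress(self) -> tuple[int, int]:
        """Return (finished leaves, known leaves)."""
        leaves = self.leaves()
        return sum(leaf.done for leaf in leaves), len(leaves)


# Hypothetical increment; the breakdown grows as the work is understood.
increment = Node("Customers can pay by invoice")
billing = increment.add("Generate invoice")
billing.add("Invoice template").done = True
billing.add("Tax calculation")
increment.add("Email invoice to customer")

print(increment.progress())  # (1, 3)
```

Because notes are only broken down when someone starts on them, the count of known leaves is expected to grow over time. The tree shows progress and provides continuity; it is not a fixed plan.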

Teams work for two days, or whatever cadence the tribe has chosen. They’re not expected to finish anything specific in that time. Instead, they just make as much progress as they can. The discovery trees are used to provide continuity and help people see progress. Someone may also volunteer to be a “feature steward” for a particular discovery tree, if needed for additional continuity. Other cross-team coordination happens on an ad-hoc, peer-to-peer basis, similar to LeSS.

After the two days are up, the tribe has another FAST Meeting. The teams briefly recap their progress and the cycle repeats. It’s fast, fluid, and low ceremony.

Adopting FAST

FAST isn’t as compatible with this book as LeSS is. Many of the practices in the Focusing zone won’t apply perfectly.

Allies
Team Room
Alignment
Retrospectives
Visual Planning
The Planning Game
Task Planning
Capacity
Slack
Stand-Up Meetings
Forecasting
Team Dynamics

Specifically:

  • Everything that refers to “the team” in this book applies to the overall FAST tribe instead;

  • You will have additional team room needs, although the existing guidance remains relevant, especially for remote teams;

  • Alignment chartering and retrospectives have to be adjusted to work with a larger group of people, and they’re likely to need more experienced facilitation, especially for remote teams;

  • Visual planning applies as-is, but no longer includes anything smaller than a valuable increment;

  • The planning game, task planning, and capacity are no longer needed;

  • Slack needs to be introduced in another way;

  • Stand-up meetings are replaced by the FAST Meeting;

  • Forecasting is entirely different (and much simpler, although its accuracy hasn’t been assessed); and

  • Team dynamics are complicated by the lack of stable teams.4

4XXX Add reference to Dynamic Reteaming?

On the other hand, the Delivering and Optimizing practices apply equally well. As with LeSS, you may need to be more aggressive about the speed of continuous integration.

Although FAST hasn’t been proven to the degree LeSS has, I think it’s very promising. If you have Agile experience and are comfortable trying it with a pilot team of 10-30 people, I recommend giving it a shot.

To try FAST, you’ll need experienced coaches. In theory, FAST only requires Focusing fluency, but Ron Quartel included experienced XP coaches in his FAST pilot, and I suspect their familiarity with Delivering as well as Focusing practices is part of what made FAST work. If you try it, I suggest you do the same.

You can find more about FAST at fastagile.io. Look for the “FAST Guide.” It’s a quick and easy read. I also have an interview with Ron Quartel about FAST at [Shore 2021].

Challenges and Benefits of Vertical Scaling
Ally
Collective Code Ownership

The Achilles’ heel of vertical scaling is also its strength: shared ownership. A vertically-scaled group of teams shares responsibility for the entire codebase. This requires people to have familiarity with a wide variety of code. In practice, at least for LeSS and FAST, people do tend to specialize, choosing to work on things that are familiar, but it’s still a lot to learn.

That’s not the biggest problem, though. The real problem is that it’s easy for collective code ownership to turn into no code ownership. You see, collective code ownership doesn’t just grant the ability to change code; it also grants a responsibility to make the code better when you see an opportunity. It’s easy for large groups to assume somebody else will do it. This can be a problem in small teams, too, but it’s magnified in large groups. They require extra coaching to help people follow through on their responsibility.

On the other hand, vertical scaling solves one of the major problems when scaling Agile: creating cross-functional teams. Agile teams need people with specialized skills, such as UX design, operations, and security. If your teams only have six or seven people each, it’s hard to justify including people with those skills on every team. But then you run into an allocation problem. How do you make sure each team has everyone it needs at the time it needs them?

This isn’t a problem for vertically-scaled groups. If you have thirty people, and only enough work for two UX folks, no problem. You can include just two UX people. In FAST, they’ll allocate themselves to the teams that need their skills. In LeSS, they’ll join a specific team or two, and those teams will volunteer for UX-related work.

Scaling Horizontally

Although vertical scaling is my preferred approach to large-scale Agile, many organizations turn to horizontal scaling instead. In horizontal scaling, the focus is on allowing teams to work in isolation. Rather than sharing ownership of a product or portfolio, as vertical scaling does, horizontal scaling slices up the product or portfolio into individual responsibilities which are owned by specific teams.

The challenge in horizontal scaling is to define team responsibilities in a way that keeps teams as isolated as possible. It’s very difficult to get right, and it has trouble adjusting to changes in product priorities.

In theory, each team should own a customer-centric end-to-end slice of the product. In practice, horizontally-scaled teams are so small, they have trouble owning a whole slice. You end up with two teams needing access to the same code. But in the horizontally-scaled model, teams aren’t supposed to share code with other teams.

As a result, although the ideal is for every team to own a slice of the product, you almost always have to introduce other, less ideal types of teams as well. The book Team Topologies [Skelton and Pais 2019] divides them into four categories:

  • Stream-aligned teams. The ideal. Focused on a particular product, customer-facing slice of a product, or customer group.

  • Complicated-subsystem teams. Focused on building a part of the system that requires particularly specialized knowledge, such as a machine-learning component in a larger cloud offering. These types of teams should be created carefully, and only when the knowledge needed is truly specialized.

  • Enabling teams. Focused on providing specialized expertise to other teams, such as UX, operations, or security. Rather than doing work on behalf of other teams, which would cause them to become a bottleneck, they focus on helping teams learn how to do the work themselves. Sometimes this involves providing resources for simplifying complex problems, such as security checklists or UX design guidelines.

  • Platform teams. Similar to enabling teams, except they provide tooling rather than direct help. Like enabling teams, they don’t solve problems for other teams; instead, their tools allow teams to solve their own problems. For example, a platform team may provide tools for deploying software.

The secret to successful horizontal scaling is how you allocate responsibilities to teams. The fewer cross-team dependencies, the better. It’s fundamentally a question of architecture, because the responsibilities of your teams need to mimic your desired system architecture. (This is also called the Inverse Conway Maneuver.)
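One way to see how a proposed allocation of responsibilities performs is to count the dependencies that cross team boundaries. The sketch below does that for a made-up system. The component names, team names, and dependency list are all hypothetical, and the count is only a rough proxy for coordination cost.

```python
# Hypothetical component ownership and component-level dependencies.
owner = {
    "checkout": "Payments",
    "invoicing": "Payments",
    "catalog": "Storefront",
    "search": "Storefront",
    "deploy-tools": "Platform",
}
depends_on = [
    ("checkout", "invoicing"),     # same team: no cross-team coordination
    ("checkout", "catalog"),       # crosses a team boundary
    ("search", "catalog"),         # same team
    ("checkout", "deploy-tools"),  # crosses a team boundary
    ("search", "deploy-tools"),    # crosses a team boundary
]


def cross_team_dependencies(owner, depends_on):
    """Return the (dependent team, providing team) pairs that cross a boundary."""
    return {
        (owner[src], owner[dst])
        for src, dst in depends_on
        if owner[src] != owner[dst]
    }


for dependent, provider in sorted(cross_team_dependencies(owner, depends_on)):
    print(f"{dependent} depends on {provider}")
```

Trying a different `owner` mapping against the same dependency list gives a quick comparison between candidate team designs: the allocation with fewer cross-team pairs leaves teams more isolated.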

Horizontal scaling works best when you only have a handful of teams. When the number of teams is small, it’s easy to understand how everyone fits together and to coordinate their work. If there’s a problem, representatives from each team can get together and work things out.

The ad-hoc coordination approach breaks down somewhere between 5 and 10 teams. Bottlenecks start to form, with some teams stalled and others having too much work. You have to pay particular attention to your team design to keep teams as independent as possible and to minimize cross-team dependencies. Every team, especially non-stream-aligned teams, has to make its dependents’ autonomy its top priority, and product managers have to coordinate carefully to make sure everyone’s work aligns.

When you get up to 30-100 teams, even that approach starts to break down. Changes are more frequent and team responsibilities have to be adjusted to keep up with changes in business priorities. You need multiple layers of coordination and management. It becomes impossible for people to understand the whole system.

In practice, although horizontal scaling can continue indefinitely, it becomes more and more difficult to manage as the number of teams grows. Vertical scaling is more flexible, but it can’t scale as far. Fortunately, you can combine the two approaches to get the best of both worlds.

Scaling Vertically and Horizontally

I worked with a start-up that had reached 300 team members and stalled out. (The overall organization had over 1,000 people, but about 300 were on product development teams.) Their teams were all working on different aspects of the same product and their cross-team dependencies were killing them.

I approached it from a horizontal scaling perspective. I helped them restructure their team responsibilities to minimize dependencies and maximize isolation. They ended up with about 40 teams—about the same as before—but they were much more independent. That unblocked their development efforts, and they resumed growing. They got up to 80 teams before they hit new roadblocks.

Everybody was very happy with the results. If I could do it over again, though, I would have introduced vertical scaling too. Instead of 40 teams, they could have formed six 50-person groups. Coordinating six vertically-scaled groups is dramatically easier than coordinating 40 small teams, and they wouldn’t have had any problem scaling further. Even once they started running into coordination challenges, the horizontal scaling techniques would have allowed them to grow by an order of magnitude.

Better yet, because vertically-scaled groups are so large, they all could have been stream-aligned. The design we created had a bunch of enabling teams and platform teams, some of whom struggled to understand their role. Stream-aligned teams are much more straightforward. With vertically-scaled groups, that’s all they would have needed, except for their operations platform.

Part of the reason things broke down when they reached 80 teams is that they hadn’t kept their team responsibilities up to date. We had designed in a mechanism for reviewing and updating team responsibilities—it was the job of the architecture team—but, as so often happens, it got forgotten in the rush of meeting other responsibilities. Vertically-scaled groups don’t need the same amount of maintenance. They have the ability to adapt to changing business conditions much more easily.

In other words, you can combine horizontal scaling with vertical scaling by thinking of your vertically-scaled groups as a single “team” from a horizontal scaling perspective. If you do, almost every one can be stream-aligned, with the possible exception of a group for your operations platform.
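To put rough numbers on the difference between coordinating 40 small teams and six vertically-scaled groups, count the distinct pairs that might need to talk to each other: n(n − 1)/2 for n units. This is a worst-case assumption made for illustration, not a measurement from the start-up described above. In practice, most pairs never need to coordinate.

```python
def pairwise_paths(units: int) -> int:
    """Number of distinct pairs among `units` teams or groups: n * (n - 1) / 2."""
    return units * (units - 1) // 2


print(pairwise_paths(40))  # 40 small teams -> 780
print(pairwise_paths(6))   # six vertically-scaled groups -> 15
```

Under that assumption, the number of potential coordination paths drops from 780 to 15, which is one way to see why treating each vertically-scaled group as a single “team” simplifies the horizontal picture.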

My Recommendation

Bottom line: How should you scale your Agile organization?

Begin by emphasizing team fluency. The most common mistake organizations make is to spread Agile widely without building their fundamental capability. In most cases, to scale well, you’ll need your teams to develop both Focusing and Delivering fluency.

Scale vertically before you scale horizontally. In most cases, LeSS is your best choice. If you’re experienced and willing to experiment, try FAST.

If you reach the limits of vertical scaling—probably somewhere around 60-70 people, although FAST may be able to scale further—split into multiple vertically-scaled groups. Each one should be stream-aligned. You shouldn’t need a complicated-subsystem group or enabling group, because your groups will be large enough to include all the expertise you need. In some cases, you might want to extract out a platform group to take care of common infrastructure—typically, an operations and deployment platform.

If you’re using LeSS, LeSS Huge describes this sort of horizontal scaling split, albeit with a slightly different flavor. It retains LeSS’s emphasis on collective code ownership, even across the two groups (which LeSS calls “areas”). However, in practice, the groups tend to specialize.

But remember: successful scaling depends on fluent teams. That’s what the rest of this book is about. We’ll start with Focusing fluency.
