Why is Everything So Slow? Measuring and Optimising How Engineering Teams Deliver
Key Takeaways
- As teams grow, they start to slow down. Using KPIs such as those explained in the book Accelerate or the SPACE model can help managers to measure, explain and optimise the pace of delivery in their teams.
- Different companies, at different stages of growth, have different expectations of delivery pace. Taking time to understand and adapt to those expectations will enable teams to be successful.
- Consider designing an organisation to minimise cognitive overload and enable focus by establishing clear product and technology domain ownership using models such as Team Topologies.
- Strongly aligned teams are able to deliver more effectively. Objectives and key results, together with an aligned, iterative planning cycle can enable that alignment.
- Tools such as health checks and team charters can help to build a culture of inspection and adaptation. Focusing on both delivery and team happiness enables teams to have a long-term, sustainable delivery pace.
As teams grow, they will slow down. A larger and more mature team, working in a larger codebase, has more to care about than one in a small start-up, where moving fast in order to find market fit is key. Greater focus on building and maintaining complicated systems that will have a longer lifespan is needed. Avoiding excessive technical debt and ensuring systems are secure and performant becomes increasingly important. All this takes time and effort and, to those outside of the technology team, this can appear to be simply a slowdown in delivery pace.
While slowing down can seem inevitable, it should not mean that teams stop delivering value that can power future business growth. As an engineering leader there will come a time in your career where your CEO or business leader will ask “Why is everything so slow?” What can you do to answer this question? How can you get ahead in order to be prepared, and how can you be confident that your team is moving at the fastest and most sustainable pace?
What Causes Teams to Slow Down as They Grow?
As teams scale rapidly, for example in a scale-up where investment in the company and its technology team is growing, the delivery pace often starts to slow. I believe that there are many reasons why teams slow down in rapid scale-up environments.
As teams scale, communication becomes more difficult as more and more people are added to the teams. A team of three people will have three primary communication paths, whereas a team of seventeen has one hundred and thirty-six possible paths, for example! So it gets harder and harder to understand what’s going on and it becomes harder to disseminate information without causing cognitive overload.
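The figures above follow from counting every possible pair of people in a team, which for n people is n × (n − 1) / 2. A minimal sketch of that arithmetic is below; the function name is purely illustrative, and the team size of eight is included only for comparison.

```python
def communication_paths(team_size: int) -> int:
    """Number of distinct person-to-person links in a team: n * (n - 1) / 2."""
    if team_size < 0:
        raise ValueError("team_size must be non-negative")
    return team_size * (team_size - 1) // 2


# Compare a few team sizes to see how quickly the number of paths grows.
for size in (3, 8, 17):
    print(f"{size} people -> {communication_paths(size)} possible paths")
```

The growth is quadratic rather than linear, which is why each new joiner adds more communication overhead than the one before.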
Misalignment also becomes a big problem as teams scale, and not having teams and team members aligned on what’s the priority and what’s next can cause wasted work and a perception that the team is slow because the delivery of value to production doesn’t happen. For example, within a microservices architecture, team A may be working on a change to a service, on which team B needs to build further parts of a wider feature. Unless both teams are aware of the need to coordinate that work, there is a high likelihood of misalignment and work being either wasted or needing rework when, upon feature testing, bugs are found in team A’s solution. In the event that team A had delivered their piece of work much earlier than team B, then it’s likely that it would take considerable time and effort to pick that work back up, especially since the team would have moved on to something new. I’ve also seen this be a problem in more traditional, functional, team setups where misalignment between teams working on a backend platform and a frontend team working on a web application can often cause a perception that the frontend team is slow, whereas they are merely waiting for the backend team’s dependent work.
Cognitive overload is also a problem when we think about the scope of what the team needs to care about. Is it the whole system, which typically becomes much larger more quickly in a scale up, or can we limit scope to just a part of that system? Rapidly scaling companies often trade off speed for technical debt, and as systems become harder to work in and developer experience drops, the need to focus on a smaller part of the system becomes even more important. Large, monolithic, codebases without clear internal domain ownership mean that engineers need to have a broad level of understanding of large parts of a system. Balancing this with needing a deeper and narrower understanding of a specific part becomes impossible as a codebase scales and is one reason why breaking monoliths into suites of services or microservices becomes attractive.
Additionally, in start-ups, the responsibility for other important aspects of technology, such as operating the platform on which those applications run, typically falls on a single team. As a team rapidly scales, the overhead of supporting increasingly complicated platforms while also being on the critical path for building new features often causes delivery to slow. Introducing specialised roles such as platform operations, for example, can take a load off the engineers and enable them to focus on software engineering. This really helps speed up an organisation as a whole.
And finally, it’s sometimes just a perception or prioritisation problem. Leaders of rapidly scaling companies are often used to knowing exactly what’s happening, since that was easy to do when those companies were small. Getting stuff done was simple, and as the company scales it’s inevitable that it won’t feel as fast. So learn how to communicate the reasons why technology teams do slow as they grow, and ensure that both directly commercially driven work and strategic technology work is prioritised, so the pace of delivery remains predictable.
What Does Slow Mean Anyway? How to Measure the Delivery Pace
In order to understand whether delivery speed or pace is slowing down or “slow,” we must first be able to measure it.
Pace of delivery means different things to different people in different organisations. What’s important to understand is what’s expected and what you should aim for; find out what the organisational norm is.
I’ve worked everywhere from huge companies like Nokia (where speed was not a priority, but it sure should have been) to start-ups and rapid scale-ups like Bloom & Wild (where finding market fit and then growing rapidly means speed of execution is key). What is expected? What does a good pace look like? Ask your leaders what outcomes they want to see and then you can do the easier bit of measuring your progress towards those outcomes.
In terms of how you actually measure your pace, I have found that there are two good starting points. The first is to focus on the metrics described in the book Accelerate by Nicole Forsgren, Jez Humble, and Gene Kim. In this excellent book, the DevOps Research and Assessment (DORA) team, a research program that was acquired by Google in 2018, describes how to use four metrics in order to continuously improve practices, deliver software faster, and ensure that it remains reliable. These four metrics are:
- Change Lead Time – the total time between when work on a change request is initiated and when that change has been deployed to production
- Deployment Frequency – how often a team is pushing code to production
- Change Failure Rate – what proportion of those pushes cause production issues such as incidents, rollbacks or general failures
- Mean Time to Recovery – the time from when an incident is triggered to when it is resolved
I would recommend this as a starting point for anyone looking to measure their team’s delivery. You can find out more in this review of Accelerate on InfoQ.
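As a rough illustration of how the four metrics could be calculated from deployment records, here is a minimal sketch. The `Deployment` record, its field names and the sample data are all assumptions made for this example; in practice the timestamps would come from your own issue tracker, deployment pipeline and incident tooling, and Accelerate does not prescribe this particular calculation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment (field names are assumptions)."""
    started_at: datetime                    # when work on the change began
    deployed_at: datetime                   # when the change reached production
    failed_at: Optional[datetime] = None    # when an incident was triggered, if any
    restored_at: Optional[datetime] = None  # when that incident was resolved


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def four_key_metrics(deployments: list, period_days: int) -> dict:
    """Summarise the four metrics for the deployments made in one reporting period."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    lead_times = [hours(d.deployed_at - d.started_at) for d in deployments]
    failures = [d for d in deployments if d.failed_at is not None]
    recoveries = [hours(d.restored_at - d.failed_at) for d in failures if d.restored_at]
    return {
        "median_lead_time_hours": median(lead_times),
        "deployments_per_day": len(deployments) / period_days,
        "change_failure_rate": len(failures) / len(deployments),
        # None when there were no resolved incidents in the period.
        "mean_time_to_recovery_hours": sum(recoveries) / len(recoveries) if recoveries else None,
    }


# Made-up sample data covering one week.
base = datetime(2024, 1, 1, 9, 0)
sample = [
    Deployment(base, base + timedelta(hours=24)),
    Deployment(base, base + timedelta(hours=48),
               failed_at=base + timedelta(hours=49),
               restored_at=base + timedelta(hours=51)),
    Deployment(base, base + timedelta(hours=72)),
    Deployment(base, base + timedelta(hours=12)),
]
metrics = four_key_metrics(sample, period_days=7)
print(metrics)
```

The median is used for lead time here so that one unusually long-running change does not dominate the figure; that is a design choice for this sketch rather than part of the metric's definition.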
More recently the ideas in Accelerate have been extended to include additional metrics that contribute to high performing and happy teams, in particular focusing more on the developer experience. The new SPACE framework builds on the DORA work (and shares Nicole Forsgren as a co-author) and recommends measuring metrics in a broader range of areas.
- Satisfaction and well-being (health checks, developer efficacy, etc)
- Performance (outcomes – i.e. reliability, customer satisfaction)
- Activity (deployment frequency for example)
- Communication and collaboration (Network metrics that show who is connected to whom and how, onboarding time, etc)
- Efficiency and flow (for example, the number of handoffs in a process)
I’d recommend starting small here and focusing on how you’ll engage your team on why you’re collecting these metrics and how they can help the team to improve, before spending some time putting in place the mechanics that’ll allow you to take the actual measurements themselves.
Preventing the Delivery Pace from Slowing Too Much
One important point to be aware of is that as teams grow, they will appear to slow down. A larger and more mature team, working in a larger codebase, has more to care about than one in a small start-up, trying to find market fit. They need to place a greater focus on building and maintaining complicated systems that will have a longer lifespan. They also need to avoid excessive technical debt and ensure their systems are secure and performant. This takes time and effort. Often the architecture of the system that the team works on becomes a blocker to further improvements to delivery pace. For example, an overcomplicated monolithic architecture in a rapidly scaling company can quickly slow down the teams who work within it, as changes cannot be made in isolation, or excessive dependencies between parts of the system mean a high level of cross-team coordination is required.
Organisational design plays a big part in preventing delivery pace from slowing too much, and limiting the scope of what folks need to care about is key. If each team member needs to sustain too many communication paths or relationships, then this can cause problems.
For example, at Bloom & Wild, our teams are cross-functional, and when they grow to around eight people we split them into new teams. We believe that eight is a good balance between having a critical mass of people with the skills a team needs to be successful and not having so many that the team feels slow. When a team feels slow, you’ll see it: daily standups get boring, engagement drops and keeping communication together is hard, for example.
We also bound the scope of what the team needs to care about; teams own a product or a slice of the customer journey, both from a product perspective and also within a technology domain. We want them to be long lived and operate as independently as possible; after all, why would someone who works on the product page on the website or mobile app really care about how printing in a fulfilment centre works?
We split our customer journey into five domains, from discovery of our brand through to receipt of a delivery. This includes all the steps from procuring the stems for a bouquet, through to customer account management, and everything in between.
Our engineering team is split into two distinct areas, with squads mapping to these domains, and increasingly also to distinct and separated applications and technology platforms. The e-commerce group looks after our core e-commerce domain and platform, from marketing technology through to purchase and customer management, across both web and mobile applications. Our Operations Technology group owns our procurement, product management and fulfilment platforms used by our internal customers and third party partners.
The Engineering team is supported by Platform Operations, who own our underlying infrastructure; IT, who own our IT estate; and our Data Science team, who are closely partnered with Engineering on machine-learning-based solutions and on powering the business intelligence system used across the company.
Our main inspiration for this setup was the excellent book Team Topologies, and the model above maps closely to the concepts of stream-aligned and platform teams. In Team Topologies, a stream-aligned team is a team aligned to the main flow of business change, with a mix of cross-functional skills and the ability to deliver significant increments without waiting on another team. A platform team works on the underlying platform and supports the stream-aligned teams with delivery. The overriding intention is that the platform simplifies what would otherwise be complex technology and reduces cognitive load for the teams that use it. I’d recommend the review of the Team Topologies book on InfoQ as a starting point in order to learn more.
We’ve also focused on alignment across our Product and Technology team. We’ve found that objectives and key results (OKRs) and key performance indicators (KPIs) have worked well for us: OKRs align teams on what’s important, and KPIs let us track progress and course correct through the cycle. We theme our KPIs around the phrase “Technology and teams for fast and safe change at scale” and we’ve found branding them has really helped them land within and outside the Technology team.
We’ve moved to a quarterly planning cycle across the whole company, which has worked really well to ensure that all teams are aligned on what’s important and to ensure that there is a clear cadence to delivery across the organisation. Within this framework we have a clear mechanism for prioritising work which is understood and aligned with stakeholders, so if we cannot prioritise work for a stakeholder then it’s clear why that is, and also it’s clear when the next prioritisation window will open.
Alignment Between Teams and Their Stakeholders
Focusing on a regular, quarterly planning cycle to set clear expectations is the most important alignment point. Our Product team does a great job aligning our stakeholders, and by using a top-down approach through OKRs we can ensure that expectations from stakeholders are aligned with the company strategy and our capability to deliver.
Where I’ve seen more issues with alignment is when technical debt or strategic technical work needs to be prioritised and it’s not clear what the value of that work is. We work with our teams to ensure that they are able to effectively explain the value of technical work in a way that both stakeholders and those in our Product team can understand. I like to explain that people should be able to “talk like C suite” when lobbying for work to be done; it’s important to quantify the value of all work, whether it’s directly commercially driven or contributes to the technology strategy. This means understanding what’s important to a company and how a team contributes. It means understanding what people really care about, which frequently is not the state of the codebase or whether the latest front end framework is being used. It’s about translating a message into enablement of business outcomes and return on investment, which is what stakeholders do care about.
And like a lot of things, it’s about telling a story. We find using KPIs as a framing for the current state helps. One of our KPIs, for example, is cycle time: the time from when work is started to when value is realised in production. Framing a conversation on how a piece of work will improve ongoing cycle time or improve team efficiency can really engage stakeholders when it becomes clear that the improvements will apply to everything the team delivers going forwards. Making the stakeholder the hero of the story, the person who realises the most value, also really helps!
Strengthen Communication within Teams
The first key point for me is to ensure that expectations are clear. I find that co-creating a team charter really helps teams to feel engaged in setting what’s normal for the team and establishing the “rules of the game” if you’re a member of that team.
A team charter should be produced by the team, owned by the team, and be visible to not only the team, but also all those who work with them. It defines who they are and how they like to work. It’s also something that should be revisited whenever people join or leave a team and the team dynamic changes. I’ve written more about ways in which you can define and use team charters, which is worth reading if you’d like to know more.
As part of our quarterly planning cycle, we also run team health checks within each team, because while pace of delivery is important, that pace needs to be sustainable and we don’t forget that teams are people too. We use a version of the Squad Health Check model made famous by Henrik Kniberg while at Spotify, where we ask the team a number of questions about subjects such as team, organisational and codebase health. Our questions are similar to those mentioned in Henrik’s article, with some small adaptations for Bloom & Wild context, so I’d recommend using the Spotify questions as a starting point if you’d like to explore health checks, before making any changes. What’s really important is to use the same questions in each team as this allows the engineering manager who runs the team to work with the team on specific action points while also allowing the Technology leadership team to look for trends across teams, where actions could have the greatest impact. We find that health checks are a great tool for raising self awareness in teams and for focusing coaching effort. So much so in fact, that we use them in our teams, our communities and our senior Technology Leadership Team at the end of each quarterly planning and delivery cycle.
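To illustrate why asking the same questions in every team matters, here is a minimal sketch of how results could be compared across teams. It assumes votes are recorded as the green, yellow and red traffic lights used in the Squad Health Check model; the team names, the scoring scale and the threshold are hypothetical choices for this example, not a description of how Bloom & Wild processes its results.

```python
from collections import Counter

# Assumed scoring scale: 0 means everyone voted red, 2 means everyone voted green.
SCORES = {"green": 2, "yellow": 1, "red": 0}


def question_averages(votes_by_team: dict) -> dict:
    """Average score per question for each team."""
    return {
        team: {
            question: sum(SCORES[v] for v in votes) / len(votes)
            for question, votes in questions.items()
        }
        for team, questions in votes_by_team.items()
    }


def cross_team_concerns(votes_by_team: dict, threshold: float = 1.0) -> list:
    """Questions scoring below the threshold in more than half of the teams."""
    averages = question_averages(votes_by_team)
    low_counts = Counter(
        question
        for team_averages in averages.values()
        for question, average in team_averages.items()
        if average < threshold
    )
    return sorted(q for q, count in low_counts.items() if count > len(averages) / 2)


# Made-up votes from three hypothetical teams answering the same two questions.
votes = {
    "checkout": {
        "codebase health": ["red", "yellow", "red"],
        "teamwork": ["green", "green", "yellow"],
    },
    "fulfilment": {
        "codebase health": ["red", "red", "yellow"],
        "teamwork": ["yellow", "green", "green"],
    },
    "discovery": {
        "codebase health": ["green", "yellow", "green"],
        "teamwork": ["green", "green", "green"],
    },
}
concerns = cross_team_concerns(votes)
print(concerns)
```

Per-team averages support the conversation an engineering manager has with a single team, while the cross-team view points to where a leadership-level action might help several teams at once. The numbers are only a prompt for that conversation, not a substitute for it.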
It’s also important to strengthen communication between teams since the organisation as a whole is only as fast as the slowest team. It’s critical to put forums in place that allow teams to communicate with each other. I think of it like glue; you need to put that glue in place in order to stick things together! We glue our teams together with a set of principles and practices for both our squads and also our communities of practice.
Communities of practice are forums of like-minded folk (for example all our Ruby engineers) who meet regularly and drive the short and long term technical strategies for their codebases. They are also places to get support, guidance and advice. They’re run by our tech leads, and communities are glued together by our Tech Leads forum where all the tech leads support each other. We find this setup allows teams the independence that they need in order to deliver at a sustainable pace while ensuring that the engineers within those teams get the support that they need.
What are the Benefits of Focusing on a Sustainable Pace of Delivery?
We’ve found that Bloom & Wild has been able to become more focused, and through the alignment activities within Technology our teams are able to be more focused too. Our quarterly cycle is a key enabler here and our KPIs help us to stay on track or course correct through the cycle.
By limiting the scope of what our team members need to care about, through designing our organisation to bound the team’s product and technology context, we’ve been able to limit the cognitive load on our team members. This means that they can focus on what’s important, designing and delivering great experiences for our customers, both inside and outside the company.
My Team is Slowing Down. What Should I Do?
Hopefully your OKRs and KPIs will show you this, but often it surfaces first from stakeholders. If a stakeholder is saying “Why is Technology slower than before?” then firstly ask yourself whether they are right or not. It may be that it does take longer to deliver work and you need to tell a story about why that is. In a company that has rapidly scaled, it’s likely that your stakeholders are experiencing the same thing in their own teams, so look for ways in which you can draw parallels with their experiences. Focus on understanding and aligning on the return on investment for work; stakeholders frequently do not understand everything that’s required in order to design and develop software, and sharing more here can really help. Sometimes explaining how much it will cost to deliver a feature and the reasons why can mean that the feature is subsequently not prioritised at all!
Don’t assume it’s a stakeholder problem. Look to see if there is somewhere that can be optimised in your team. Having KPIs and historical data really helps here; it enables you to proactively identify where there are issues and change course. It helps you to form a narrative and tell a story to your team about a need for change and to tell a story to your stakeholders on why that change is valuable.
Overall it’s all about ensuring that you can understand the sustainable pace of delivery within your team and use this to prioritise the more impactful work with them. If you can do that then you’ll hopefully find that stakeholders aren’t asking the question “Why is everything so slow?”