Should We Measure the Performance of an Engineering Team?
The subject may seem tricky as it brings together two major topics:
- measurement instruments: what to measure, how, and why?
- the management method
And the least we can say is that there is plenty to write about on both.
Absurd tracking metrics have long been a running gag in IT. Above all, they have served very diverse objectives, some of them harmful to the quality of projects, to collaboration, or to the management of individuals.
However, metrics remain a valuable source of information on which we can build levers for the continuous improvement of our practices.
In short, here is a small overview of the subject with, running through it, the philosophy we are trying to put in place at Malt around Developer Experience.
What metrics?
To say the least, there is a plethora of approaches to the subject. Lately, I would say the following trends stand out:
- performance measured via OKRs.
If OKRs are very good at measuring the alignment between company strategy and product strategy, they do not, however, measure in any depth the quality or performance of the engineering behind the product.
- performance measured by the Accelerate metrics:
The Accelerate metrics are those taken from the eponymous book:
- deployment frequency
- product delivery lead time (the cycle time from idea to production)
- mean time to repair
- change failure rate
These metrics have the advantage of measuring the overall quality of the software: for example, a high deployment frequency indicates that many good practices have been put in place to achieve that result.
They give us an overall observation. I am quite convinced by these metrics as a general measure of engineering quality, but they do not let me dive into the details to find the improvements to make (a rough sketch of how they could be computed appears after this list).
- code analysis tools
These tools let you dive into analyses of code quality, duplication, design (coupling, complexity), security, etc.
On the other hand, they have the drawback of being sometimes hard to make sense of when you set them up on an existing project, because they generate a lot of noise. You can quickly get lost and no longer know what is and is not important to watch.
- more old school metrics: velocity points, number of lines of code
These metrics are quite questionable: not very representative of quality, not comparable from one team to another and, in my experience, often misused in terms of management philosophy. They can give the illusion of measuring productivity; unfortunately, they do not.
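To make the Accelerate metrics above a little more concrete, here is a minimal sketch of how they could be computed from deployment records. The `Deployment` record and its field names are hypothetical, for illustration only, not the schema of any particular tool we use.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional

# Hypothetical record of one production deployment; field names are illustrative.
@dataclass
class Deployment:
    first_commit_at: datetime               # when work on the change started
    deployed_at: datetime                   # when it reached production
    caused_failure: bool = False            # did it degrade the service?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed

def accelerate_metrics(deployments: List[Deployment], period_days: int) -> dict:
    """Rough computation of the four Accelerate metrics over a period."""
    lead_times = [d.deployed_at - d.first_commit_at for d in deployments]
    failures = [d for d in deployments if d.caused_failure]
    repairs = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deployment_frequency_per_day": len(deployments) / period_days,
        "lead_time_median": median(lead_times) if lead_times else None,
        "mean_time_to_repair": sum(repairs, timedelta()) / len(repairs) if repairs else None,
        "change_failure_rate": len(failures) / len(deployments) if deployments else None,
    }
```

The hard part in practice is not this arithmetic but collecting reliable deployment and incident events in the first place.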
At Malt we have used a lot of things in the past, for example:
- code analysis tools (Sonar),
Unfortunately, I think it is not used to its full potential. The Sonar builds are too long and are therefore run outside the standard build. There is a lot of noise and therefore a lack of overall monitoring.
- uptime tracking tools like StatusCake.
Even if this use was more about monitoring than performance measurement, we use this uptime metric with our customers.
- follow-up on the number of bug and support tickets
This was mainly used at specific times to put facts behind a feeling, but we do not track it on a regular basis.
Good. But between OKRs that align with the business, old school metrics that try to measure productivity, and the Accelerate metrics that look at quality, we can see that the objectives differ. So let's take a look at the value of measuring performance.
Why measure performance?
Just above, I mentioned metrics and their associated goals.
We can measure the overall quality of dev practices (via the metrics listed in Accelerate), the contribution to the overall strategy (via OKRs), and some try to measure productivity with old school metrics.
I have heard some very good software engineers say that deployment frequency is the only indicator we should be tracking.
IMHO, it cannot stand without business metrics: I can very frequently deliver things that are irrelevant to the business. And if the objective is to help the team improve, it is insufficient on its own (beyond telling us that we can do better).
My opinion is that these tools are complementary. There is nothing contradictory about measuring the quality of the code to improve it, aligning with business success, and monitoring the overall quality of the team.
Until now, I have been talking about measuring engineering performance, and therefore collective performance. What if I talked about a slightly trickier subject: the measurement of individual performance?
Individual performance measurement
Yes, what could be more natural than looking for individual measuring tools? If we look at the sales or marketing functions, it is common to have part of the compensation indexed to performance (sales volume, number of calls, traffic generated, etc.).
Naturally, in a company, when a team exceeds a certain size, we will look for a reliable, fair, objective metric, consistent over time and across teams. What could be better than something quantitative for that? The intention is good.
It should be understood that, for many, the performance of a dev is quite mysterious. It is not always the most visible developers who are the most effective, so you have to rely on a manager, with a risk of lack of objectivity or of inconsistency between different managers.
And this is where we find a certain paradox: traditionally, the dev population is quite reluctant to have individual performance measures, even though it is also a population easily motivated by gamification (that is my case, at least).
But misuse in many organizations has undoubtedly made the subject a bit tense.
Once bitten twice shy, as the old saying goes.
And one of these uses that can make people cringe is compensation.
Before going any further, I would like to point out that, in general, I am not in favor of a variable part in engineering compensation. At least not before staff engineering positions and above, but that is not the subject of this post, so I will not elaborate on it.
On the other hand, I have several subjects that I actually want to resolve: the measurement of expertise, progression in the career path, consistency between individuals, and objectivity.
And for that I want to rely on quantitative and qualitative data.
I don't rely on quantitative measurement alone, as individual quantitative performance metrics can have a negative impact. Indeed, setting a measure is the best way to bias a complex system, especially when you put incentives behind it.
When a measure becomes a target, it ceases to be a good measure.
It is sometimes possible to artificially inflate a metric, and that does not necessarily serve the quality of the project.
Encouraging individual performance for an activity that is by nature collaborative also means taking the risk of breaking that very collaboration, for example by reducing pair programming sessions, design sessions, etc.
It is also an ideal way to create tensions, because some metrics depend on context (a project poorly framed upstream with an impact on productivity, time spent on complex bugs because of legacy code, etc.).
Relying solely on quantitative measures is the last resort of a manager who does not know the job and therefore has no other means available.
Consequently, I will use data and feedback. This involves regular discussions (1:1s) and feedback collected from the rest of the team. In an organization at scale, it will be the role of the Engineering Manager to consolidate this type of information.
And from all this information, we can make an "expert" use of the data. That is to say, it is from that feedback that we will decide to validate or invalidate assumptions drawn from the data. The data is only used for ad hoc analyses.
Relying on the engineering manager alone involves risks of subjectivity or lack of consistency in the organization. Relying on data alone carries the risk of not knowing how to interpret it and put it into context. You have to mix the two.
Metrics for continuous improvement
The other way to use these metrics is with a view to continuous improvement of collective performance.
And this is interesting since we can act on more individuals and on larger time scales; in short, more data.
The law of large numbers will make it possible to extract trends.
At Malt we are gradually creating an internal team focused on what we call “Developer Experience”.
Developer Experience at Malt is the team that improves overall productivity by streamlining the development experience for all developers in the tribes.
This team covers a whole range of subjects such as design, the design system, the software factory, etc.
I digress for a moment to explain our organization.
We have 3 tribes. A tribe is a multidisciplinary team of PMs, Data, Devs, and Designers organized around an objective, a product line.
The 3 tribes are:
- Freelance: everything related to the Freelancer experience on Malt
- Company: everything related to the experience of companies on Malt
- Platform: shared services, security, developer experience, cloudops
Platform is therefore a tribe whose clients are the other tribes.
If you are not familiar with the notion of a Platform team, I invite you to read the Thoughtworks Radar, which gives a definition.
Developer Experience, for us, is therefore a team within the Platform tribe; for the moment it is actually a mix between cloudops and shared services. It is this team that is working on our new CI, for example.
Its subjects include, for example, welcoming new joiners (onboarding), documentation (living documentation, ADRs, etc.), the decoupling and application independence of the tribes, the CI, the design system, etc.
An important point: we try to measure the impact. And for this we pay particular attention to data from Jira, Git, or the CI.
Example: we want to improve onboarding so that new joiners can get to grips with the Malt stack more easily. We set a success metric before starting: the time to first commit in production.
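To illustrate how such a metric can be derived from this kind of data, here is a minimal sketch. It assumes we can export, for each commit, the author and the date it reached production, plus each newcomer's start date; the names and data shapes are hypothetical, not our actual tooling.

```python
from datetime import datetime
from typing import Dict, Iterable, Optional, Tuple

# (author_email, deployed_to_production_at) pairs, e.g. extracted from git history
# joined with deployment data. The shapes and names here are illustrative.
Commit = Tuple[str, datetime]

def time_to_first_production_commit(
    commits: Iterable[Commit],
    start_dates: Dict[str, datetime],  # author_email -> first day at the company
) -> Dict[str, Optional[int]]:
    """Days between a newcomer's start date and their first commit in production."""
    first_commit: Dict[str, datetime] = {}
    for author, deployed_at in commits:
        if author not in first_commit or deployed_at < first_commit[author]:
            first_commit[author] = deployed_at
    return {
        author: (first_commit[author] - start).days if author in first_commit else None
        for author, start in start_dates.items()
    }
```

Tracked over successive cohorts of new joiners, this gives a simple before/after signal for any onboarding improvement.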
Without metrics, in a tech team, we can easily find ourselves in situations where, ultimately, we have mostly done hype-driven development.
In a Platform team, and here more particularly in Developer Experience, we try to work like a product team, considering the other teams as the users of our product. We work with discovery (user interviews, prototypes, quantitative and qualitative studies, etc.) and we measure the result.
Some examples:
- We monitor the functional breakdown of our modules by measuring activity per team on each module (a module should in principle have a single color, i.e. be owned by a single team, so there are still modules to work on)
- We worked on a code refactoring aimed at reducing the editing of certain “legacy” modules. We now check that this refactoring has worked by measuring the legacy edit ratio per week (a rough sketch of this kind of measurement follows this list)
- We test the stability of common modules by measuring their activity (number of commits)
- We measure PR review time
- We track the lifespan of branches.
and many others…
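As an example of what these measurements can look like in practice, here is a minimal sketch of the legacy edit ratio per week, computed directly from git history. The `LEGACY_PREFIXES` paths are made-up examples, not our actual module names, and the parsing is deliberately rough.

```python
import subprocess
from collections import defaultdict
from datetime import datetime

# Illustrative path prefixes for "legacy" modules; not Malt's actual layout.
LEGACY_PREFIXES = ("legacy-core/", "monolith/")

def legacy_edit_ratio_per_week(repo_path: str) -> dict:
    """Share of file edits per ISO week that touch 'legacy' modules."""
    # One '@<iso date>' line per commit, followed by the list of touched files.
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--name-only", "--pretty=format:@%aI"],
        capture_output=True, text=True, check=True,
    ).stdout
    edits = defaultdict(lambda: [0, 0])  # week -> [legacy edits, total edits]
    week = None
    for line in log.splitlines():
        if line.startswith("@"):
            year, week_no, _ = datetime.fromisoformat(line[1:]).isocalendar()
            week = f"{year}-W{week_no:02d}"
        elif line and week:
            edits[week][1] += 1
            if line.startswith(LEGACY_PREFIXES):
                edits[week][0] += 1
    return {w: legacy / total for w, (legacy, total) in sorted(edits.items()) if total}
```

The same kind of one-off script works for branch lifespan or PR review time, usually by querying the forge's API rather than git itself.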
Each analysis aims to understand the precise impact of our changes, whether on the stability of the code, its decoupling, its performance, etc.
Among the more fun-fact analyses, here is one:
- my own activity graph since the creation of Malt:
(yes, I code less and less)
And the rest?
To conclude, I would say that using metrics to measure team performance enables a real policy of continuous improvement, because they make it possible to measure impact.
In a smaller team, we can legitimately wonder whether it is that useful: the volume of data is lower, and the size of the team makes it easier to spot blockages. However, it should be noted that there are some pretty good tools to get started (listed below in the annex).
Then, once you exceed a certain size, scaling an engineering team requires special attention to avoid gradually killing your efficiency, and that calls for a strong practice of continuous improvement.
Annex:
Metrics tracking tools:
(I haven’t used them, so I invite you to form your own opinion)
https://www.usehaystack.io/ (Accelerate metrics)