practices of continuous integration, continuous delivery, DevOps, and
agile software methodologies. This article defines what
metrics-driven development is, why it is useful, and how to use it to
drive software changes. Let’s start with a definition of metrics-driven
development.
- Metrics-Driven Development (MDD)
- The use of real-time metrics to drive rapid, precise,
and granular software iterations.
This definition is simple and straightforward, but does leave room for
interpretation. Let’s dive deeper and break the definition down,
bit by bit.
- Real-time
- To be effective, metrics must be viewable by developers and
operations staff in close to real time. Why? Real-time metrics provide an
immediate view of the effect of software changes on production systems,
and understanding the effects of software changes in production is one
of the key benefits of employing metrics-driven development.
- Rapid
- Changes to production software can be made rapidly to effect changes in
one or more metrics. Combining rapid deployment with real-time metrics
provides a powerful force for iterating production software towards
performance and stability goals.
- Precise
- Changes to production software can precisely change a given metric in
a target direction. By being able to make precise changes to a metric,
the development team can focus on targeting a particular metric of
interest with each software change.
- Granular
- Changes to production software can target metrics at a granular level.
Individual development teams should be able to deploy changes to production
software that target individual metrics.
This definition and its individual components emphasize the need for
combining real-time metric collection and reporting with the ability to
make small, rapid software changes. These capabilities provide two
benefits. First, they allow you to make software development decisions
based on real-world production data. Second, they provide a means of
effecting measurably beneficial changes to the software with each
deployment. Together, metrics-driven development helps developers and
businesses make better decisions by including metrics as an integral part
of the development process.
Prerequisites
MDD is a fundamentally iterative process. Although the principles and
practices outlined in this article can be applied directly, they are
especially powerful when used with the enabling technologies described in
this section.
Taken as a whole, these prerequisites allow developers to quickly and
safely deploy changes to production and control the set of users exposed
to software changes. In this environment, MDD allows you to use metric
data to drive each individual software iteration.
Metrics architecture
Foremost, you need an architecture for collecting metric data from running
applications and transmitting it to a data collection point. You also
need a user interface for querying and visualizing that data.
In practice today, this typically means deploying a data collection
library like Coda Hale’s Metrics with your application, and using
an aggregation system like Fluentd to push data to collection
points. At collection points, data is ingested into a time-series
database like Graphite or InfluxDB. A user
interface like Grafana is used to visualize metrics and provide
dashboards.
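As a minimal sketch of the application side of such a pipeline, the
following example registers a timer with the Dropwizard (Coda Hale)
Metrics library and pushes all metrics to a Graphite endpoint. The host
name, reporting interval, and metric names are illustrative placeholders,
not recommendations.

```java
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;
import com.codahale.metrics.graphite.Graphite;
import com.codahale.metrics.graphite.GraphiteReporter;

import java.net.InetSocketAddress;
import java.util.concurrent.TimeUnit;

public class MetricsBootstrap {
    public static void main(String[] args) throws Exception {
        MetricRegistry registry = new MetricRegistry();

        // Push all registered metrics to a Graphite/Carbon endpoint every
        // 10 seconds. "graphite.example.com" stands in for your collection point.
        Graphite graphite = new Graphite(new InetSocketAddress("graphite.example.com", 2003));
        GraphiteReporter reporter = GraphiteReporter.forRegistry(registry)
                .prefixedWith("myapp.production")
                .convertRatesTo(TimeUnit.SECONDS)
                .convertDurationsTo(TimeUnit.MILLISECONDS)
                .build(graphite);
        reporter.start(10, TimeUnit.SECONDS);

        // Instrument a hypothetical request handler with a timer; its rate and
        // latency percentiles show up in Grafana once data is flowing.
        Timer requests = registry.timer("autocomplete.requests");
        try (Timer.Context ignored = requests.time()) {
            Thread.sleep(25); // stand-in for real request-handling work
        }
    }
}
```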
Ultimately, your team’s or organization’s requirements will dictate the
specific technologies used. Providing specific guidance is outside the
scope of this document.
Continuous integration
Continuous integration (CI) is the practice of frequently
integrating changes from multiple members of each team. Each integration
is verified automatically and errors are detected as quickly as possible.
CI makes it possible to easily deploy cohesive working software.
Continuous delivery
Continuous delivery (CD) is the practice of building software
that can be deployed at any time. The priority is keeping software
working and deployable at all times. This allows teams to ship code to
production at any moment, adding and removing metrics as necessary.
Feature flagging
Feature flagging is a powerful technique allowing teams to
modify system behaviour at runtime without changing code. A toggle can
be turned “on” or “off” to expose a subset of users to new functionality.
These users act as a test bed for the new code, and by observing the
metrics they generate, the development team can make better decisions
about the code being released.
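As a rough sketch of how a flag might gate a new code path while emitting
separate metrics for each variant: the flag store here is a hard-coded set
of user ids purely for illustration, and the service, metric names, and
"new ranker" feature are hypothetical. Real deployments would typically use
a dedicated flag service or dynamic configuration instead.

```java
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;

import java.util.List;
import java.util.Set;

public class AutocompleteService {
    // Hypothetical flag store; in practice this would be a flag service or config system.
    private final Set<String> usersWithNewRanker;
    private final MetricRegistry registry;

    public AutocompleteService(Set<String> usersWithNewRanker, MetricRegistry registry) {
        this.usersWithNewRanker = usersWithNewRanker;
        this.registry = registry;
    }

    public List<String> suggest(String prefix, String userId) {
        if (usersWithNewRanker.contains(userId)) {
            // New code path, measured separately so its behaviour can be
            // compared against the old path before a wider rollout.
            try (Timer.Context ignored = registry.timer("autocomplete.suggest.v2").time()) {
                return suggestWithNewRanker(prefix);
            }
        }
        try (Timer.Context ignored = registry.timer("autocomplete.suggest.v1").time()) {
            return suggestWithOldRanker(prefix);
        }
    }

    private List<String> suggestWithNewRanker(String prefix) { return List.of(prefix + "-new"); }
    private List<String> suggestWithOldRanker(String prefix) { return List.of(prefix + "-old"); }
}
```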
Principles
A principle is a fundamental truth that serves as the foundation for
a system of belief. What follows are the fundamental truths according to
metrics-driven development. These truths guide the metrics-driven
development process and help to frame the discussion of metrics as they
apply to software development.
Production is unique
The first principle guiding metrics-driven development is that your
production environment is unique. This is necessarily true; you cannot
exactly replicate your production environment for local development,
testing, or staging. You must accept that production is different.
Why is production different? Foremost, the data. The amount and variety of
data in production typically dwarfs that of any testing environment. Also,
as is typical of production workloads, some data may have been changed
(either accidentally or intentionally during crisis management) without
that change being replicated in any testing environment; your development
process needs to account for this possibility. Second, the scale. Testing
a software change typically means deploying a single instance of your
software to a single virtual machine or container. On production, that
same change is deployed to multiple virtual machines or containers and
interacts with clusters of other services. The book Release It! describes
this problem as Unbalanced Capacities, and these imbalances in production
typically cannot be replicated locally.
More generally, there will always be edge cases in production data,
hardware, or environment that cannot feasibly be replicated during
testing. Production is unique.
Tests are not enough
Testing is not enough to uncover potential production bugs. You need to do
more than ensure that software changes pass tests; you need to verify that
software changes correctly affect production behaviour. By using metrics
and monitoring, your team can accurately verify that a software change is
working as expected.
Note that this does not mean tests are not valuable — they are absolutely
essential for preventing regressions and validating your assumptions. Just
be aware that unit tests can only capture the scenarios that you are
already aware of or that surface in QA. Since production is unique, you will
not be able to imagine every possible scenario that should go into your
unit tests.
Your mental model is not complete
In production software systems, there is a gap between perception and
reality. Our perception is the code that we write and how we expect it to
behave; our reality is what happens when that code is actually run on
production. For example, we may have a perception about why a certain
operation is a bottleneck in the credit card processing workflow, but
reality requires profiling and measuring the current workflow to determine
the exact location of the bottleneck.
Coda Hale calls this the “gap” between
perception and reality, cautioning us to “mind the gap”.
Code has no value
Your job is not to write code; your job is to create value. Think about
it. No sane employer will pay a software engineer to write code, print it
out, and frame it to hang on a wall. That same code only has value when it
is running on production and being used by real users.
So what provides business value? Adding a new feature, improving an existing
feature, fixing bugs, improving performance, or reducing cost, to name
a few. All of these things only provide value when the code that
implements them is run, not when they are written. It follows that to
provide the most value to the business, an engineer needs to know as much
as possible about how the code behaves while it is running. Metrics are
typically the only way this is possible.
If you can’t measure it, you can’t manage it (or improve it)
Originally attributed to Dr. W. Edwards Deming in the context of managing
people and business processes, the quote “If you can’t measure it, you
can’t manage it” applies equally well to managing software systems. If
users start to
complain about your site being “slow”, as an engineer you will need to
have some sort of understanding of what “slow” actually means. This
implies measuring it, so that you can improve it. If you have a metric
tracking the latency of user requests, you can make targeted improvements
to this metric through iterative software changes.
You can’t measure everything
This article is about metrics and metrics-driven development. So
naturally, I am bullish about adding metrics to the software development
process. However, be mindful that quantity of metrics does not equal
quality — you will need to strike the right balance of metrics in your
system.
Unneeded metrics place additional resource constraints on the metrics
pipeline itself, and can make relevant metrics more difficult to locate
and interact with. This typically means purging and deleting metrics that
are no longer valuable to you. Treat metrics curation as a requirement of
metrics-driven development.
Practices
Practices are the application of principles stated in a context-dependent
way. In our case, we apply the principles of metrics-driven development to
the task of software development. To that end, we treat measurement and
instrumentation as a software development practice integrated within the
regular software development life cycle and apply the metrics-driven
development principles to that context.
Instrumentation as code
Developers typically have the best mental model of how an application is
meant to behave in production. It therefore makes sense to make
instrumentation an integral part of the software development process.
Given that developers can create targeted instrumentation in the
application code itself, instrumentation becomes a required deliverable
for every new feature or fix. When writing new code, the developer is able
to form a hypothesis about its behavior in production; the measurements
placed in the code are a means for the developer to prove or disprove
their hypothesis.
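For example, a developer shipping a new cache might state the hypothesis
“most lookups will be served from the cache” and instrument the code to
prove or disprove it. The sketch below uses Dropwizard Metrics counters;
the cache, metric names, and hit-rate expectation are illustrative
assumptions, not part of the original article.

```java
import com.codahale.metrics.Counter;
import com.codahale.metrics.MetricRegistry;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ProfileCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Counter hits;
    private final Counter misses;

    public ProfileCache(MetricRegistry registry) {
        // Hypothesis: in production, hits should far outnumber misses.
        // These two counters let a dashboard confirm or refute that.
        this.hits = registry.counter("profile.cache.hits");
        this.misses = registry.counter("profile.cache.misses");
    }

    public String get(String userId) {
        String value = cache.get(userId);
        if (value != null) {
            hits.inc();
            return value;
        }
        misses.inc();
        String loaded = loadFromDatabase(userId);
        cache.put(userId, loaded);
        return loaded;
    }

    private String loadFromDatabase(String userId) {
        return "profile-for-" + userId; // stand-in for a real database call
    }
}
```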
Single source of truth
Metrics collected during operations should be stored in a common
repository, in a common format, and with a common interface for
visualization, alerting, and analysis. This allows developers or
operations staff to easily correlate metrics between systems and across
all layers of the application stack.
The metrics platform must be timely, comprehensive, and intuitive so that
everyone instinctively relies on it as their preferred resource to reason
about the production environment.
Alert on observations
An effective metrics-driven development process allows for alerts to
trigger based on metric values. This allows developers and operations
staff to effectively target affected systems by homing in on metrics
showing signs of problems. Once isolated, the same set of metrics can
confirm that any response has successfully resolved the issue.
It’s critical that alerts are triggered from the same dataset used for
visualization, since disparate systems introduce the potential for confusion
and error. Any lack of certainty during incident response adds additional
stress and increases the likelihood of human error.
Use the scientific method
By deploying a change and measuring its effects, developers and operations
staff gain confidence that the change is reliable, performant, and affects
the metric of interest, confirming or refuting their hypothesis.
Now, how do we follow the principles and practices outlined here? By using
the metrics-driven development process, which is built around the OODA loop
devised by John Boyd.
The phrase OODA loop refers to the decision cycle of observe, orient,
decide, and act, developed by military strategist and United States Air
Force Colonel John Boyd. Boyd applied the concept to the combat operations
process, often at the strategic level in military operations. It is now
also often applied to understand commercial operations and learning
processes. The approach favors agility over raw power in dealing with
human opponents in any endeavor.
The following example of the OODA loop is adapted from Coda Hale’s
Metrics, Metrics Everywhere talk.
Observe
All decisions are based on observations of an evolving situation.
You have a question:
What is the 99th percentile latency of our autocomplete service right now?
You look at current measurements:
~500ms
Orient
During the orientation phase, we examine how an observation relates to our
previous experiences.
You have a question:
How does this compare to other parts of our system, both currently and historically?
You look at historical metrics:
It’s way slower.
Decide
Given the observation and our experience, we can decide on the next action
to take.
You have a question:
Should we make the autocomplete service faster? Or should we add a new
feature?
You now have the knowledge to make an informed decision:
Let’s make it faster.
Act
You’ve made a decision, now act. Write some code, deploy it, and measure
the results.
Repeat the loop.
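As a small illustration of the Observe step, the 99th percentile latency
would normally be read off a dashboard, but it can also be pulled straight
from a Dropwizard timer’s snapshot. The registry and metric name below are
carried over from the earlier sketches, and the recorded durations are
fabricated sample values for demonstration only.

```java
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;

import java.util.concurrent.TimeUnit;

public class ObserveLatency {
    public static void main(String[] args) {
        MetricRegistry registry = new MetricRegistry();
        Timer requests = registry.timer("autocomplete.requests");

        // Record some sample durations; normally this happens in the request path.
        for (int i = 0; i < 100; i++) {
            requests.update(400 + i, TimeUnit.MILLISECONDS);
        }

        // Timer snapshots report durations in nanoseconds; convert for display.
        double p99Millis = requests.getSnapshot().get99thPercentile() / 1_000_000.0;
        System.out.printf("autocomplete p99 latency: %.0f ms%n", p99Millis);
    }
}
```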
By using the metrics-driven development process, you improve your mental
model of the code so that you can make better decisions. Adopting MDD
allows you to monitor metrics for current problems, aggregate them for
historical perspective, and ultimately use your improved mental model to
generate more business value.