I recently spent time with an Engineering Manager at an organisation that seems to want to measure all the wrong things. They find themselves in constant battles to commit to dates when (usually poorly-defined) work will be complete.
I’ve said it before, and I’ll say it again. If the majority of interactions between Product and Engineering are focused on “when” rather than “why”, “what” or “how”, it’s a clear sign of a low-trust environment. Google's Project Aristotle demonstrated that psychological safety is the number one characteristic of a high-performing team. Team members must feel safe to take risks and be vulnerable. You need to solve for trust before you can hope for improvements.
There’s nothing wrong with looking to manage customers’ expectations around when work will be delivered. However, prizing predictability over productivity dooms an effort, and an organisation, to failure. This is particularly true when success is ill-defined or requirements keep changing. Perhaps the most egregious recent public failure of date-driven development was the initial launch of healthcare.gov alongside the Affordable Care Act in the US.
The most important measure that can be applied to any software development effort is whether it is achieving its intended outcome.
When Elon Musk took over Twitter in 2022, he seemed to be fixated on developer activity. Since the takeover, Twitter has become a byword for how not to manage strong engineering teams.
Measuring outcomes takes time, focus and patience. These lagging metrics are vital for the team to learn how to solve for customer needs more effectively. You can also use leading metrics to identify and measure high-performing teams. Leading metrics help you see whether teams are working in the right way; lagging metrics assure you they’re solving the right problems.
People have been trying to find a successful way of measuring developer productivity for as long as people have been developing software. The first set of metrics that showed strong correlation between team performance and team behaviours was the DORA metrics, published in 2014 by the team at DevOps Research and Assessment, made up of Dr. Nicole Forsgren, Jez Humble, and Gene Kim. The group publishes the State of DevOps report each year.
The DORA metrics comprise four core measures of developer productivity, all quantitative:

- Deployment Frequency: how often the team releases changes to production
- Lead Time for Changes: how long it takes a change to reach production
- Change Failure Rate: the proportion of deployments that cause a failure in production
- Mean Time to Restore (MTTR): how long it takes to recover from a production failure
The first two of these measures help give insights into the team’s efficiency. The latter two offer insights into the stability of its software.
In 2021, the DORA team extended DORA to look at Reliability as a standalone measure. Before this, reliability was assessed using MTTR and Change Failure Rate. Reliability is now defined separately as a measure for how well the software meets user expectations, using availability and performance as proxy measures.
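To make the two stability measures concrete, here is a minimal sketch that computes Change Failure Rate and MTTR from a deployment log and incident records. The data, record shapes and function names are all hypothetical, not part of any DORA tooling.

```python
from datetime import datetime, timedelta

# Hypothetical data: one flag per production deployment (True = the
# release caused a failure), plus start/end timestamps for each incident.
deployment_failed = [False, False, True, False, True, False, False, False, False, False]
incidents = [
    (datetime(2024, 3, 2, 10, 0), datetime(2024, 3, 2, 10, 45)),
    (datetime(2024, 3, 6, 14, 0), datetime(2024, 3, 6, 14, 15)),
]

def change_failure_rate(failed_flags):
    """Fraction of deployments that caused a failure in production."""
    return sum(failed_flags) / len(failed_flags)

def mean_time_to_restore(incident_windows):
    """Average elapsed time from failure start to service restoration."""
    total = sum((end - start for start, end in incident_windows), timedelta())
    return total / len(incident_windows)

print(change_failure_rate(deployment_failed))  # 0.2
print(mean_time_to_restore(incidents))         # 0:30:00
```

In practice these figures come from deployment pipelines and incident-management systems rather than hand-built lists, but the arithmetic is this simple; the hard part is agreeing on what counts as a failure.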
To achieve a short delivery lead time and a high level of deployment frequency, high-performing teams divide their work into thin slices, delivering small incremental code changes frequently. By working this way, they manage deployment risk effectively. They have few, if any, outages, and can recover quickly in the case of a production issue.
Nothing will make your organisation faster than reducing batch size.
Delivery lead time can be defined in different ways. Some organisations use it to measure the time elapsed between a customer request and a change in production. Others measure from when a commit hits a production system to when that change gets deployed to a customer. Still others measure from when a commit is made in a lower, development environment to when it is released to production.
There are pros and cons to each of these approaches. I would say that the most appropriate one varies by organisation and by scenario rather than there being a hard and fast rule.
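As an illustration of the third definition above (commit in a lower environment to production release), here is a sketch that computes delivery lead time and deployment frequency from timestamped change records. The data and names are made up for the example.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical records pairing each change's commit timestamp (in a lower
# environment) with its production release timestamp.
changes = [
    {"committed_at": datetime(2024, 3, 1, 9, 0),  "released_at": datetime(2024, 3, 1, 15, 30)},
    {"committed_at": datetime(2024, 3, 2, 11, 0), "released_at": datetime(2024, 3, 3, 10, 0)},
    {"committed_at": datetime(2024, 3, 4, 14, 0), "released_at": datetime(2024, 3, 4, 16, 45)},
]

def median_lead_time(records):
    """Median elapsed time from code commit to production release."""
    return median(r["released_at"] - r["committed_at"] for r in records)

def deployment_frequency(records, window_days):
    """Average production releases per day over the observation window."""
    return len(records) / window_days

print(median_lead_time(changes))         # 6:30:00
print(deployment_frequency(changes, 7))  # releases per day
```

The median is used rather than the mean because lead-time distributions are typically skewed by a long tail of slow changes; a single stuck change shouldn’t dominate the measure.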
In an environment with a high level of automation test coverage and full CI/CD, measuring the time from code commit in a lower environment to production release may be beneficial. In less mature environments, it may be valuable to step back and look at the entire value stream end-to-end, thus measuring the time from idea to adoption. Value stream mapping is a fantastic exercise for finding and fixing wrinkles in the ideation, development and deployment processes of an organisation. It is strongly recommended if you lack insights into parts of the development process, and want to surface issues that prevent rapid deployment of software.
Dr. Forsgren was also involved in the creation of the SPACE framework in 2021. SPACE expands the lens through which organisations can measure developer productivity, using more qualitative metrics to assess the health of their software development practices. These qualitative metrics augment the quantitative focus of the DORA metrics. They include developer perception of teamwork, productivity and job satisfaction.
SPACE measures five dimensions of developer productivity:

- Satisfaction and well-being
- Performance
- Activity
- Communication and collaboration
- Efficiency and flow
In May 2023, the team behind SPACE returned with an updated view of what drives developer productivity. Their latest research suggests using a developer-centric approach focused on developer experience (DevEx). They recommend homing in on the lived experience of developers and removing the frictions or barriers to flow experienced by engineers.
As with SPACE, the framework looks beyond quantitative measures and tools to inspect issues that impact developers, such as psychological safety and having clear goals. Companies with high-quality work environments are more productive than those with poor DevEx. They recommend using survey data to measure developer perception and experience in three key areas:

- Feedback loops: the speed and quality of responses to a developer’s actions
- Cognitive load: the amount of mental effort a task demands
- Flow state: the ability to work with focus, energy and minimal interruption
Developer productivity is key to successful software development. The journey from the pure quantitative metrics of DORA to the developer-centric measures of DevEx has happened quickly. The range of measures and the introduction of experiential measures underline that developer productivity is an outcome of organisational culture. It is not something that can be enforced by trying to manage outputs to (often spurious) deadlines.
Aligning the objective criteria of the DORA metrics with qualitative data is most likely to produce positive results. Survey data are still among the best tools available for gathering insight into the employee experience. The DORA metrics answer most organisations’ core need to have some benchmarkable data around developer productivity. By also addressing the subjective elements of the employee/developer experience, organisations will give themselves the best chance to deliver a truly astounding customer experience.