Stop measuring the wrong things
How did encouraging engineers to compete on writing more code lead to a product that produced no value for the company? I remember a director trying to set up a scoreboard based on the number of lines committed, to reward those at the top. The product never even made it to production.
It can be genuinely hard to measure how well any particular engineer is doing, so managers often reach for easy-to-access metrics: number of commits, lines of code, pull requests (PRs) merged, tickets completed. The problem is that these indicators actively reward the wrong behaviours. In a field where communication, collaboration, and long-term thinking produce the most value, proxy metrics distort the view of what your best engineers are contributing.
In this post, we'll look at why these metrics are so pervasive, why they're harmful, and what leaders should measure instead.
The metrics that feel like answers
I’ve already mentioned the usual suspects, but here are a few in more detail:
- Lines of Code added. How many lines of code has the engineer committed to the codebase? Does more code mean more value?
- Number of commits produced. Instead of counting individual lines, count the number of commits made to the repo. If one developer commits more than another, are they adding more value?
- PRs merged. One step back from the commit: not a line or a single change set, but a full pull request. Does raising more PRs mean doing more work?
- Tickets completed. Entirely separate from the code. How many tickets has the developer picked up and moved through their process (hopefully deployed to production, but that’s another post for another time!)?
- Story points closed. If your team measures velocity in points, you could compare points per person. Sam does 10 points per sprint and Dave does 13, so Dave is more productive than Sam.
All of these feel appealing. They're visible, quantitative, and easy to build an objective around: for example, "improve your average story points delivered per sprint from 15 to 20 next quarter."
The problem is that these metrics are easy to game.
When Elon Musk acquired Twitter in 2022, he reportedly used lines of code as a key metric to decide which engineers to let go, with nearly half the company eventually cut. The engineers who had written the least code, regardless of what that code actually did, were first in the firing line.
Smart engineers who want to keep their jobs or rise through the ranks will game these metrics, not out of dishonesty, but because they've been told that's what success looks like.
Goodhart's law is already running the team
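Goodhart's law is named after the economist Charles Goodhart, who first articulated the idea in 1975. It is best known in the anthropologist Marilyn Strathern's later phrasing: "When a measure becomes a target, it ceases to be a good measure." Once a metric becomes the goal, people optimise for the metric rather than the underlying outcome it was meant to represent.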
Here's how that plays out with each metric in practice:
- Lines of Code added. One of the easiest to game. An engineer can pad their code with extra lines, prefer building over buying, and duplicate rather than refactor. The codebase becomes larger, more verbose, less performant, and more prone to bugs.
- Number of commits produced. Better than counting lines, but still gameable. Small commits are good engineering practice, but an engineer chasing a score will commit even more granularly than necessary. More dangerously, this discourages pairing. If you want the commit attributed to you, you won't invite someone else to work alongside you.
- PRs merged. The same problem applies: an engineer can create unnecessarily granular pull requests. If your team requires review before merging, this multiplies handoffs and context switching, slowing everything down.
- Tickets completed. This can encourage breaking work into smaller units, which is genuinely useful, but it also pushes developers towards easier, lower-risk work they can complete faster. A dozen one-line fixes look better on a dashboard than one complex, high-value feature.
- Story points closed. Points are relative to each team, so comparing across teams is already comparing apples and oranges. Within a single team, you end up with one of two outcomes: everyone agrees to point everything higher, or individuals get competitive and avoid the riskier, more valuable tickets.
When I last reported directly to a CEO, I fell into this trap. I started exposing more metrics to give him a clearer picture of what the team was doing. The intention was good, but I began reporting on story points delivered per sprint, which shifted the team's focus away from the most valuable tasks. The team gravitated towards familiar work, because it was easier to bank points on. At best, it placated the CEO. At worst, we stopped delivering what the company actually needed.
As the line often attributed to Bill Gates puts it: "Measuring programming progress by lines of code is like measuring aircraft building progress by weight." When you measure individual output with proxy metrics, you don't just misread performance. You actively discourage collaboration, craftsmanship, and risk-taking.
The invisible work that matters
Let's challenge the assumption that visible output equals engineer value. Who is doing the valuable work that won't show up on any dashboard?
- It's the senior engineer who spends a day with a junior colleague, accelerating their next six months.
- It's the architect who writes a one-pager that saves the team from a costly wrong turn.
- It's the engineer who quietly fixes a one-line bug that was silently corrupting data for weeks.
- It's the team member who advocates for deleting 10,000 lines of legacy code that everyone else was afraid to touch.
Teams usually know who these people are.
Some of the highest-performing teams I've worked in were built on pairing, knowledge sharing, and genuine psychological safety, none of which shows up in a commit log. One of the best engineers I've worked with spent half her day sitting with junior members of the team, because she knew it would exponentially improve the rate of delivery. She was a credit to everyone around her, and none of the metrics above would have captured that.
Your best engineers may be the ones producing the fewest individual metrics, because they're multiplying the output of everyone around them. A metric that can't see this will rank them as underperformers.
Why individual metrics break team sports
Software engineering is a team sport. In football, if every player were measured solely on goals scored, you'd be sacking your goalkeeper after every match. Even giving each position its own metric won't help, because the best decision at any given moment is often the one that benefits the team, not the individual's stat line.
Individual metrics make collaboration feel costly. If I pair with you, you get the commit. If I review your code carefully and push back, the team's velocity drops. If I focus on documentation, I don't ship features.
And we do it to ourselves too. I've certainly been guilty of chasing the "number go up" feeling: finishing one more ticket, just to have more in the done column, when I should have been helping someone who was stuck. I've hurt the team to improve my own numbers.
Engineering is collective. The unit of performance that matters is the team, not the individual contributor's GitHub graph.
What proof of work looks like
At the team level, we already know how to measure success. It's right there in the principles behind the Agile Manifesto: "Working software is the primary measure of progress."
But the question most leaders arrive here with is: "If I can't track individual metrics, how do I know anyone is doing anything?" That's a fair question. The answer isn't more granular activity tracking. It's better conversations and clearer outcomes.
Good indicators of individual contribution include:
- Can the engineer discuss what they're working on and why it matters to the customer?
- Can their teammates describe how they've made the team faster or better?
- Are they growing their ability to tackle harder problems?
- Are customers benefiting from their work, directly or indirectly?
I’ve used CircleCI’s competency framework with several teams and received strong feedback. It covers 27 indicators across five areas, and it's striking how few relate directly to writing code. When I sat down with engineers, gathered evidence together, and had honest conversations about their performance, it became clear to everyone just how much they were genuinely contributing.
Proof of work isn't a commit log. It's a story of value: one the engineer can tell, and their colleagues can confirm.
What to measure instead
Reframe the question. Stop asking "how do I measure this engineer?" and start asking: "how do I know this team is healthy and creating value?" That's what you actually want to know.
Here are the signals worth paying attention to:
- Outcome-based delivery. Are teams consistently achieving the goals they've committed to? Are customers experiencing the value?
- Learning velocity. How quickly does the team recover from mistakes? Do they learn, adapt, and hold each other accountable?
- Customer engagement. Are engineers actively talking to users? Do they understand the problems they're trying to solve, both qualitatively and quantitatively?
- System stability. Is what they're building reliable? How frequent are incidents, how quickly do they know about them, and how fast do they recover?
- Team health. Do the team enjoy working together? Are they energised? Is there enough psychological safety to share openly and improve together?
These signals take more effort to surface than a commit count. I'd encourage every leader to look at their own goals and those of their reports, and ask: are we measuring what matters, or just what's easy to measure?
At the end of the day, the engineers who create the most value often leave the smallest fingerprint in your dashboards. Your job as a leader isn't to count the fingerprints; it's to build the conditions where great work can happen, and to recognise it when it does.
