Technical Leadership

Engineering Metrics That Actually Matter: From DORA to Real Business Outcomes

Teams celebrate 'velocity up 30%' while bugs and cycle time get worse. Stop the velocity theater. Learn which engineering metrics actually drive better decisions and outcomes, and which are just dashboard decoration.

Ruchit Suthar
November 14, 2025 · 13 min read

TL;DR

Most engineering metrics are vanity numbers that don't predict success. Focus on deployment frequency, lead time for changes, mean time to recovery, and change failure rate—metrics that are hard to game and directly impact business outcomes. Balance speed with quality.


The Velocity Theater Smell

The engineering team just wrapped their quarterly review. The VP of Engineering is excited.

"Our velocity is up 30% quarter-over-quarter! We're shipping more story points than ever. The team is crushing it!"

Meanwhile, in the actual team:

  • Lead time from commit to production has doubled from 3 days to 6 days
  • The on-call engineer got paged 23 times last week
  • Three senior engineers are interviewing elsewhere
  • The last two releases had critical bugs that required emergency rollbacks
  • No one can actually explain what "30% more story points" means in terms of customer value

This is velocity theater. You're measuring activity and calling it progress.

Here's the uncomfortable truth: most engineering metrics are either useless vanity metrics or easily gamed numbers that create the illusion of improvement while the system actually degrades.

Let me show you what to measure instead.

Principles of Good Engineering Metrics

Before we dive into specific metrics, let's establish what makes a metric actually useful.

Good Engineering Metrics Should:

1. Be hard to game without real improvement

Bad metric: Lines of code (just write verbose code)
Good metric: Deployment frequency (hard to fake actually shipping)

2. Be understandable by both engineers and business stakeholders

Bad metric: "Code coverage increased from 67.3% to 71.8%"
Good metric: "We went from 2 production bugs per week to 0.5 production bugs per week"

3. Drive better decisions and focus

Bad metric: Number of tickets closed (encourages splitting work artificially)
Good metric: Lead time for changes (encourages removing bottlenecks)

4. Balance speed and quality

You can't optimize for just one dimension. Fast but broken is bad. Perfect but slow is also bad.

Leading vs Lagging Indicators

Lagging indicators tell you what already happened:

  • Revenue (last quarter's performance)
  • Customer churn (customers already left)
  • Production incidents (problems already occurred)

Leading indicators predict what will happen:

  • Deployment frequency (shipping faster → learn faster → iterate better)
  • Code review turnaround time (slow reviews → frustrated engineers → slower delivery)
  • CI build time (slow builds → engineers stop running tests → quality drops)

You need both, but leading indicators let you steer the ship, while lagging indicators just tell you if you hit the iceberg.

Delivery Metrics: Borrowing from DORA, Simplified

The DORA (DevOps Research and Assessment) metrics are the best-validated engineering metrics we have. Let me translate them into plain language.

1. Deployment Frequency

What it measures:
How often you deploy code to production.

Why it matters:
Higher deployment frequency correlates with better performance, happier teams, and faster learning. If you deploy once a month, you get 12 learning cycles per year. If you deploy every working day, you get roughly 260.

How to measure:

Deployment Frequency = Production deploys / Time period
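
If you want to see the mechanics, here's a minimal sketch in Python, assuming you can export one timestamp per production deploy from your CD tool (the data below is made up for illustration):

```python
from datetime import date

# Assumed input: one date per production deploy, exported from your CD tool.
deploys = [date(2025, 11, 3), date(2025, 11, 3), date(2025, 11, 5),
           date(2025, 11, 10), date(2025, 11, 12), date(2025, 11, 14)]

period_start, period_end = date(2025, 11, 1), date(2025, 11, 14)
weeks = (period_end - period_start).days / 7

# Count deploys inside the period and normalize to deploys per week.
in_period = [d for d in deploys if period_start <= d <= period_end]
print(f"Deployment frequency: {len(in_period) / weeks:.1f} deploys/week")
```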

Good targets by stage:

  • Elite: Multiple deploys per day
  • High: Once per day to once per week
  • Medium: Once per week to once per month
  • Low: Less than once per month (you have a problem)

What good looks like:
You can deploy a one-line bug fix to production in under 30 minutes, safely.

2. Lead Time for Changes

What it measures:
Time from code commit to code running in production.

Why it matters:
This is your feedback loop speed. Long lead time means slow learning, slow fixes, and frustrated engineers.

How to measure:

Lead Time = Time from first commit to production deploy
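
A minimal sketch, assuming you can pair each change's earliest commit with its production deploy time (how you extract those pairs depends on your VCS and CD tooling). I've used the median because a single slow change can skew a plain average:

```python
from datetime import datetime
from statistics import median

# Assumed input: (first_commit_time, production_deploy_time) per change.
changes = [
    (datetime(2025, 11, 3, 9, 0),   datetime(2025, 11, 4, 15, 30)),
    (datetime(2025, 11, 5, 11, 0),  datetime(2025, 11, 5, 16, 45)),
    (datetime(2025, 11, 10, 14, 0), datetime(2025, 11, 13, 10, 0)),
]

# Lead time per change in hours; median resists outlier changes.
lead_times_h = [(deploy - commit).total_seconds() / 3600 for commit, deploy in changes]
print(f"Median lead time: {median(lead_times_h):.1f} hours")
```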

Good targets:

  • Elite: Less than 1 hour
  • High: 1 day to 1 week
  • Medium: 1 week to 1 month
  • Low: More than 1 month (your pipeline is broken)

What to watch:
If lead time is increasing, you have accumulating friction: slow CI, manual approvals, deployment fear, or architectural coupling.

3. Change Failure Rate

What it measures:
Percentage of deployments that cause production issues requiring hotfix, rollback, or incident response.

Why it matters:
This keeps your deployment frequency honest. You can deploy 10 times a day, but if 30% fail, you're not moving fast—you're thrashing.

How to measure:

Change Failure Rate = (Failed deploys / Total deploys) × 100%
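
As a sketch, assuming each deploy record carries a failed flag set by whoever handled the hotfix, rollback, or incident (that tagging discipline is the hard part, not the math):

```python
# Assumed input: one record per production deploy, flagged if it required a
# hotfix, rollback, or incident response.
deploys = [
    {"id": "d-101", "failed": False},
    {"id": "d-102", "failed": True},   # rolled back
    {"id": "d-103", "failed": False},
    {"id": "d-104", "failed": False},
]

# Percentage of deploys that caused a production issue.
failure_rate = 100 * sum(d["failed"] for d in deploys) / len(deploys)
print(f"Change failure rate: {failure_rate:.0f}%")  # 1 of 4 -> 25%
```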

Good targets:

  • Elite: 0–15%
  • High: 16–30%
  • Medium: 31–45%
  • Low: More than 45% (you need better testing and staging)

What good looks like:
Most deployments just work. When they don't, you catch issues in staging or can roll back quickly.

4. Mean Time to Recovery (MTTR)

What it measures:
How long it takes to restore service when a production incident occurs.

Why it matters:
You will have incidents. What matters is how fast you recover. MTTR measures your operational maturity.

How to measure:

MTTR = Total incident resolution time / Number of incidents
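
A minimal sketch, assuming you log when each incident was detected and when service was restored (illustrative data, not from any particular incident tool):

```python
from datetime import datetime, timedelta

# Assumed input: (detected_at, restored_at) per production incident.
incidents = [
    (datetime(2025, 11, 2, 14, 0),  datetime(2025, 11, 2, 14, 40)),
    (datetime(2025, 11, 9, 3, 15),  datetime(2025, 11, 9, 6, 15)),
    (datetime(2025, 11, 12, 10, 0), datetime(2025, 11, 12, 10, 20)),
]

# Total time spent in incidents, divided by the number of incidents.
total = sum((end - start for start, end in incidents), timedelta())
mttr = total / len(incidents)
print(f"MTTR: {mttr.total_seconds() / 60:.0f} minutes")
```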

Good targets:

  • Elite: Less than 1 hour
  • High: Less than 1 day
  • Medium: 1 day to 1 week
  • Low: More than 1 week

What good looks like:
Clear runbooks, good monitoring, practiced incident response, ability to roll back or forward quickly.

Why DORA Beats Story Points

Story points measure estimation accuracy, not delivery speed or quality. They're a planning tool, not a performance metric.

DORA metrics measure actual throughput, quality, and resilience—things that matter to the business and the team.

If you're still measuring velocity in story points, stop. Start measuring lead time instead.

Developer Experience Metrics

These metrics reveal how painful or smooth your engineering environment is. Bad developer experience (DevEx) drives attrition and slows everything down.

1. Time to First PR for New Hires

What it measures:
How long it takes a new engineer to go from day one to getting their first meaningful pull request merged.

Why it matters:
Fast onboarding = good documentation, clear architecture, smooth tooling. Slow onboarding = messy systems and tribal knowledge.

How to measure:

Onboarding Time = Days from start date to first PR merged
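
A minimal sketch, assuming you record each new hire's start date and the merge date of their first PR (names and dates below are made up):

```python
from datetime import date

# Assumed input: per new hire, (start_date, first_pr_merged_date).
new_hires = {
    "alice": (date(2025, 9, 1),  date(2025, 9, 4)),
    "bob":   (date(2025, 10, 6), date(2025, 10, 24)),
}

# Onboarding time in calendar days for each new hire.
for name, (start, first_pr) in new_hires.items():
    print(f"{name}: first PR merged after {(first_pr - start).days} days")
```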

Good targets:

  • Excellent: 1–3 days
  • Good: 1 week
  • Warning zone: 2–4 weeks
  • Crisis zone: More than 1 month

What to watch:
If this is trending up, your codebase is getting harder to understand or your tooling is degrading.

2. Average CI Build Time

What it measures:
How long engineers wait for tests to run in CI.

Why it matters:
Slow CI kills productivity. If tests take 40 minutes, engineers stop running them locally, batch changes, and context-switch constantly.

How to measure:

CI Build Time = Average time from commit to CI pass/fail
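
A minimal sketch over a list of build durations exported from your CI provider. Tracking a high percentile alongside the average is my suggestion here, since a few very slow builds are what push engineers into batching changes:

```python
from statistics import mean, quantiles

# Assumed input: recent CI build durations in minutes.
build_minutes = [8.2, 9.1, 7.5, 11.0, 38.4, 9.6, 8.8, 12.3, 9.0, 10.5]

# quantiles(..., n=20) returns 19 cut points: the 5th, 10th, ..., 95th percentiles.
cuts = quantiles(build_minutes, n=20)
p50, p95 = cuts[9], cuts[18]
print(f"mean={mean(build_minutes):.1f}m  p50={p50:.1f}m  p95={p95:.1f}m")
```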

Good targets:

  • Fast: <10 minutes
  • Acceptable: 10–20 minutes
  • Slow: 20–40 minutes
  • Broken: >40 minutes (engineers will work around this)

What to track:
Graph this over time. If it's trending up, invest in parallelizing tests, pruning slow tests, or upgrading CI infrastructure.

3. Code Review Turnaround Time

What it measures:
Time from PR submission to approval/merge.

Why it matters:
Long review times create work-in-progress queues, context switching, and merge conflicts. Engineers start avoiding small PRs because they rot in the queue.

How to measure:

Review Time = Time from PR creation to approval
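
A minimal sketch, assuming you can export when each PR was opened and when it was approved (most code hosts expose these timestamps via their API; the data shape here is illustrative):

```python
from datetime import datetime
from statistics import mean

# Assumed input: (opened_at, approved_at) per pull request.
prs = [
    (datetime(2025, 11, 10, 9, 0),  datetime(2025, 11, 10, 11, 30)),
    (datetime(2025, 11, 10, 16, 0), datetime(2025, 11, 12, 10, 0)),
    (datetime(2025, 11, 11, 13, 0), datetime(2025, 11, 11, 15, 15)),
]

# Average wall-clock hours from PR creation to approval.
hours = [(approved - opened).total_seconds() / 3600 for opened, approved in prs]
print(f"Average review turnaround: {mean(hours):.1f} hours")
```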

Good targets:

  • Fast: <4 hours
  • Acceptable: Same day
  • Slow: 1–3 days
  • Broken: >3 days (engineers will stop breaking up work into small PRs)

What to watch:
If review time is high, look at team size, async communication norms, or whether senior engineers are overloaded.

Quality & Reliability Metrics

Fast but unreliable is not success. These metrics keep quality honest.

1. Incident Frequency and MTTR

We covered MTTR in DORA. Also track:

Incident frequency:

Incidents per week/month

Target:
Trending downward over time. Each incident should lead to improvements that prevent similar incidents.

What to watch:
Flat or increasing incident rate means you're not learning from postmortems.

2. Bug Escape Rate

What it measures:
Percentage of bugs found in production vs. bugs found in pre-production (dev, staging, QA).

Why it matters:
This reveals how good your testing and staging environments are.

How to measure:

Bug Escape Rate = (Prod bugs / Total bugs) × 100%
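
A minimal sketch, assuming each bug in your tracker is tagged with the environment where it was found:

```python
# Assumed input: one entry per bug, recording where it was found.
bugs_found_in = ["staging", "prod", "qa", "staging", "prod", "dev", "staging", "qa"]

# Share of all bugs that made it to production.
prod_bugs = sum(1 for env in bugs_found_in if env == "prod")
escape_rate = 100 * prod_bugs / len(bugs_found_in)
print(f"Bug escape rate: {escape_rate:.0f}%")  # 2 of 8 -> 25%
```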

Good targets:

  • Excellent: <10% (most bugs caught before prod)
  • Good: 10–25%
  • Warning: 25–50%
  • Crisis: >50% (your staging environment doesn't mirror prod)

What to watch:
If this is increasing, your testing strategy or staging environment is degrading.

3. Error Budgets (for Critical Services)

What it measures:
Allowed downtime or error rate before you stop shipping features and focus on reliability.

Why it matters:
Balances speed and stability. Gives teams permission to move fast until the budget is consumed, then forces reliability focus.

How to measure:

Error Budget = (100% - SLA target) × Time period

Example: 99.9% SLA = 0.1% error budget = 43 minutes downtime per month
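
And a worked version of that arithmetic, with an assumed 25 minutes of downtime already consumed this month:

```python
# 99.9% SLA over a 30-day month.
sla_target = 0.999
minutes_in_month = 30 * 24 * 60                      # 43,200 minutes
budget_minutes = (1 - sla_target) * minutes_in_month
print(f"Monthly error budget: {budget_minutes:.1f} minutes")  # ~43.2 minutes

# Track how much budget is left as incidents consume it.
downtime_so_far = 25  # minutes of SLA-violating downtime this month (assumed)
remaining = budget_minutes - downtime_so_far
print(f"Remaining budget: {remaining:.1f} minutes "
      f"({100 * remaining / budget_minutes:.0f}% left)")
```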

How to use:

  • Burning error budget fast? Slow down feature work, focus on stability.
  • Error budget healthy? Ship faster.

What to avoid:
Don't optimize for "zero incidents" if it means you stop shipping. The right balance is fast learning with acceptable reliability.

Choosing Metrics by Company Stage

Don't blindly copy Google's metrics. Your stage, constraints, and problems are different.

Early Startup (5–20 engineers)

Focus: Learning speed, customer impact, survival.

Metrics to track:

  1. Lead time for changes (how fast can we learn from customers?)
  2. Incident impact (customer-facing downtime in hours per week)
  3. Deployment frequency (are we shipping regularly?)

Skip for now:
Complex dashboards, code coverage, velocity tracking. You're optimizing for survival and product-market fit, not process perfection.

Scaleup (20–100 engineers)

Focus: Sustainable growth, operational maturity, team scaling.

Metrics to track:

  1. Full DORA set (deployment frequency, lead time, change failure rate, MTTR)
  2. On-call health (alerts per shift, sleep-hour pages)
  3. Onboarding time (time to first PR)
  4. Cost per deploy or per user (cloud costs matter now)

Why these:
You're scaling the team and systems. Process and reliability matter. You need leading indicators of whether scaling is sustainable.

Enterprise (100+ engineers)

Focus: Cross-team coordination, architectural health, long-term reliability.

Metrics to track:

  1. DORA metrics by team (compare and learn from patterns)
  2. Cross-team lead time (features that require multiple teams)
  3. Architectural health indicators (coupling metrics, module boundaries)
  4. Platform adoption metrics (if you have internal platforms)
  5. Retention and satisfaction (are engineers staying and happy?)

Why these:
Coordination costs dominate. You need to track how well teams work together and whether architecture enables or blocks them.

Using Metrics in Rituals (Without Weaponizing Them)

Metrics are tools for learning, not weapons for punishment.

Weekly Engineering Stand-up of Stand-ups (15 minutes)

What to review:

  • This week's deployment count and any failures
  • Current lead time trend
  • Any incidents and recovery time
  • Blockers affecting metrics (CI slow, review queues long)

How to use:

  • Celebrate improvements: "Lead time dropped from 5 days to 3 days—nice work on the CI optimization!"
  • Identify systemic issues: "Review time spiked to 4 days. Do we need to rebalance workload?"

What NOT to do:

  • Compare individuals: "Why is your lead time slower than Sarah's?"
  • Punish: "Your change failure rate is too high. Do better."

Monthly Leadership Reviews (30 minutes)

What to review:

  • DORA metrics trend over last 3 months
  • DevEx metrics (onboarding time, CI speed, review time)
  • Incident trends
  • Correlate with business outcomes when possible

Questions to ask:

  • Are we getting faster or slower?
  • Are we shipping more reliably?
  • What's blocking us?
  • What investments would improve these metrics?

Postmortems

What to track:

  • Incident frequency by type
  • MTTR trends
  • Action item completion rate from previous incidents

How to use:

  • "Our MTTR has dropped from 90 minutes to 30 minutes over the last quarter. Our runbooks and monitoring investments are paying off."
  • "We've had 3 database incidents in 2 months. We need to prioritize that database migration."

Metric Smells: When Your Dashboards Are Lying to You

Here are the red flags that your metrics are broken:

Smell 1: Velocity Always Up, But Morale and Quality Down

What's happening:
Team is gaming story points by inflating estimates or counting smaller and smaller tasks.

Fix:
Replace velocity with lead time. Hard to fake actually shipping faster.

Smell 2: Metrics No One Can Explain

What's happening:
Someone built a dashboard with 40 KPIs. No one remembers why half of them exist.

Fix:
Delete any metric that you can't explain in one sentence or that hasn't been referenced in 3 months.

Smell 3: Dashboards No One Checks

What's happening:
Beautiful dashboards that no one looks at except in board meetings.

Fix:
If a metric doesn't change behavior or decisions, delete it. Metrics should be tools, not decoration.

Smell 4: Metrics Used to Punish

What's happening:
"Your change failure rate is too high. You're underperforming."

Fix:
Metrics should diagnose systems, not blame individuals. If metrics create fear, they'll get gamed or ignored.

Smell 5: Gaming Is Rewarded

What's happening:
Engineers split one feature into 10 tiny PRs to optimize for "number of PRs merged."

Fix:
If people are gaming metrics, either the metric is bad or the incentives are misaligned. Change the metric or stop tying it to performance reviews.

Smell 6: Metrics Without Context

What's happening:
"Our deployment frequency is 2x per week." Is that good? For a startup, it's slow. For a bank, it's fast.

Fix:
Always provide context: trends over time, comparison to your own past, or industry benchmarks for your stage.

Fewer Metrics, More Insight

Here's what I've learned after years of building and breaking engineering dashboards:

A handful of well-chosen metrics beats a sea of dashboards.

Most teams would be better off tracking 5–7 core metrics deeply than tracking 40 metrics shallowly.

The Minimal Viable Metric Set

If I could only track 5 metrics for a typical scaleup team, I'd choose:

  1. Lead time for changes (are we shipping faster or slower?)
  2. Change failure rate (are we shipping reliably?)
  3. MTTR (when things break, how fast do we recover?)
  4. Time to first PR for new hires (is our system getting easier or harder to work in?)
  5. Incident frequency trend (are we learning and improving?)

These five tell you:

  • How fast you're delivering
  • How reliably you're delivering
  • How sustainable your pace is
  • Whether you're getting better over time

Everything else is detail.

Your Metric Audit Checklist

Use this to prune and redesign your team's metrics:

Step 1: List all current metrics

  • Write down every metric on every dashboard you have

Step 2: Ask hard questions for each metric

  1. Can we explain this metric in one sentence?
  2. Has this metric changed a decision in the last 3 months?
  3. Would we notice if this metric disappeared?
  4. Is this metric gameable without real improvement?
  5. Does this metric drive better behavior?

Step 3: Delete ruthlessly

  • Delete any metric where you answered "no" to questions 2 or 3
  • Delete any metric where you answered "yes" to question 4 without "yes" to question 5

Step 4: Choose your core set (5–7 metrics)

  • Pick 2–3 delivery metrics (DORA subset)
  • Pick 1–2 quality/reliability metrics
  • Pick 1–2 developer experience metrics

Step 5: Design your rituals

  • Weekly review: Quick check of trends and blockers (15 min)
  • Monthly review: Deep dive on trends and investments (30 min)
  • Quarterly review: Calibrate metrics and adjust if needed (1 hour)

Step 6: Set clear guidelines

  • Metrics are for learning, not punishment
  • Metrics diagnose systems, not individuals
  • Gaming metrics is a signal the metric is wrong
  • Celebrate improvements, investigate degradations

Step 7: Build feedback loops

  • Review metric usefulness quarterly
  • Delete metrics that stop being useful
  • Add metrics when you have new questions
  • Always prefer simple over complex

Stop Measuring Everything, Start Measuring What Matters

Engineering metrics should answer a few critical questions:

  • Are we shipping faster or slower?
  • Are we shipping reliably?
  • Is our system getting easier or harder to work in?
  • Are we learning and improving?

If your metrics don't answer these questions, or if they create theater instead of insight, delete them.

Build a small, useful set of metrics. Review them regularly. Use them to diagnose and improve systems, not to judge individuals.

And remember: the best metric is the one that changes behavior for the better.

Everything else is just dashboard decoration.

Topics

engineering-metrics · dora-metrics · engineering-management · velocity · team-performance · kpis · developer-experience

About Ruchit Suthar

Technical Leader with 15+ years of experience scaling teams and systems