Technical Leadership

Engineering Metrics That Actually Matter: From DORA to Real Business Outcomes

Teams celebrate 'velocity up 30%' while bugs and cycle time get worse. Stop the velocity theater. Learn which engineering metrics actually drive better decisions and outcomes, and which are just dashboard decoration.

Ruchit Suthar
November 14, 2025 · 13 min read

TL;DR

Most engineering metrics are vanity numbers that don't predict success. Focus on deployment frequency, lead time for changes, mean time to recovery, and change failure rate—metrics that are hard to game and directly impact business outcomes. Balance speed with quality.


The Velocity Theater Smell

The engineering team just wrapped their quarterly review. The VP of Engineering is excited.

"Our velocity is up 30% quarter-over-quarter! We're shipping more story points than ever. The team is crushing it!"

Meanwhile, in the actual team:

  • Lead time from commit to production has doubled from 3 days to 6 days
  • The on-call engineer got paged 23 times last week
  • Three senior engineers are interviewing elsewhere
  • The last two releases had critical bugs that required emergency rollbacks
  • No one can actually explain what "30% more story points" means in terms of customer value

This is velocity theater. You're measuring activity and calling it progress.

Here's the uncomfortable truth: most engineering metrics are either useless vanity metrics or easily gamed numbers that create the illusion of improvement while the system actually degrades.

Let me show you what to measure instead.

Principles of Good Engineering Metrics

Before we dive into specific metrics, let's establish what makes a metric actually useful.

Good Engineering Metrics Should:

1. Be hard to game without real improvement

Bad metric: Lines of code (just write verbose code)
Good metric: Deployment frequency (hard to fake actually shipping)

2. Be understandable by both engineers and business stakeholders

Bad metric: "Code coverage increased from 67.3% to 71.8%"
Good metric: "We went from 2 production bugs per week to 0.5 production bugs per week"

3. Drive better decisions and focus

Bad metric: Number of tickets closed (encourages splitting work artificially)
Good metric: Lead time for changes (encourages removing bottlenecks)

4. Balance speed and quality

You can't optimize for just one dimension. Fast but broken is bad. Perfect but slow is also bad.

Leading vs Lagging Indicators

Lagging indicators tell you what already happened:

  • Revenue (last quarter's performance)
  • Customer churn (customers already left)
  • Production incidents (problems already occurred)

Leading indicators predict what will happen:

  • Deployment frequency (shipping faster → learn faster → iterate better)
  • Code review turnaround time (slow reviews → frustrated engineers → slower delivery)
  • CI build time (slow builds → engineers stop running tests → quality drops)

You need both, but leading indicators let you steer the ship, while lagging indicators just tell you if you hit the iceberg.

Delivery Metrics: Borrowing from DORA, Simplified

The DORA (DevOps Research and Assessment) metrics are the best-validated engineering metrics we have. Let me translate them into plain language.

1. Deployment Frequency

What it measures:
How often you deploy code to production.

Why it matters:
Higher deployment frequency correlates with better performance, happier teams, and faster learning. If you deploy once a month, you get 12 learning cycles per year. If you deploy every working day, you get roughly 260.

How to measure:

Deployment Frequency = Production deploys / Time period
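
If you want to see the mechanics, here's a minimal sketch in Python, assuming you can export one timestamp per production deploy from your CD tool (the data below is made up for illustration):

```python
from datetime import date

# Assumed input: one date per production deploy, exported from your CD tool.
deploys = [date(2025, 11, 3), date(2025, 11, 3), date(2025, 11, 5),
           date(2025, 11, 10), date(2025, 11, 12), date(2025, 11, 14)]

period_start, period_end = date(2025, 11, 1), date(2025, 11, 14)
weeks = (period_end - period_start).days / 7

# Count deploys inside the period and normalize to deploys per week.
in_period = [d for d in deploys if period_start <= d <= period_end]
print(f"Deployment frequency: {len(in_period) / weeks:.1f} deploys/week")
```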

Good targets by stage:

  • Elite: Multiple deploys per day
  • High: Once per day to once per week
  • Medium: Once per week to once per month
  • Low: Less than once per month (you have a problem)

What good looks like:
You can deploy a one-line bug fix to production in under 30 minutes, safely.

2. Lead Time for Changes

What it measures:
Time from code commit to code running in production.

Why it matters:
This is your feedback loop speed. Long lead time means slow learning, slow fixes, and frustrated engineers.

How to measure:

Lead Time = Time from first commit to production deploy
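
A minimal sketch, assuming you can pair each change's earliest commit with its production deploy time (how you extract those pairs depends on your VCS and CD tooling). I've used the median because a single slow change can skew a plain average:

```python
from datetime import datetime
from statistics import median

# Assumed input: (first_commit_time, production_deploy_time) per change.
changes = [
    (datetime(2025, 11, 3, 9, 0),   datetime(2025, 11, 4, 15, 30)),
    (datetime(2025, 11, 5, 11, 0),  datetime(2025, 11, 5, 16, 45)),
    (datetime(2025, 11, 10, 14, 0), datetime(2025, 11, 13, 10, 0)),
]

# Lead time per change in hours; median resists outlier changes.
lead_times_h = [(deploy - commit).total_seconds() / 3600 for commit, deploy in changes]
print(f"Median lead time: {median(lead_times_h):.1f} hours")
```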

Good targets:

  • Elite: Less than 1 hour
  • High: 1 day to 1 week
  • Medium: 1 week to 1 month
  • Low: More than 1 month (your pipeline is broken)

What to watch:
If lead time is increasing, you have accumulating friction: slow CI, manual approvals, deployment fear, or architectural coupling.

3. Change Failure Rate

What it measures:
Percentage of deployments that cause production issues requiring hotfix, rollback, or incident response.

Why it matters:
This keeps your deployment frequency honest. You can deploy 10 times a day, but if 30% fail, you're not moving fast—you're thrashing.

How to measure:

Change Failure Rate = (Failed deploys / Total deploys) × 100%
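
As a sketch, assuming each deploy record carries a failed flag set by whoever handled the hotfix, rollback, or incident (that tagging discipline is the hard part, not the math):

```python
# Assumed input: one record per production deploy, flagged if it required a
# hotfix, rollback, or incident response.
deploys = [
    {"id": "d-101", "failed": False},
    {"id": "d-102", "failed": True},   # rolled back
    {"id": "d-103", "failed": False},
    {"id": "d-104", "failed": False},
]

# Percentage of deploys that caused a production issue.
failure_rate = 100 * sum(d["failed"] for d in deploys) / len(deploys)
print(f"Change failure rate: {failure_rate:.0f}%")  # 1 of 4 -> 25%
```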

Good targets:

  • Elite: 0–15%
  • High: 16–30%
  • Medium: 31–45%
  • Low: More than 45% (you need better testing and staging)

What good looks like:
Most deployments just work. When they don't, you catch issues in staging or can roll back quickly.

4. Mean Time to Recovery (MTTR)

What it measures:
How long it takes to restore service when a production incident occurs.

Why it matters:
You will have incidents. What matters is how fast you recover. MTTR measures your operational maturity.

How to measure:

MTTR = Total incident resolution time / Number of incidents
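
A minimal sketch, assuming you log when each incident was detected and when service was restored (illustrative data, not from any particular incident tool):

```python
from datetime import datetime, timedelta

# Assumed input: (detected_at, restored_at) per production incident.
incidents = [
    (datetime(2025, 11, 2, 14, 0),  datetime(2025, 11, 2, 14, 40)),
    (datetime(2025, 11, 9, 3, 15),  datetime(2025, 11, 9, 6, 15)),
    (datetime(2025, 11, 12, 10, 0), datetime(2025, 11, 12, 10, 20)),
]

# Total time spent in incidents, divided by the number of incidents.
total = sum((end - start for start, end in incidents), timedelta())
mttr = total / len(incidents)
print(f"MTTR: {mttr.total_seconds() / 60:.0f} minutes")
```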

Good targets:

  • Elite: Less than 1 hour
  • High: Less than 1 day
  • Medium: 1 day to 1 week
  • Low: More than 1 week

What good looks like:
Clear runbooks, good monitoring, practiced incident response, ability to roll back or forward quickly.

Why DORA Beats Story Points

Story points measure estimation accuracy, not delivery speed or quality. They're a planning tool, not a performance metric.

DORA metrics measure actual throughput, quality, and resilience—things that matter to the business and the team.

If you're still measuring velocity in story points, stop. Start measuring lead time instead.

Developer Experience Metrics

These metrics reveal how painful or smooth your engineering environment is. Bad developer experience (DevEx) drives attrition and slows everything down.

1. Time to First PR for New Hires

What it measures:
How long it takes a new engineer to go from day one to getting their first meaningful pull request merged.

Why it matters:
Fast onboarding = good documentation, clear architecture, smooth tooling. Slow onboarding = messy systems and tribal knowledge.

How to measure:

Onboarding Time = Days from start date to first PR merged
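
A minimal sketch, assuming you record each new hire's start date and the merge date of their first PR (names and dates below are made up):

```python
from datetime import date

# Assumed input: per new hire, (start_date, first_pr_merged_date).
new_hires = {
    "alice": (date(2025, 9, 1),  date(2025, 9, 4)),
    "bob":   (date(2025, 10, 6), date(2025, 10, 24)),
}

# Onboarding time in calendar days for each new hire.
for name, (start, first_pr) in new_hires.items():
    print(f"{name}: first PR merged after {(first_pr - start).days} days")
```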

Good targets:

  • Excellent: 1–3 days
  • Good: 1 week
  • Warning zone: 2–4 weeks
  • Crisis zone: More than 1 month

What to watch:
If this is trending up, your codebase is getting harder to understand or your tooling is degrading.

2. Average CI Build Time

What it measures:
How long engineers wait for tests to run in CI.

Why it matters:
Slow CI kills productivity. If tests take 40 minutes, engineers stop running them locally, batch changes, and context-switch constantly.

How to measure:

CI Build Time = Average time from commit to CI pass/fail
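
A minimal sketch over a list of build durations exported from your CI provider. Tracking a high percentile alongside the average is my suggestion here, since a few very slow builds are what push engineers into batching changes:

```python
from statistics import mean, quantiles

# Assumed input: recent CI build durations in minutes.
build_minutes = [8.2, 9.1, 7.5, 11.0, 38.4, 9.6, 8.8, 12.3, 9.0, 10.5]

# quantiles(..., n=20) returns 19 cut points: the 5th, 10th, ..., 95th percentiles.
cuts = quantiles(build_minutes, n=20)
p50, p95 = cuts[9], cuts[18]
print(f"mean={mean(build_minutes):.1f}m  p50={p50:.1f}m  p95={p95:.1f}m")
```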

Good targets:

  • Fast: <10 minutes
  • Acceptable: 10–20 minutes
  • Slow: 20–40 minutes
  • Broken: >40 minutes (engineers will work around this)

What to track:
Graph this over time. If it's trending up, invest in parallelizing tests, pruning slow tests, or upgrading CI infrastructure.

3. Code Review Turnaround Time

What it measures:
Time from PR submission to approval/merge.

Why it matters:
Long review times create work-in-progress queues, context switching, and merge conflicts. Engineers start avoiding small PRs because they rot in the queue.

How to measure:

Review Time = Time from PR creation to approval
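
A minimal sketch, assuming you can export when each PR was opened and when it was approved (most code hosts expose these timestamps via their API; the data shape here is illustrative):

```python
from datetime import datetime
from statistics import mean

# Assumed input: (opened_at, approved_at) per pull request.
prs = [
    (datetime(2025, 11, 10, 9, 0),  datetime(2025, 11, 10, 11, 30)),
    (datetime(2025, 11, 10, 16, 0), datetime(2025, 11, 12, 10, 0)),
    (datetime(2025, 11, 11, 13, 0), datetime(2025, 11, 11, 15, 15)),
]

# Average wall-clock hours from PR creation to approval.
hours = [(approved - opened).total_seconds() / 3600 for opened, approved in prs]
print(f"Average review turnaround: {mean(hours):.1f} hours")
```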

Good targets:

  • Fast: <4 hours
  • Acceptable: Same day
  • Slow: 1–3 days
  • Broken: >3 days (engineers will stop breaking up work into small PRs)

What to watch:
If review time is high, look at team size, async communication norms, or whether senior engineers are overloaded.

Quality & Reliability Metrics

Fast but unreliable is not success. These metrics keep quality honest.

1. Incident Frequency and MTTR

We covered MTTR in DORA. Also track:

Incident frequency:

Incidents per week/month

Target:
Trending downward over time. Each incident should lead to improvements that prevent similar incidents.

What to watch:
Flat or increasing incident rate means you're not learning from postmortems.

2. Bug Escape Rate

What it measures:
Percentage of bugs found in production vs. bugs found in pre-production (dev, staging, QA).

Why it matters:
This reveals how good your testing and staging environments are.

How to measure:

Bug Escape Rate = (Prod bugs / Total bugs) × 100%
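
A minimal sketch, assuming each bug in your tracker is tagged with the environment where it was found:

```python
# Assumed input: one entry per bug, recording where it was found.
bugs_found_in = ["staging", "prod", "qa", "staging", "prod", "dev", "staging", "qa"]

# Share of all bugs that made it to production.
prod_bugs = sum(1 for env in bugs_found_in if env == "prod")
escape_rate = 100 * prod_bugs / len(bugs_found_in)
print(f"Bug escape rate: {escape_rate:.0f}%")  # 2 of 8 -> 25%
```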

Good targets:

  • Excellent: <10% (most bugs caught before prod)
  • Good: 10–25%
  • Warning: 25–50%
  • Crisis: >50% (your staging environment doesn't mirror prod)

What to watch:
If this is increasing, your testing strategy or staging environment is degrading.

3. Error Budgets (for Critical Services)

What it measures:
Allowed downtime or error rate before you stop shipping features and focus on reliability.

Why it matters:
Balances speed and stability. Gives teams permission to move fast until the budget is consumed, then forces reliability focus.

How to measure:

Error Budget = (100% - SLA target) × Time period

Example: 99.9% SLA = 0.1% error budget = 43 minutes downtime per month
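
And a worked version of that arithmetic, with an assumed 25 minutes of downtime already consumed this month:

```python
# 99.9% SLA over a 30-day month.
sla_target = 0.999
minutes_in_month = 30 * 24 * 60                      # 43,200 minutes
budget_minutes = (1 - sla_target) * minutes_in_month
print(f"Monthly error budget: {budget_minutes:.1f} minutes")  # ~43.2 minutes

# Track how much budget is left as incidents consume it.
downtime_so_far = 25  # minutes of SLA-violating downtime this month (assumed)
remaining = budget_minutes - downtime_so_far
print(f"Remaining budget: {remaining:.1f} minutes "
      f"({100 * remaining / budget_minutes:.0f}% left)")
```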

How to use:

  • Burning error budget fast? Slow down feature work, focus on stability.
  • Error budget healthy? Ship faster.

What to avoid:
Don't optimize for "zero incidents" if it means you stop shipping. The right balance is fast learning with acceptable reliability.

Choosing Metrics by Company Stage

Don't blindly copy Google's metrics. Your stage, constraints, and problems are different.

Early Startup (5–20 engineers)

Focus: Learning speed, customer impact, survival.

Metrics to track:

  1. Lead time for changes (how fast can we learn from customers?)
  2. Incident impact (customer-facing downtime in hours per week)
  3. Deployment frequency (are we shipping regularly?)

Skip for now:
Complex dashboards, code coverage, velocity tracking. You're optimizing for survival and product-market fit, not process perfection.

Scaleup (20–100 engineers)

Focus: Sustainable growth, operational maturity, team scaling.

Metrics to track:

  1. Full DORA set (deployment frequency, lead time, change failure rate, MTTR)
  2. On-call health (alerts per shift, sleep-hour pages)
  3. Onboarding time (time to first PR)
  4. Cost per deploy or per user (cloud costs matter now)

Why these:
You're scaling the team and systems. Process and reliability matter. You need leading indicators of whether scaling is sustainable.

Enterprise (100+ engineers)

Focus: Cross-team coordination, architectural health, long-term reliability.

Metrics to track:

  1. DORA metrics by team (compare and learn from patterns)
  2. Cross-team lead time (features that require multiple teams)
  3. Architectural health indicators (coupling metrics, module boundaries)
  4. Platform adoption metrics (if you have internal platforms)
  5. Retention and satisfaction (are engineers staying and happy?)

Why these:
Coordination costs dominate. You need to track how well teams work together and whether architecture enables or blocks them.

Using Metrics in Rituals (Without Weaponizing Them)

Metrics are tools for learning, not weapons for punishment.

Weekly Engineering Stand-up of Stand-ups (15 minutes)

What to review:

  • This week's deployment count and any failures
  • Current lead time trend
  • Any incidents and recovery time
  • Blockers affecting metrics (CI slow, review queues long)

How to use:

  • Celebrate improvements: "Lead time dropped from 5 days to 3 days—nice work on the CI optimization!"
  • Identify systemic issues: "Review time spiked to 4 days. Do we need to rebalance workload?"

What NOT to do:

  • Compare individuals: "Why is your lead time slower than Sarah's?"
  • Punish: "Your change failure rate is too high. Do better."

Monthly Leadership Reviews (30 minutes)

What to review:

  • DORA metrics trend over last 3 months
  • DevEx metrics (onboarding time, CI speed, review time)
  • Incident trends
  • Correlate with business outcomes when possible

Questions to ask:

  • Are we getting faster or slower?
  • Are we shipping more reliably?
  • What's blocking us?
  • What investments would improve these metrics?

Postmortems

What to track:

  • Incident frequency by type
  • MTTR trends
  • Action item completion rate from previous incidents

How to use:

  • "Our MTTR has dropped from 90 minutes to 30 minutes over the last quarter. Our runbooks and monitoring investments are paying off."
  • "We've had 3 database incidents in 2 months. We need to prioritize that database migration."

Metric Smells: When Your Dashboards Are Lying to You

Here are the red flags that your metrics are broken:

Smell 1: Velocity Always Up, But Morale and Quality Down

What's happening:
Team is gaming story points by inflating estimates or counting smaller and smaller tasks.

Fix:
Replace velocity with lead time. Hard to fake actually shipping faster.

Smell 2: Metrics No One Can Explain

What's happening:
Someone built a dashboard with 40 KPIs. No one remembers why half of them exist.

Fix:
Delete any metric that you can't explain in one sentence or that hasn't been referenced in 3 months.

Smell 3: Dashboards No One Checks

What's happening:
Beautiful dashboards that no one looks at except in board meetings.

Fix:
If a metric doesn't change behavior or decisions, delete it. Metrics should be tools, not decoration.

Smell 4: Metrics Used to Punish

What's happening:
"Your change failure rate is too high. You're underperforming."

Fix:
Metrics should diagnose systems, not blame individuals. If metrics create fear, they'll get gamed or ignored.

Smell 5: Gaming Is Rewarded

What's happening:
Engineers split one feature into 10 tiny PRs to optimize for "number of PRs merged."

Fix:
If people are gaming metrics, either the metric is bad or the incentives are misaligned. Change the metric or stop tying it to performance reviews.

Smell 6: Metrics Without Context

What's happening:
"Our deployment frequency is 2x per week." Is that good? For a startup, it's slow. For a bank, it's fast.

Fix:
Always provide context: trends over time, comparison to your own past, or industry benchmarks for your stage.

Fewer Metrics, More Insight

Here's what I've learned after years of building and breaking engineering dashboards:

A handful of well-chosen metrics beats a sea of dashboards.

Most teams would be better off tracking 5–7 core metrics deeply than tracking 40 metrics shallowly.

The Minimal Viable Metric Set

If I could only track 5 metrics for a typical scaleup team, I'd choose:

  1. Lead time for changes (are we shipping faster or slower?)
  2. Change failure rate (are we shipping reliably?)
  3. MTTR (when things break, how fast do we recover?)
  4. Time to first PR for new hires (is our system getting easier or harder to work in?)
  5. Incident frequency trend (are we learning and improving?)

These five tell you:

  • How fast you're delivering
  • How reliably you're delivering
  • How sustainable your pace is
  • Whether you're getting better over time

Everything else is detail.

Your Metric Audit Checklist

Use this to prune and redesign your team's metrics:

Step 1: List all current metrics

  • Write down every metric on every dashboard you have

Step 2: Ask hard questions for each metric

  1. Can we explain this metric in one sentence?
  2. Has this metric changed a decision in the last 3 months?
  3. Would we notice if this metric disappeared?
  4. Is this metric gameable without real improvement?
  5. Does this metric drive better behavior?

Step 3: Delete ruthlessly

  • Delete any metric where you answered "no" to questions 2 or 3
  • Delete any metric where you answered "yes" to question 4 without "yes" to question 5

Step 4: Choose your core set (5–7 metrics)

  • Pick 2–3 delivery metrics (DORA subset)
  • Pick 1–2 quality/reliability metrics
  • Pick 1–2 developer experience metrics

Step 5: Design your rituals

  • Weekly review: Quick check of trends and blockers (15 min)
  • Monthly review: Deep dive on trends and investments (30 min)
  • Quarterly review: Calibrate metrics and adjust if needed (1 hour)

Step 6: Set clear guidelines

  • Metrics are for learning, not punishment
  • Metrics diagnose systems, not individuals
  • Gaming metrics is a signal the metric is wrong
  • Celebrate improvements, investigate degradations

Step 7: Build feedback loops

  • Review metric usefulness quarterly
  • Delete metrics that stop being useful
  • Add metrics when you have new questions
  • Always prefer simple over complex

Stop Measuring Everything, Start Measuring What Matters

Engineering metrics should answer a few critical questions:

  • Are we shipping faster or slower?
  • Are we shipping reliably?
  • Is our system getting easier or harder to work in?
  • Are we learning and improving?

If your metrics don't answer these questions, or if they create theater instead of insight, delete them.

Build a small, useful set of metrics. Review them regularly. Use them to diagnose and improve systems, not to judge individuals.

And remember: the best metric is the one that changes behavior for the better.

Everything else is just dashboard decoration.

Topics

engineering-metrics · dora-metrics · engineering-management · velocity · team-performance · kpis · developer-experience

About Ruchit Suthar

Technical Leader with 15+ years of experience scaling teams and systems