DORA metrics are the industry standard for measuring software delivery performance. Developed by the DevOps Research and Assessment (DORA) team, now part of Google Cloud, over a decade of research involving more than 39,000 data points from thousands of organizations, these four metrics provide an evidence-based framework for understanding how effectively your engineering team delivers software. DORA's research has found that elite-performing teams deploy 973 times more frequently than low performers while simultaneously maintaining higher stability. That is not a tradeoff. That is proof that speed and quality can coexist when teams measure and improve the right things.
This guide explains what each metric means, how to measure them, how to set benchmarks, and how to improve each one without falling into the anti-patterns that turn metrics into dysfunction.
What Are DORA Metrics?
DORA metrics are four key measurements that predict software delivery performance and organizational outcomes. They emerged from the State of DevOps research program, which began in 2014; the DORA team behind it was acquired by Google in 2018. The research consistently shows that these four metrics correlate with both technical performance (reliability, quality) and organizational performance (profitability, market share, employee satisfaction).
The metrics split into two categories:
Throughput metrics measure how fast your team delivers value:
- Deployment Frequency
- Lead Time for Changes
Stability metrics measure how safely your team operates:
- Mean Time to Recovery (MTTR)
- Change Failure Rate
The insight behind DORA is that you do not have to choose between speed and stability. The research shows that the highest-performing teams excel at both. Teams that deploy frequently also recover faster and fail less often. The practices that enable speed (automation, small batch sizes, good testing) are the same practices that enable stability.
This finding contradicts the intuition many engineering leaders carry: that moving faster means accepting more risk. The data tells a different story. Deploying small changes frequently reduces risk per deployment. Automated testing catches defects early. Monitoring detects problems fast. The result is that the fastest teams are also the safest teams. Understanding this relationship is the first step toward improving your engineering organization's performance.
DORA metrics have gained adoption across the industry because they are simple, measurable, and predictive. Unlike story points or lines of code, these metrics connect directly to business outcomes. Organizations with elite DORA performance report higher profitability, higher market share, and higher employee satisfaction than their lower-performing peers.
The 4 Key Metrics Explained
Deployment Frequency
What it measures: How often your team successfully deploys code to production.
Why it matters: Deployment frequency is a proxy for batch size. Teams that deploy frequently ship smaller changes. Smaller changes are easier to review, easier to test, and easier to roll back if something goes wrong. Large, infrequent deployments bundle risk.
How to track it: Count production deployments per day, week, or month. If your team uses a CI/CD pipeline, your deployment tool likely already logs this data. GitHub Actions, GitLab CI, ArgoCD, and similar tools all provide deployment history.
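If you can export those deployment events, a few lines of scripting are enough to get a baseline. Here is a minimal sketch, assuming a list of ISO 8601 timestamps for successful production deployments (the values below are illustrative):

```python
from collections import Counter
from datetime import datetime

# Successful production deployment timestamps exported from your CI/CD tool.
# These values are illustrative.
deploy_timestamps = [
    "2024-05-06T14:02:00Z",
    "2024-05-06T17:45:00Z",
    "2024-05-08T09:10:00Z",
    "2024-05-13T11:30:00Z",
]

# Bucket deployments by ISO calendar week to see frequency per week.
per_week = Counter(
    datetime.fromisoformat(ts.replace("Z", "+00:00")).isocalendar()[:2]
    for ts in deploy_timestamps
)

for (year, week), count in sorted(per_week.items()):
    print(f"{year}-W{week:02d}: {count} deployment(s)")
```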
Lead Time for Changes
What it measures: The elapsed time from a developer's first commit to that code running in production.
Why it matters: Lead time reveals how much friction exists in your delivery pipeline. A team with a one-hour lead time has a fast, automated path to production. A team with a two-week lead time has manual steps, long review queues, slow test suites, or approval bottlenecks.
How to track it: Measure the time between the first commit on a branch (or the merge commit) and the deployment event that includes that commit in production. This requires correlating your version control data with your deployment data.
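As a rough sketch of the calculation, assuming you have already paired each change's first-commit timestamp with the timestamp of the deployment that shipped it (the pairs below are illustrative):

```python
from datetime import datetime
from statistics import median

# Each pair: (first commit on the branch, deployment that shipped it).
# In practice these come from your version control and deployment logs.
changes = [
    ("2024-05-06T09:00:00+00:00", "2024-05-06T15:30:00+00:00"),
    ("2024-05-07T10:15:00+00:00", "2024-05-09T11:00:00+00:00"),
    ("2024-05-08T08:45:00+00:00", "2024-05-08T10:05:00+00:00"),
]

lead_times_hours = [
    (datetime.fromisoformat(deployed) - datetime.fromisoformat(committed)).total_seconds() / 3600
    for committed, deployed in changes
]

# Report the median and the slowest change; the tail often reveals the bottleneck.
print(f"median lead time: {median(lead_times_hours):.1f} h")
print(f"slowest change:   {max(lead_times_hours):.1f} h")
```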
Mean Time to Recovery (MTTR)
What it measures: How long it takes to restore service after an incident or deployment failure.
Why it matters: Failures are inevitable. What separates high-performing teams from others is how quickly they detect, diagnose, and fix problems. Low MTTR requires good monitoring, clear runbooks, automated rollback capabilities, and a culture of rapid response.
How to track it: Measure the time between incident detection (an alert fires or a user reports the problem) and service restoration (the fix is deployed and the system is back to normal). Your incident management tool (PagerDuty, Opsgenie, or similar) likely records both timestamps.
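The calculation itself is simple once the incident records are exported. A minimal sketch, assuming each record carries a detection timestamp and a resolution timestamp (illustrative values):

```python
from datetime import datetime

# Incident records exported from your incident management tool. detected_at is
# when the alert fired; resolved_at is when service was restored. Values are illustrative.
incidents = [
    {"detected_at": "2024-05-02T03:10:00+00:00", "resolved_at": "2024-05-02T03:55:00+00:00"},
    {"detected_at": "2024-05-11T14:00:00+00:00", "resolved_at": "2024-05-11T18:20:00+00:00"},
]

durations_hours = [
    (datetime.fromisoformat(i["resolved_at"]) - datetime.fromisoformat(i["detected_at"])).total_seconds() / 3600
    for i in incidents
]

print(f"MTTR:  {sum(durations_hours) / len(durations_hours):.1f} h")
print(f"worst: {max(durations_hours):.1f} h")
```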
Change Failure Rate
What it measures: The percentage of deployments that cause a failure requiring remediation (a rollback, a hotfix, a patch, or an incident).
Why it matters: Change failure rate reflects the quality of your testing, review, and deployment processes. A high failure rate means your safety nets have gaps. Failures that make it to production represent risks that earlier stages should have caught.
How to track it: Divide the number of deployments that caused an incident or required remediation by the total number of deployments in the same period. You need a reliable way to link incidents to the deployments that caused them.
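The arithmetic is a single division. With illustrative counts:

```python
# Deployments in the period, flagged if they required remediation
# (a rollback, a hotfix, or a linked incident). Counts are illustrative.
total_deployments = 40
failed_deployments = 3

print(f"Change failure rate: {failed_deployments / total_deployments:.1%}")  # 7.5%
```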
Industry Benchmarks
The annual State of DevOps report provides benchmark data across four performance tiers. Recent editions of the report group teams roughly as follows:
| Metric | Elite | High | Medium | Low |
|---|---|---|---|---|
| Deployment Frequency | Multiple times per day | Weekly to monthly | Monthly to every 6 months | Less than every 6 months |
| Lead Time for Changes | Less than 1 hour | 1 day to 1 week | 1 to 6 months | More than 6 months |
| MTTR | Less than 1 hour | Less than 1 day | 1 day to 1 week | More than 6 months |
| Change Failure Rate | 0-5% | 6-10% | 11-15% | 16-30% |
Use these benchmarks as directional guides, not rigid targets. A team that deploys once a week and is working toward daily deployments is making meaningful progress even if "elite" is still a long way off.
Context matters: a regulated healthcare application has different constraints than a consumer mobile app. The goal is not to match someone else's numbers. The goal is to improve your own numbers consistently.
One important caveat about benchmarks: the DORA report aggregates data across industries, company sizes, and technology stacks. Your specific context may make some benchmarks unrealistic or irrelevant. A team that handles financial transactions with regulatory audit requirements will have a different lead time profile than a team shipping a marketing website. The benchmarks provide aspiration and context, not judgment.
That said, most teams underestimate what is achievable. A team that currently deploys monthly and considers that "normal" often discovers, after investing in automation and testing, that weekly or even daily deployments are within reach. The benchmarks serve as evidence that faster, safer delivery is possible, even if the specific numbers for your situation differ.
How to Measure DORA Metrics
Measuring DORA metrics accurately requires connecting data from several systems: your version control platform, your CI/CD pipeline, your deployment tool, and your incident management system.
Data Sources
Deployment Frequency: CI/CD pipeline logs. Count successful production deployments per time period. If you use GitHub Actions, each successful workflow run that deploys to production is a deployment event.
Lead Time for Changes: Git commit history plus deployment logs. Calculate the time between the first commit on a feature branch and the deployment that includes those changes. This can be tricky with squash-merge workflows; decide whether you measure from the first commit on the branch or from the merge to main.
MTTR: Incident management data. Measure from detection (alert triggered or incident created) to resolution (service restored). Ensure your team consistently creates and closes incident records so the data is clean.
Change Failure Rate: Deployment logs correlated with incident records. You need a way to link a specific incident to the deployment that caused it. Some teams use deployment IDs in incident metadata. Others use time-based correlation: if an incident starts within N minutes of a deployment, they are likely related.
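Here is a hedged sketch of the time-based fallback, assuming you have deployment and incident-start timestamps but no explicit linkage (prefer deployment IDs in incident metadata when you can get them):

```python
from datetime import datetime, timedelta

# Flag a deployment as "failed" if an incident starts within WINDOW of it.
# The 30-minute window and the timestamps below are illustrative.
WINDOW = timedelta(minutes=30)

deployments = [
    datetime.fromisoformat(ts)
    for ts in ["2024-05-06T14:02:00+00:00", "2024-05-08T09:10:00+00:00"]
]
incident_starts = [datetime.fromisoformat("2024-05-08T09:25:00+00:00")]

failed = {
    deploy
    for deploy in deployments
    if any(deploy <= start <= deploy + WINDOW for start in incident_starts)
}

print(f"change failure rate: {len(failed) / len(deployments):.0%}")  # 50% in this toy example
```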
Automation
Manual tracking breaks down quickly. Teams that maintain spreadsheets for DORA metrics often stop updating them within a quarter. Automate data collection using:
- Pipeline webhooks: Capture deployment events automatically (see the sketch after this list).
- Git hooks or API integrations: Capture commit timestamps.
- Incident management APIs: Pull incident creation and resolution times.
- Dashboard tools: Visualize trends over time.
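As an example of the first item above, here is a minimal sketch of a deployment-webhook receiver. It assumes your CI tool can POST a JSON payload on each production deployment; the framework (Flask), the endpoint path, and the field names are illustrative choices, not a prescribed integration.

```python
from datetime import datetime, timezone

from flask import Flask, request

app = Flask(__name__)
deployments = []  # in practice, write to a database rather than keeping events in memory


@app.route("/webhooks/deployment", methods=["POST"])
def record_deployment():
    # Field names depend on your CI tool's payload; adapt them as needed.
    payload = request.get_json(force=True)
    deployments.append({
        "service": payload.get("service"),
        "commit_sha": payload.get("commit_sha"),
        "deployed_at": payload.get("deployed_at", datetime.now(timezone.utc).isoformat()),
        "succeeded": payload.get("status") == "success",
    })
    return {"recorded": True}, 201


if __name__ == "__main__":
    app.run(port=8080)
```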
If you practice trunk-based development, lead time measurement becomes simpler because the path from commit to production is shorter and more linear.
Building a DORA Dashboard
A DORA dashboard should answer three questions at a glance: Where are we now? Are we improving? Where should we focus?
Essential Views
Current State: Show each metric's current value alongside its benchmark tier (Elite, High, Medium, Low). Use the last 30 days for deployment frequency and change failure rate. Use the last 90 days for lead time and MTTR to smooth out outliers.
Trend Lines: Plot each metric weekly or monthly over the last 6-12 months. Trends matter more than snapshots. A team at "Medium" that is trending upward is in better shape than a team at "High" that is trending downward.
Distribution Histograms: For lead time and MTTR, show the distribution, not just the average. An average MTTR of 4 hours could mean all incidents take 4 hours, or it could mean most take 30 minutes while a few take 24 hours. Those are different problems that demand different solutions, and only the distribution reveals which one you have.
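To see why the distribution matters, compare two made-up incident datasets that share the same 4-hour average:

```python
from statistics import mean, quantiles

# Two illustrative sets of incident durations (hours) with identical means.
uniform = [4.0] * 10
skewed = [0.5] * 9 + [35.5]

for name, durations in [("uniform", uniform), ("skewed", skewed)]:
    deciles = quantiles(durations, n=10)  # nine cut points: p10 .. p90
    print(f"{name}: mean={mean(durations):.1f}h  p50={deciles[4]:.1f}h  p90={deciles[8]:.1f}h")
```

The means match; the p50 and p90 values do not, and those are the numbers that tell you where to focus.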
Segmentation
Segment metrics by team, service, or repository where possible. Aggregate DORA metrics can hide problems. If one team deploys daily and another deploys monthly, the average is "weekly," which describes neither team accurately.
Avoiding Dashboard Fatigue
Start with four charts, one per metric, and a simple red/yellow/green indicator for current tier. Add complexity only when the team requests it. A dashboard that nobody looks at is worse than no dashboard at all.
Sharing the Dashboard
Make the DORA dashboard accessible to everyone, not just engineering. Product managers benefit from seeing deployment frequency and lead time because those metrics affect release planning. Leadership benefits from seeing MTTR and change failure rate because those metrics reflect operational risk. When DORA metrics are visible across the organization, conversations about delivery capacity become data-driven rather than opinion-driven.
Some teams display their DORA dashboard on a monitor in the office or pin it in a shared Slack channel. The goal is passive awareness: people should be able to glance at the metrics without seeking them out. This visibility creates accountability and celebrates improvement.
Improving Each Metric
Improving Deployment Frequency
Reduce batch size: Deploy smaller changes more often. Break large features into incremental slices that can ship independently.
Use feature flags: Decouple deployment from release. Ship code to production behind a flag and enable it when ready. This removes the fear of deploying incomplete features.
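A minimal sketch of the pattern, with a hypothetical in-memory flag store standing in for a real flag service or configuration system:

```python
FLAGS = {"new-checkout-flow": False}  # code is deployed, feature not yet released


def is_enabled(flag: str) -> bool:
    return FLAGS.get(flag, False)


def legacy_checkout(cart: list) -> str:
    return f"legacy checkout for {len(cart)} items"


def new_checkout(cart: list) -> str:
    return f"new checkout for {len(cart)} items"


def checkout(cart: list) -> str:
    # The new path ships to production dark and is turned on when ready.
    return new_checkout(cart) if is_enabled("new-checkout-flow") else legacy_checkout(cart)


print(checkout(["book", "pen"]))  # uses the legacy path until the flag flips
```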
Automate the pipeline: Every manual step in your deployment process is friction that reduces frequency. Automate testing, security scanning, and deployment. A well-built CI/CD pipeline makes deploying as easy as merging a pull request.
Practice trunk-based development: Long-lived branches increase merge complexity and delay integration. Teams that merge to main daily or multiple times per day deploy more frequently because their changes are smaller and better tested.
Improving Lead Time for Changes
Parallelize CI steps: If your test suite takes 30 minutes, run test groups in parallel to cut it to 10 minutes.
Reduce review queue time: Code reviews that sit for days add days to lead time. Set team norms for review turnaround (under 4 hours is a common target).
Eliminate manual approvals: Where possible, replace manual deployment approvals with automated quality gates. If the tests pass and the security scan is clean, the code should be deployable without waiting for a human to click a button.
Shrink test suites strategically: Flaky tests that get re-run add time. Tests that duplicate coverage waste cycles. Keep your test suite fast and meaningful.
Improving Mean Time to Recovery
Invest in observability: You cannot fix what you cannot detect. Metrics, logs, and traces should give your team the ability to diagnose problems in minutes, not hours.
Automate rollbacks: If a deployment causes a spike in error rates, the system should be able to roll back automatically. Manual rollback procedures add time and stress during an incident.
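One common shape for this is a post-deploy verification loop. The sketch below is illustrative: current_error_rate() and rollback() are placeholders for your monitoring query and your deployment tool's rollback command.

```python
import time

ERROR_RATE_THRESHOLD = 0.05   # roll back if more than 5% of requests fail
CHECK_INTERVAL_SECONDS = 30
CHECKS = 10                   # watch the deployment for roughly five minutes


def current_error_rate() -> float:
    # Placeholder: query your metrics backend for the error rate over the last minute.
    return 0.01


def rollback(previous_version: str) -> None:
    # Placeholder: invoke your deployment tool to redeploy the previous version.
    print(f"rolling back to {previous_version}")


def verify_deployment(previous_version: str) -> bool:
    for _ in range(CHECKS):
        if current_error_rate() > ERROR_RATE_THRESHOLD:
            rollback(previous_version)
            return False
        time.sleep(CHECK_INTERVAL_SECONDS)
    return True
```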
Write runbooks: Document the steps for diagnosing and resolving common failure modes. Runbooks turn a 2-hour investigation into a 15-minute checklist.
Practice incident response: Run game days or fire drills where the team practices responding to simulated incidents. Teams that practice recover faster when real incidents occur.
Improving Change Failure Rate
Expand test coverage in high-risk areas: Focus testing effort on the parts of your codebase that change frequently and affect users directly.
Use canary deployments: Route a small percentage of traffic to the new version before rolling it out fully. This catches production-only failures before they affect all users.
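The routing decision itself can be simple. A sketch, assuming hash-based bucketing so each user consistently sees the same version (the 5% share and the version labels are illustrative):

```python
import hashlib

CANARY_SHARE = 0.05  # route roughly 5% of users to the new version


def choose_version(user_id: str) -> str:
    # Deterministic bucketing keeps each user pinned to one version.
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "canary" if bucket < CANARY_SHARE * 100 else "stable"


print(choose_version("user-42"))  # "canary" or "stable", stable across calls
```

Compare the canary's error rate against the stable version before widening the rollout.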
Conduct pre-deployment reviews: Not just code reviews, but deployment reviews. Does this change include a database migration? Does it change an API contract? Does it touch a service with known fragility?
Analyze failure patterns: When deployments fail, categorize the root cause. If 60% of your failures come from database migrations, that is where you invest in better tooling and processes.
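Even a simple tally makes the pattern visible. An illustrative sketch, assuming you record a cause label for each failed deployment:

```python
from collections import Counter

# One cause label per failed deployment; labels and counts are made up.
failure_causes = [
    "db-migration", "db-migration", "config", "db-migration",
    "api-contract", "db-migration", "config",
]

counts = Counter(failure_causes)
total = sum(counts.values())
for cause, count in counts.most_common():
    print(f"{cause:<12} {count / total:.0%}")
```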
Invest in staging environments: Production-only failures often occur because the testing environment does not match production closely enough. Staging environments that mirror production configuration, data volume, and traffic patterns catch failures before they reach users.
A Note on Sequencing
If all four metrics need improvement, which one should you start with? The DORA research suggests that deployment frequency and lead time for changes are the best starting points because they create the conditions for improving the other two. Teams that deploy frequently with small changes naturally reduce their change failure rate (smaller changes fail less often) and improve their MTTR (smaller deployments are easier to diagnose and roll back). Start by making deployments smaller, faster, and more automated. The stability metrics tend to follow.
DORA Metrics Anti-Patterns
Metrics can drive improvement, but they can also drive dysfunction when misapplied. Watch for these anti-patterns.
Using Metrics as Individual Performance Measures
DORA metrics measure team and system performance, not individual performance. Using deployment frequency to evaluate a specific developer incentivizes small, meaningless commits and discourages thoughtful refactoring.
Gaming the Numbers
If deployment frequency is the target, teams can deploy empty changes. If change failure rate is the target, teams can avoid deploying risky but valuable changes. Metrics should inform decisions, not replace judgment.
Ignoring Context
A team that deploys once a week to a safety-critical system in a regulated industry is not "low performing." A team with a high change failure rate because they are migrating a legacy system is not failing. Always interpret metrics within the context of what the team is working on.
Measuring Without Acting
Dashboards that nobody uses and metrics that nobody discusses in sprint retrospectives provide zero value. Measurement is only useful when it leads to action: identifying a bottleneck, running an experiment, and measuring whether the experiment worked.
Optimizing One Metric at the Expense of Others
Deploying more frequently by skipping tests improves deployment frequency while worsening change failure rate. The four metrics are designed to balance each other. Optimize them as a set, not individually.
According to the 2024 DORA report, teams that focus on improving all four metrics simultaneously perform better than teams that optimize one metric at a time. The metrics are interconnected: better testing improves both lead time (fewer failures blocking the pipeline) and change failure rate. Better monitoring improves both MTTR and deployment frequency (confidence to deploy more often).
DORA + Codebase Intelligence
DORA metrics tell you what is happening. They do not always tell you why.
If your lead time is increasing, is it because the test suite is slow, the review queue is long, or the code has become so complex that changes take longer to write? DORA does not answer that.
If your change failure rate is rising, is it because of a specific service, a specific team, or a specific type of change? DORA alone cannot pinpoint the cause.
Codebase intelligence fills this gap. Tools that map code complexity, dependency structures, and ownership patterns provide the "why" behind the "what."
For example, Glue maps your codebase's structure, identifying modules with high complexity, files with concentrated ownership, and dependencies that create hidden coupling. When your DORA metrics show a problem, codebase intelligence helps you find the root cause.
An engineering leader tracking DORA metrics might notice that MTTR spikes for incidents involving the payment service. Codebase intelligence reveals that the payment service has a bus factor of one, lacks integration tests, and has the highest cyclomatic complexity in the system. Now the team knows where to invest.
DORA metrics plus codebase intelligence is the combination that turns measurement into action.
Consider an example: your deployment frequency is high, but your change failure rate is rising. DORA metrics surface the symptom. Codebase intelligence reveals that recent failures concentrate in a specific service with 47 files, cyclomatic complexity averaging 28 per function, and test coverage at 31%. The path forward becomes concrete: invest in test coverage and refactoring for that specific service. Without the "why," the team might invest in a new testing framework when the real problem is complexity in one corner of the codebase.
This is why DORA metrics are necessary but not sufficient. They answer "how are we performing?" Codebase intelligence answers "where should we invest to improve?" Together, they form a closed loop between measurement and targeted action.
Conclusion
DORA metrics work because they focus on outcomes rather than activities. They do not ask how many story points your team completed. They ask how quickly value reaches users and how reliably the system runs.
Start by measuring the four metrics, even roughly. Compare against the DORA benchmarks. Identify your biggest constraint and run a focused experiment to improve it. Measure the results. Iterate.
The teams that improve consistently are not the ones with the most sophisticated dashboards. They are the ones that discuss their metrics regularly, run small experiments, and hold themselves accountable for improvement. Measurement without action is overhead. Measurement with action is a competitive advantage.
If you take one thing from this guide, make it this: start measuring. Even imperfect data is better than no data. You do not need a sophisticated tool to count deployments per week, track how long changes take to reach production, record incident recovery times, and note which deployments caused problems. Start with a spreadsheet if you must. Graduate to automation when the habit is established. The data will tell you where to invest, and the investments will compound over time.
Frequently Asked Questions
What are the 4 DORA metrics?
The four DORA metrics are Deployment Frequency (how often your team deploys to production), Lead Time for Changes (time from first commit to production deployment), Mean Time to Recovery (how quickly you restore service after an incident), and Change Failure Rate (percentage of deployments causing failures). The first two measure throughput; the latter two measure stability. Together, they provide a balanced view of software delivery performance.
What is a good deployment frequency?
According to DORA's benchmarks, elite teams deploy multiple times per day. High performers deploy weekly to monthly. However, "good" depends on your context. A startup shipping a consumer web app should target daily deployments. A team maintaining medical device firmware might deploy monthly, and that is appropriate. The goal is to increase frequency relative to your current baseline while maintaining or improving your change failure rate. More frequent deployments with smaller batches reduce risk rather than increase it.
How do you improve mean time to recovery?
Improving MTTR requires investment in four areas: detection (monitoring and alerting that catches problems fast), diagnosis (observability tools and structured logging that help you find the root cause quickly), resolution (automated rollback capabilities and well-documented runbooks), and practice (regular incident response drills that build team muscle memory). Of these, detection usually offers the biggest initial improvement. Many teams lose hours before they realize something is wrong. An alert that fires within 60 seconds of a problem can cut MTTR by half or more.
Are DORA metrics enough to measure team health?
DORA metrics measure software delivery performance, which is one dimension of team health but not the only one. They do not capture developer satisfaction, code quality trends, knowledge distribution, or the sustainability of the team's pace. A team can have elite DORA metrics while burning out. Supplement DORA with developer experience surveys, code health metrics (complexity, duplication, test coverage trends), and knowledge risk indicators (bus factor, ownership concentration). DORA tells you how the pipeline performs. You need other signals to understand how the people and the codebase behind that pipeline are doing.