Change Failure Rate: The DORA Metric That Tells You How Much Pain You're Causing Yourself

DevOps Metrics | The OpsTrails Team | January 5, 2026 | 5 min read

Google's DORA research says elite teams break production on no more than 15% of their deployments. Where does your team sit?

If you could only track one metric to understand how much self-inflicted pain your engineering team endures, it would be Change Failure Rate (CFR). It's one of the five core DORA metrics — developed by Google's DevOps Research and Assessment team through years of research surveying over 32,000 professionals — and it answers a brutally simple question: what percentage of your deployments cause problems in production?

What is a good change failure rate? According to Google's DORA research, elite and high-performing teams keep their change failure rate between 0% and 15%. The 2025 DORA report puts the ideal for top-tier performance at 0–2%. Low performers sit at 45–60%.

Performance tier	Change failure rate
Elite / High performers	0–15% (2025 ideal: 0–2%)
Medium performers	16–30%
Low performers	45–60%

What Is Change Failure Rate? The DORA Metric Explained

CFR is the number of deployments that result in a failure (an incident, a rollback, a hotfix, degraded service) divided by the total number of deployments, expressed as a percentage. If your team deployed 100 times last quarter and 25 of those deployments required remediation, your CFR is 25%.
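The arithmetic is simple enough to fit in a few lines. Here is a minimal Python sketch of the calculation; the function name and the input validation are our own illustrative choices, not part of any DORA tooling.

```python
def change_failure_rate(failed_deployments: int, total_deployments: int) -> float:
    """Return the change failure rate as a percentage of total deployments."""
    if total_deployments <= 0:
        raise ValueError("total_deployments must be positive")
    if not 0 <= failed_deployments <= total_deployments:
        raise ValueError("failed_deployments must be between 0 and total_deployments")
    return 100.0 * failed_deployments / total_deployments


# The example from the text: 25 failed deployments out of 100.
print(f"{change_failure_rate(25, 100):.1f}%")  # prints 25.0%
```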

The definition of "failure" varies by organisation. Some teams count only full outages. Others include any deployment that required a rollback. Some track incidents of any severity. The important thing is consistency — pick a definition and measure it honestly.
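One way to keep the definition consistent is to write it down as code, so every deployment is judged by the same rule. The sketch below is only an illustration: the `DeploymentOutcome` fields, the severity scale, and the threshold are all assumptions that you would replace with your own policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DeploymentOutcome:
    """What happened after a deployment (hypothetical fields)."""
    rolled_back: bool = False
    hotfix_required: bool = False
    # Assumed scale: 1 = full outage ... 4 = minor; None = no incident.
    incident_severity: Optional[int] = None


# Assumed policy: incidents of severity 1 or 2 count as failures.
MAX_COUNTED_SEVERITY = 2


def is_failure(outcome: DeploymentOutcome) -> bool:
    """Apply one explicit, team-wide definition of a failed deployment."""
    if outcome.rolled_back or outcome.hotfix_required:
        return True
    return (
        outcome.incident_severity is not None
        and outcome.incident_severity <= MAX_COUNTED_SEVERITY
    )
```

Changing the policy then means changing one constant or one function, and historical numbers can be recomputed under the new rule instead of being mixed with the old one.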

DORA Change Failure Rate Benchmarks: Elite vs. Low Performers

The 2022 Accelerate State of DevOps report established clear performance tiers for CFR:

Elite and High-performing teams: 0–15% change failure rate
Medium-performing teams: 16–30%
Low-performing teams: 45–60%
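To place your own number against these tiers, a small lookup is enough. The sketch below uses the bands quoted above. The published bands leave gaps (for example between 30% and 45%), so this example reports such values as "unclassified" rather than guessing, and it treats anything above 15% and up to 30% as medium, which is our own reading of the 16–30% band.

```python
def cfr_tier(cfr_percent: float) -> str:
    """Map a CFR percentage onto the tiers quoted in this article."""
    if not 0 <= cfr_percent <= 100:
        raise ValueError("cfr_percent must be between 0 and 100")
    if cfr_percent <= 15:
        return "elite/high"
    if cfr_percent <= 30:  # assumption: (15, 30] counts as medium
        return "medium"
    if 45 <= cfr_percent <= 60:
        return "low"
    # Falls in a gap the quoted bands do not cover (30-45%, or above 60%).
    return "unclassified"


print(cfr_tier(25))  # prints medium
```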

The 2025 DORA Report tightened the bar further, showing that the ideal change failure rate for top-tier performance is between 0% and 2%. Only 16.7% of survey respondents reported a CFR that low.

Think about the low-performing tier for a moment. A 45–60% change failure rate means that roughly every other deployment causes a production issue. About half of the changes the team ships create problems that later work has to fix. It's a hamster wheel of self-inflicted wounds, and part of the reason 80% of production outages are self-inflicted.

Change Failure Rate vs. Deployment Frequency: Which Matters More?

There's a temptation in DevOps culture to optimise for speed — deploy more often, ship faster, move fast and break things. But DORA's multi-year research consistently shows that speed without stability is destructive. A high deployment frequency combined with a high change failure rate doesn't mean you're moving fast. It means you're breaking things fast.

The research demonstrates that elite teams achieve both high throughput and high stability. They deploy frequently and rarely break production. These aren't opposing goals — they're complementary outcomes of good practices.
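A quick back-of-the-envelope calculation shows why frequency and CFR have to be read together. The numbers below are made up purely for illustration; the point is that the expected number of failed deployments per week is the product of the two.

```python
def expected_failures(deploys_per_week: float, cfr_percent: float) -> float:
    """Expected failed deployments per week for a given frequency and CFR."""
    return deploys_per_week * cfr_percent / 100.0


# Illustrative scenarios only, not DORA data.
scenarios = [
    ("slow, unstable", 5, 30),
    ("fast, unstable", 50, 30),
    ("fast, stable", 50, 5),
]
for label, deploys, cfr in scenarios:
    failures = expected_failures(deploys, cfr)
    print(f"{label}: {deploys} deploys/week at {cfr}% CFR = {failures:.1f} failures/week")
```

At the same 30% CFR, deploying ten times as often means ten times as many production failures to clean up. Speeding up only pays off when the failure rate comes down with it.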

Why Most Teams Don't Know Their Real Change Failure Rate

Here's the uncomfortable part. Many teams don't actually track their change failure rate. They know roughly how many times they deploy. They have a vague sense of how often things go wrong. But they haven't connected the two data sets in a systematic way.

This happens because deployment data lives in one system (CI/CD), incident data lives in another (PagerDuty, Opsgenie), and nobody has built the bridge between them. Without that bridge, CFR is invisible — and invisible problems don't get fixed. This is also why the shift from firefighting to forecasting requires building systematic operational memory.
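The bridge itself does not need to be complicated. As a rough sketch, assume you can export a list of deployments from CI/CD and a list of incidents from your alerting tool, each with a service name and a timestamp. The attribution rule used here (blame the most recent deployment of the same service within a fixed window before the incident) is a simplifying heuristic, and the data and the three-hour window are invented for the example.

```python
from datetime import datetime, timedelta

# Hypothetical exports: (service, deployed_at) and (service, incident_started_at).
deployments = [
    ("checkout", datetime(2026, 1, 5, 9, 0)),
    ("checkout", datetime(2026, 1, 5, 14, 0)),
    ("search", datetime(2026, 1, 5, 10, 30)),
    ("search", datetime(2026, 1, 5, 16, 0)),
]
incidents = [
    ("checkout", datetime(2026, 1, 5, 14, 20)),
    ("search", datetime(2026, 1, 5, 23, 0)),
]

ATTRIBUTION_WINDOW = timedelta(hours=3)  # assumed; tune to your own systems


def failed_deployments(deployments, incidents, window=ATTRIBUTION_WINDOW):
    """Return the set of deployments blamed for at least one incident."""
    failed = set()
    for service, started_at in incidents:
        # Deployments of the same service that landed shortly before the incident.
        candidates = [
            (svc, at)
            for svc, at in deployments
            if svc == service and timedelta(0) <= started_at - at <= window
        ]
        if candidates:
            # Blame the most recent candidate.
            failed.add(max(candidates, key=lambda d: d[1]))
    return failed


failed = failed_deployments(deployments, incidents)
cfr = 100.0 * len(failed) / len(deployments)
print(f"{len(failed)} of {len(deployments)} deployments failed: CFR = {cfr:.0f}%")
# prints: 1 of 4 deployments failed: CFR = 25%
```

The second incident starts seven hours after the nearest search deployment, so it falls outside the window and is not attributed to any change. Whether that is the right call depends on your systems, which is exactly why the window should be an explicit, agreed setting.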

How to Measure and Reduce Your Change Failure Rate

Reducing your change failure rate starts with seeing it clearly. That requires linking every production incident back to the change that caused it. Not in a retrospective two weeks later. In real time, during the incident.

OpsTrails' deployment tracking software makes this connection automatic. Every deployment, rollback, config change, and data load is captured as a structured event with timestamps, versions, sources, and subjects. When an incident occurs, the operational timeline immediately shows what changed in the relevant window. Your AI assistant can query this directly: "What was deployed to production in the last 3 hours?" gets a precise answer, not a Slack scavenger hunt.
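To make the idea concrete, here is what answering that question looks like over a plain list of structured events. This is a generic Python sketch and not the OpsTrails API: the `ChangeEvent` class, its `kind` and `environment` fields, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeEvent:
    """A structured change record (hypothetical shape)."""
    kind: str          # e.g. "deployment", "rollback", "config_change", "data_load"
    subject: str       # what was changed
    version: str
    source: str        # which system reported the change
    environment: str
    timestamp: datetime


def changes_in_window(events, now, window=timedelta(hours=3), environment="production"):
    """Return changes to one environment in the window before `now`, newest first."""
    cutoff = now - window
    recent = [
        e for e in events
        if e.environment == environment and cutoff <= e.timestamp <= now
    ]
    return sorted(recent, key=lambda e: e.timestamp, reverse=True)


events = [
    ChangeEvent("deployment", "checkout-api", "v2.4.1", "ci-pipeline", "production", datetime(2026, 1, 5, 13, 40)),
    ChangeEvent("config_change", "search", "flags-118", "admin-console", "production", datetime(2026, 1, 5, 12, 5)),
    ChangeEvent("deployment", "checkout-api", "v2.4.0", "ci-pipeline", "production", datetime(2026, 1, 5, 8, 15)),
    ChangeEvent("deployment", "search", "v9.0.0", "ci-pipeline", "staging", datetime(2026, 1, 5, 13, 55)),
]

incident_started = datetime(2026, 1, 5, 14, 0)
for e in changes_in_window(events, now=incident_started):
    print(f"{e.timestamp:%H:%M} {e.kind} {e.subject} {e.version} (via {e.source})")
```

With the sample data, only the 13:40 deployment and the 12:05 config change are returned. The 08:15 deployment is too old and the staging deployment is in the wrong environment.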

Over time, this data lets you calculate your actual CFR, identify patterns (which services break most often? which types of changes are riskiest?), and drive meaningful improvement. Use OpsTrails impact analysis to correlate deployments with error spikes automatically.
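Once each deployment is labelled as failed or not, finding the services that break most often is a simple group-by. The records below are invented for illustration, and the `(service, failed)` tuple shape is an assumption about how you might export the data.

```python
from collections import defaultdict

# Hypothetical labelled deployments: (service, did_it_fail).
records = [
    ("checkout", True), ("checkout", False), ("checkout", False), ("checkout", True),
    ("search", False), ("search", False), ("search", True),
    ("billing", False),
]


def cfr_by_service(records):
    """Return {service: CFR percentage} from (service, failed) records."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for service, failed in records:
        totals[service] += 1
        failures[service] += int(failed)
    return {s: 100.0 * failures[s] / totals[s] for s in totals}


# Rank services from riskiest to safest.
ranked = sorted(cfr_by_service(records).items(), key=lambda kv: kv[1], reverse=True)
for service, cfr in ranked:
    print(f"{service}: {cfr:.0f}%")
```

Keep an eye on sample size when reading a breakdown like this: a service with one deployment and zero failures looks perfect, but a single data point tells you very little.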

Because you can't improve a metric you're not measuring. And you can't measure a metric when the underlying data is scattered across ten different tools.

OpsTrails records every change across your infrastructure. Correlate deployments with error spikes to understand your real change failure rate — not a quarterly guess.


Sources: Google DORA State of DevOps Report (2022, 2025), Accelerate: The Science of Lean Software and DevOps (Forsgren, Humble, Kim), DORA metrics framework (dora.dev).