Calculate your Mean Time To Recovery and benchmark your incident response.
Enter your total downtime and incident count, or log individual incidents for a detailed calculation. See how your recovery speed compares to industry benchmarks and identify where to improve.
Example result: Mean Time To Recovery of 45m (= 45.0 minutes per incident), Elite tier (< 1 hour).

| Tier | MTTR |
|---|---|
| Elite | < 1 hour |
| High Performer | 1 – 4 hours |
| Medium | 4 – 24 hours |
| Needs Improvement | > 24 hours |
Based on DORA research and industry data for infrastructure teams.
Mean Time To Recovery (MTTR) measures the average time it takes to restore service after an incident. It starts when a failure is detected and ends when the service is fully operational again. For ISPs, MTTR encompasses everything from the first monitoring alert to the last subscriber coming back online.
MTTR is the single most actionable reliability metric because it reflects your entire incident response pipeline: detection speed, team responsiveness, diagnostic capability, and repair efficiency. According to the DORA (DevOps Research and Assessment) State of DevOps reports, elite-performing teams recover from incidents in under one hour, while low performers can take over 24 hours. A high MTTR almost always points to a specific bottleneck that can be improved.
MTTR = Total Downtime ÷ Number of Incidents
For example, if your FTTH network experienced 4 incidents last month with a combined downtime of 180 minutes, your MTTR is 180 ÷ 4 = 45 minutes per incident. This places you in the “Elite” benchmark tier — under 1 hour per incident.
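The formula and the benchmark lookup can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation: the function names are hypothetical, and the handling of exact boundary values (for example, whether exactly 4 hours counts as "High Performer" or "Medium") is an assumption, since the tier table does not specify it.

```python
def mttr_minutes(total_downtime_minutes: float, incident_count: int) -> float:
    """MTTR = total downtime / number of incidents, in minutes."""
    if incident_count <= 0:
        raise ValueError("incident_count must be a positive integer")
    return total_downtime_minutes / incident_count


def benchmark_tier(mttr_min: float) -> str:
    """Map an MTTR in minutes onto the tier table.

    Assumption: exactly 1 hour falls in "High Performer", and the upper
    bounds of 4 and 24 hours are inclusive.
    """
    hours = mttr_min / 60
    if hours < 1:
        return "Elite"
    if hours <= 4:
        return "High Performer"
    if hours <= 24:
        return "Medium"
    return "Needs Improvement"


# Worked example from the text: 4 incidents, 180 minutes of combined downtime.
mttr = mttr_minutes(180, 4)
print(f"{mttr:.1f} minutes per incident -> {benchmark_tier(mttr)}")
# prints: 45.0 minutes per incident -> Elite
```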
Every incident passes through four phases. Improving any phase reduces your overall MTTR:
Detection: Monitoring identifies the problem. This is the highest-impact improvement area: 30-second polling can reduce detection from hours to under a minute.
Response: A team member acknowledges the alert and begins investigation. Clear escalation policies and on-call rotations minimize response time.
Diagnosis: Root cause is identified using logs, topology maps, and metrics. Good observability tools make the difference between minutes and hours of diagnosis.
Repair: The fix is applied and service is restored. Runbooks, automated rollbacks, and remote device management (TR-069) accelerate repairs.
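To see which of the four phases dominates a given incident, you can split its timeline into per-phase durations. The sketch below assumes you record five timestamps per incident. The `Incident` fields and the phase names (detection, response, diagnosis, repair) are illustrative labels, not a schema from any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical timestamps; adapt to whatever your incident log records.
    started_at: datetime       # failure begins
    detected_at: datetime      # monitoring raises the alert
    acknowledged_at: datetime  # a team member picks it up
    diagnosed_at: datetime     # root cause identified
    restored_at: datetime      # service fully operational again


def phase_durations(incident: Incident) -> dict[str, timedelta]:
    """Split one incident into the four phases described above."""
    return {
        "detection": incident.detected_at - incident.started_at,
        "response": incident.acknowledged_at - incident.detected_at,
        "diagnosis": incident.diagnosed_at - incident.acknowledged_at,
        "repair": incident.restored_at - incident.diagnosed_at,
    }


def slowest_phase(incident: Incident) -> str:
    """Return the name of the phase that took the longest."""
    durations = phase_durations(incident)
    return max(durations, key=durations.get)


# Made-up example: a 45-minute outage where detection took 20 minutes.
example = Incident(
    started_at=datetime(2024, 1, 1, 2, 0),
    detected_at=datetime(2024, 1, 1, 2, 20),
    acknowledged_at=datetime(2024, 1, 1, 2, 25),
    diagnosed_at=datetime(2024, 1, 1, 2, 40),
    restored_at=datetime(2024, 1, 1, 2, 45),
)
print(slowest_phase(example))  # prints: detection
```

Running this over a month of incidents and counting how often each phase is the slowest is one simple way to decide where to invest first.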
For FTTH ISPs, the fastest path to lower MTTR is better detection. When a single OLT outage can affect hundreds of subscribers, the difference between detecting it in 30 seconds versus 30 minutes has massive impact on both SLA compliance and customer satisfaction.
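A rough way to quantify that difference is subscriber-minutes of undetected outage: affected subscribers multiplied by detection delay. The figure of 500 subscribers per OLT below is an assumption chosen for illustration, and the function name is hypothetical.

```python
def undetected_subscriber_minutes(subscribers: int, detection_minutes: float) -> float:
    """Subscriber-minutes of outage that pass before anyone is alerted."""
    return subscribers * detection_minutes


subscribers_on_olt = 500  # assumed; substitute your own OLT subscriber count

fast = undetected_subscriber_minutes(subscribers_on_olt, 0.5)  # 30-second detection
slow = undetected_subscriber_minutes(subscribers_on_olt, 30)   # 30-minute detection

print(fast, slow, slow - fast)  # prints: 250.0 15000 14750.0
```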
Beyond detection, invest in smart alert correlation to reduce noise (fewer false alarms mean faster response to real issues), interactive topology maps for rapid root cause analysis, and TR-069 CPE management for remote repairs without truck rolls.
NetSense NMS detects FTTH network issues in under 30 seconds with smart alert correlation that cuts noise by 95%. Faster detection means faster recovery — and happier subscribers.