On July 25, 2024, between 12:00 and 12:02 ET, customers on select deployments experienced failed authentications. Because the window of downtime was short, users who retried were able to authenticate successfully. The root cause was identified as a spike in database contention, which resolved as performance normalized.
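For integrators who want to ride out a short window like this one, the retry behavior described above can be sketched as follows. This is a minimal illustration, not a description of how any Duo client or SDK behaves: `authenticate_with_retry`, `attempt`, `max_attempts`, and `base_delay` are hypothetical names, and the exponential backoff schedule is an assumption chosen for the example.

```python
import time
from typing import Callable


def authenticate_with_retry(
    attempt: Callable[[], bool],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call attempt() until it succeeds or max_attempts is reached.

    `attempt` is a hypothetical callable that returns True when the
    authentication succeeds. The wait doubles after each failed try
    (base_delay, 2 * base_delay, 4 * base_delay, ...).
    """
    for i in range(max_attempts):
        if attempt():
            return True
        # Do not sleep after the final failed attempt.
        if i < max_attempts - 1:
            sleep(base_delay * (2 ** i))
    return False
```

The `sleep` parameter is injectable so the backoff schedule can be exercised in tests without actually waiting.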
2024-07-25 12:05 Duo Site Reliability Engineering (SRE) is alerted by the automated monitoring system to a spike in database contention, indicating potential authentication failures.
2024-07-25 12:07 The automated monitoring alert self-resolves.
2024-07-25 12:35 Initial investigation suggests that a single customer was affected.
2024-07-25 12:37 Further investigation reveals that a small number of customers across multiple deployments were affected.
2024-07-25 12:55 The Duo status page is updated to inform customers of the downtime.
The root cause was a significant reduction in database input/output operations per second (IOPS) over a span of 90 seconds. IOPS are provisioned and provided as compute resources by our cloud provider. A reduction in IOPS availability can occur because of workload demand, because provisioned limits are exceeded, or because of a networking or hardware-related issue.
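As an illustration of how a sustained IOPS reduction like the one described above might be flagged in monitoring data, here is a minimal sketch. It is not Duo's monitoring implementation: the `Sample` type, the `find_iops_drop` function, the `drop_ratio` and `min_duration` thresholds, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Sample:
    t: int        # seconds since the start of the observation window
    iops: float   # observed IOPS at that time


def find_iops_drop(
    samples: List[Sample],
    baseline: float,
    drop_ratio: float = 0.5,
    min_duration: int = 30,
) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first run of samples in which IOPS stays
    below baseline * drop_ratio for at least min_duration seconds, else None.
    """
    threshold = baseline * drop_ratio
    start: Optional[int] = None
    last_low = 0
    for s in samples:
        if s.iops < threshold:
            if start is None:
                start = s.t          # a new low run begins here
            last_low = s.t
        else:
            # The low run (if any) ended; report it if it lasted long enough.
            if start is not None and last_low - start >= min_duration:
                return (start, last_low)
            start = None
    # Handle a low run that continues to the end of the data.
    if start is not None and last_low - start >= min_duration:
        return (start, last_low)
    return None


# Synthetic data for illustration only (not measurements from this incident):
# normal IOPS, then a low stretch from t=60 to t=140, then recovery.
samples = (
    [Sample(t, 3000.0) for t in range(0, 60, 10)]
    + [Sample(t, 200.0) for t in range(60, 150, 10)]
    + [Sample(t, 3000.0) for t in range(150, 210, 10)]
)
print(find_iops_drop(samples, baseline=3000.0))  # (60, 140)
```

Requiring a minimum duration keeps a single noisy sample from being reported as a drop; the right threshold and duration would depend on the workload and the provisioned limits.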
The Duo SRE team was unable to find or isolate any internal issue that would have contributed to this incident. We were able to determine that a single spike was evident after IOPS were momentarily halted, and we are working with our vendor to ensure better transparency if such an issue occurs in the future.
The issue was resolved within 90 seconds, as database performance normalized.