Multiple Deployments: Authentication failures
Incident Report for Duo
Postmortem

Summary

On July 25, 2024, between 12:00 and 12:02 ET, customers on select deployments experienced failed authentications. Because the window of downtime was short, users who retried were able to authenticate successfully. The root cause was identified as a spike in database contention, which cleared as database performance normalized.

Deployments Impacted

  • DUO1, DUO55, DUO56, DUO60, DUO62 & DUO78

Timeline of Events (ET)

2024-07-25 12:05 Duo Site Reliability Engineering (SRE) is alerted by an automated monitoring system to a spike in database contention, indicating potential authentication failures.

2024-07-25 12:07 The automated monitoring alert self-resolves.

2024-07-25 12:35 Initial investigation suggests that a single customer was affected.    

2024-07-25 12:37 Further investigation reveals that a small number of customers across multiple deployments were affected.  

2024-07-25 12:55 The Duo status page is updated to inform customers of the downtime. 

Details

The root cause was a significant reduction in database input/output operations per second (IOPS) over a span of 90 seconds. IOPS are provisioned and provided as compute resources by our cloud provider. A reduction in IOPS availability can result from workload demand, from exceeding provisioned limits, or from a networking or hardware-related issue.
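As a rough illustration of how a drop like this can be spotted in monitoring data, the sketch below flags intervals where measured IOPS falls below a fraction of the provisioned level. The function name, the 50% threshold, the 3000 IOPS provision, and the sample values are all hypothetical and are not taken from Duo's monitoring or from this incident.

```python
from typing import List, Tuple


def find_iops_dips(
    samples: List[Tuple[int, float]],
    provisioned_iops: float,
    threshold: float = 0.5,
) -> List[Tuple[int, int]]:
    """Return (first_low_ts, last_low_ts) pairs for each run of low-IOPS samples.

    samples: (timestamp_seconds, measured_iops) pairs, sorted by timestamp.
    A sample counts as low when it is below threshold * provisioned_iops.
    """
    floor = provisioned_iops * threshold
    dips: List[Tuple[int, int]] = []
    start = None      # timestamp of the first low sample in the current run
    last_low = None   # timestamp of the most recent low sample
    for ts, iops in samples:
        if iops < floor:
            if start is None:
                start = ts
            last_low = ts
        elif start is not None:
            # IOPS recovered: close the current run.
            dips.append((start, last_low))
            start = None
    if start is not None:
        # The series ended while IOPS was still low.
        dips.append((start, last_low))
    return dips


# Illustrative values only: 30-second samples against a hypothetical 3000 IOPS provision.
samples = [(0, 2900.0), (30, 400.0), (60, 350.0), (90, 500.0), (120, 2950.0)]
dips = find_iops_dips(samples, provisioned_iops=3000.0)
```

A check like this only shows that throughput dropped. It does not distinguish between the possible causes listed above, which is why vendor-side visibility matters.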

The Duo SRE team was unable to find or isolate any internal issue that would have contributed to this incident. We determined that a single spike was evident after IOPS momentarily halted, and we are working with our vendor to ensure better transparency if such an issue occurs in the future.

The issue was resolved within 90 seconds, as database performance normalized.
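Because the failures were short-lived, retried authentications succeeded. For integrators who call an authentication service programmatically, one common way to ride out a brief failure window is to retry with exponential backoff and jitter. The following is a minimal sketch under stated assumptions, not Duo's client behavior or API: `TransientAuthError`, `retry_with_backoff`, and all parameter values are hypothetical.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientAuthError(Exception):
    """Hypothetical error type standing in for a temporary authentication failure."""


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying on TransientAuthError with capped exponential backoff.

    The wait before each retry is a random value between 0 and the capped
    exponential delay ("full jitter"), so many clients do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientAuthError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last failure to the caller.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so the function can be exercised in tests without real delays. Only errors known to be temporary should be retried; a rejected credential should fail immediately.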

Posted Aug 07, 2024 - 16:05 EDT

Resolved
At approximately 4pm UTC, we experienced a brief outage that caused authentication failures on deployments DUO1, DUO55, DUO56, DUO60, DUO62, and DUO78. The issue has since been resolved.

We will provide an RCA as soon as it is available.
Posted Jul 25, 2024 - 12:55 EDT
This incident affected: DUO1 (Core Authentication Service), DUO55 (Core Authentication Service), DUO56 (Core Authentication Service), DUO60 (Core Authentication Service), DUO62 (Core Authentication Service), and DUO78 (Core Authentication Service).