DUO7: Service Unavailable
Incident Report for Duo
Postmortem

Authentication Issues - DUO7

Incident Report - 2020/06/11

Summary:

From 19:56 EDT to 20:20 EDT on June 11, 2020, users on DUO7 experienced authentication failures. During this window, all authentication requests failed. The Duo engineering team resolved the issue and restored service at 20:20 EDT.

Details:

At 19:57 EDT, the engineering team was alerted via our monitoring that latency on DUO7 had increased. At 20:01, automated monitoring reported failed authentications and the engineering team immediately began investigating. At 20:02, logs reported that a group of servers in the session store of the Duo service was unhealthy. Servers of this type usually recover automatically within a few minutes.

At 20:09, the engineering team noted that the failed group of servers had not recovered on its own and began manual remediation.

At 20:18, the engineering team successfully repaired the session store and restarted services.

At 20:20, the engineering team noted that traffic had returned to normal levels.

Duo’s engineering team has identified a way to significantly reduce recovery times for this type of failure and is working with our partners to determine why automated recovery failed in this instance.

Posted Jun 12, 2020 - 16:07 EDT

Resolved
We have identified and resolved an issue that affected all components of the DUO7 deployment and made the service unavailable. A full root cause analysis (RCA) will be posted here within the next 24 hours.
Posted Jun 11, 2020 - 20:44 EDT
This incident affected: DUO7 (Core Authentication Service, Admin Panel, Push Delivery, Phone Call Delivery, SMS Message Delivery, Cloud PKI).