Incident Report - 2020/06/11
Summary:
From 19:56 to 20:20 EDT on June 12, 2020, users on DUO7 experienced authentication failures: all authentication requests failed during this window. The Duo Engineering Team resolved the issue and restored service at 20:20 EDT.
Details:
At 19:57 EDT, the engineering team was alerted via our monitoring that latency on DUO7 had increased. At 20:01, automated monitoring reported failed authentications and the engineering team immediately began investigating. At 20:02, logs reported that a group of servers in the session store of the Duo service was unhealthy. Servers of this type usually recover automatically within a few minutes.
At 20:09, the engineering team noted that the failed group of servers had not recovered on its own and began manual remediation.
At 20:18, the engineering team successfully repaired the session store and restarted services.
At 20:20, the engineering team noted that traffic had returned to normal levels.
Duo’s engineering team has identified a path to significantly reducing recovery times for this type of failure and is working with our partners to determine why automated recovery failed in this instance.