Incident Report - 2019/03/02
Summary:
From 3:07 to 3:19 UTC on March 3rd, 2019, the DUO55 deployment experienced performance degradation that resulted in increased authentication latency. The majority of authentications during this time were affected. The root cause of this outage has been identified and Duo’s engineering team is committed to reducing the overall impact of similar events going forward.
Details:
The DUO55 database tier is composed of multiple database clusters. Each cluster is composed of multiple sets of database hardware in several separate locations, configured in an active/standby setup.
At 3:07 UTC, one of DUO55’s active databases failed, preventing interactions with data contained within and causing authentication failures. Automated failover processes began configuring standby hardware to replace the failed database.
At 3:19 UTC, the failover was complete and service was completely restored.
Duo’s Engineering team determined the incidents on March 2nd and March 3rd to be related. On March 3rd at 19:04 UTC, Duo’s Engineering team replaced and upgraded the database services on the DUO55 deployment to prevent future recurrence of these issues.