From 13:00 to 13:40 UTC on July 18, 2017, the DUO39 deployment experienced increased authentication latency that caused authentication failures for some customer applications protected by the Duo service.
As part of our rolling release process, Duo regularly makes new features and other general service improvements available to customers. Duo’s engineering team has implemented processes that allow these changes to be made in a gradual and automated fashion, and these processes are exercised regularly as Duo releases code on a biweekly basis.
On July 13th, the latest software release of the Duo platform was deployed to DUO39 without issue. This release contained a number of performance improvements and laid the groundwork for an upcoming feature enhancement. A defect in the code supporting this upcoming enhancement degraded database performance under one specific workload. The defect went undetected until that workload was exercised on the morning of July 18th.
At 12:40 UTC on July 18th, Duo’s monitoring systems detected an increase in database latency that was elevated but not yet service-impacting, and alerted engineering team members. While the issue was being investigated, database latency rose to service-impacting levels beginning at 13:00 UTC. Once the team determined the root cause, a software patch was developed and deployed to DUO39. Service stabilized once the patch was deployed, and all authentication requests were again processed as expected.
Duo’s engineering team has integrated this patch into the core version of our software to prevent this regression from recurring. The team will also evaluate the load-testing methodologies used to test this enhancement prior to release, to identify opportunities to better detect this sort of issue going forward.