Web-based authentication issues
Incident Report for Duo
Postmortem

On 01 February 2016, at approximately 11:06am EST, users on Duo’s DUO1 deployment experienced increased latency or, in some cases, a complete inability to authenticate when interacting with Duo’s web-based authentication prompt. The issue stemmed from new code deployed the previous week that stored endpoint telemetry data with increased fidelity.

The code caused additional data to be loaded from backend systems as part of every authentication request. Rigorous internal load testing of the new code had been conducted prior to release, and production systems handled the increased load for multiple business days after deployment. This morning, however, DUO1 processed an atypically large number of authentication requests, which led to resource starvation that negatively affected users logging in via Duo’s authentication prompt.

At 11:48am EST, Duo’s Operations Team successfully patched the code that caused the issue. At that point, authentications began to succeed, and any outstanding requests that had not timed out were serviced. However, the existing backlog of requests, combined with new incoming requests, led to further cascading failures, and load and latency remained high. By 12:35pm EST, the Operations Team had deployed additional application servers to handle the high volume of traffic, and latency returned to normal levels.
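
The sketch below is a simplified illustration of why a backlog can keep latency high even after the underlying defect is patched, and why adding capacity lets it drain. It is not Duo's code or tooling. The function name and every number in it are hypothetical and chosen only to show the arithmetic.

```python
def simulate_backlog(arrivals_per_min, capacity_per_min, initial_backlog, minutes):
    """Return the number of waiting requests at the end of each minute."""
    backlog = initial_backlog
    history = []
    for _ in range(minutes):
        # New requests join whatever is still waiting.
        backlog += arrivals_per_min
        # Servers clear at most capacity_per_min requests per minute.
        backlog = max(0, backlog - capacity_per_min)
        history.append(backlog)
    return history


# Hypothetical: capacity only matches arrivals, so the backlog never shrinks.
before = simulate_backlog(arrivals_per_min=1000, capacity_per_min=1000,
                          initial_backlog=5000, minutes=5)

# Hypothetical: extra application servers double capacity, so the backlog drains.
after = simulate_backlog(arrivals_per_min=1000, capacity_per_min=2000,
                         initial_backlog=5000, minutes=5)

print(before)  # [5000, 5000, 5000, 5000, 5000]
print(after)   # [4000, 3000, 2000, 1000, 0]
```

In this toy model, a fix that only restores capacity to the arrival rate leaves the queue at its existing size. Only capacity above the arrival rate reduces it.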

Next steps to prevent future issues of this nature include additional load testing of new code to ensure it has no negative impact, and enhanced monitoring so that potential problems can be identified and acted upon before they become customer-facing events.

Posted Feb 01, 2016 - 17:05 EST

Resolved
After monitoring the issue for several hours, full service has been verified as restored.
Posted Feb 01, 2016 - 17:04 EST
Monitoring
Our Operations Team has identified the issue and patched our service. We have confirmed that authentications are succeeding. We are continuing to analyze authentications to ensure all customers are able to authenticate successfully.
Posted Feb 01, 2016 - 12:56 EST
Investigating
We are investigating reports that our system is failing to complete authentication requests for DUO1 customers. This only affects users who authenticate via our web-based authentication prompt. API-based authentication should be functioning as normal. We will provide an update with more details as soon as they're available.
Posted Feb 01, 2016 - 11:44 EST
This incident affected: DUO1 (Core Authentication Service).