Auth Proxy SSO intermittent authentication failures
Incident Report for Duo
Postmortem

SSO Active Directory Authentication Failures Across Multiple Deployments

Incident Report - 2023-06-12

Summary

On June 12, 2023, at around 9:48 EDT, Duo's Engineering Team was notified by customers that they were seeing degraded performance in our Duo Single Sign-On (SSO) product.

The root cause was identified as an increase in load on the affected deployments that overwhelmed the services responsible for managing authentications against an Active Directory identity provider.

The issue was resolved on the same day by implementing several load mitigation techniques. The techniques included infrastructure scaling, configuration changes, and algorithm updates.

Deployments Impacted

  • DUO9 (SSO), DUO17 (SSO), DUO22 (SSO), DUO39 (SSO), DUO40 (SSO), DUO42 (SSO), DUO49 (SSO), DUO50 (SSO), DUO52 (SSO), DUO55 (SSO), DUO56 (SSO), DUO58 (SSO), DUO62 (SSO), DUO63 (SSO), DUO64 (SSO), DUO65 (SSO), DUO72 (SSO), and DUO73 (SSO)
  • To determine which deployment ID you are on, please refer to this Duo Knowledge Base article

Timeline of Events (EDT)

2023-06-12 09:48 Duo Site Reliability Engineering (SRE) is informed by customers that they are seeing intermittent authentication failures in our SSO product. SRE begins triage.

2023-06-12 10:26 Duo SRE removes extraneous data from our Redis caching layer to reduce load.

2023-06-12 10:30 Duo SRE rolls out a configuration change to our Redis infrastructure to increase its maximum performance.

2023-06-12 10:36 Status page updated to: Issue Identified.

2023-06-12 10:58 Duo SRE rolls out an algorithm change intended to reduce overhead per authentication.

2023-06-12 11:01 Duo SRE observes that all systems appear healthy and authentications are being serviced at 100% again.

2023-06-12 11:05 Status page updated to: Monitoring.

2023-06-12 14:16 Status page updated to: Resolved.

Details

To service authentications for customers using Active Directory, Duo SSO uses the Duo Authentication Proxy to perform LDAP queries. These Authentication Proxies are managed by customers inside their own infrastructure. To compensate for the lack of insight Duo has into our customers’ local network topology, we implemented a pathing algorithm designed to be resilient to the various network connectivity issues that may arise. However, we have since learned that this algorithm is costly both for Duo’s infrastructure and for the infrastructure managed by our customers.

On June 12, 2023, we reached a tipping point where the additional authentication load exceeded what our infrastructure could handle. This caused intermittent authentication failures for some customers because our processes were too overloaded to accept new requests and were sometimes too slow to respond to requests already in progress.
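This report does not describe the pathing algorithm itself, so the following is only an illustrative sketch of why a connectivity-resilient approach can be costly. It assumes, hypothetically, that the resilient strategy probes every Authentication Proxy on each authentication, and compares it with a cheaper strategy that tries a last-known-good proxy first. The function names, the proxy names, and the `try_proxy` callback are all assumptions made for the example, not Duo's implementation.

```python
from typing import Callable, Optional, Sequence, Tuple


def fan_out(
    proxies: Sequence[str], try_proxy: Callable[[str], bool]
) -> Tuple[Optional[str], int]:
    """Hypothetical resilient strategy: probe every proxy on every authentication.

    Returns the first proxy that answered and the number of attempts made.
    """
    attempts = 0
    winner: Optional[str] = None
    for proxy in proxies:
        attempts += 1
        if try_proxy(proxy) and winner is None:
            winner = proxy
    return winner, attempts


def preferred_first(
    proxies: Sequence[str],
    try_proxy: Callable[[str], bool],
    last_good: Optional[str] = None,
) -> Tuple[Optional[str], int]:
    """Hypothetical cheaper strategy: try the last-known-good proxy first,
    then fall back to the remaining proxies in order, stopping at the first success.
    """
    ordered = list(proxies)
    if last_good in ordered:
        ordered.remove(last_good)
        ordered.insert(0, last_good)
    attempts = 0
    for proxy in ordered:
        attempts += 1
        if try_proxy(proxy):
            return proxy, attempts
    return None, attempts


if __name__ == "__main__":
    proxies = ["proxy-a", "proxy-b", "proxy-c"]  # hypothetical names
    reachable = {"proxy-b", "proxy-c"}
    print(fan_out(proxies, lambda p: p in reachable))
    print(preferred_first(proxies, lambda p: p in reachable, last_good="proxy-c"))
```

Under these assumptions, the fan-out strategy always pays one attempt per proxy, while the preferred-first strategy usually pays one attempt in total and only approaches the fan-out cost when the preferred proxy is unreachable.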

Resolution

Duo resolved this by implementing several changes:

  1. Horizontal scaling of our servers that perform Active Directory authentications.
  2. Configuration changes to increase the maximum performance of our Redis database.
  3. Clearing excess data from our Redis database.
  4. Rolling out algorithm changes that optimize the authentication path to be less costly.

Recommendations

If you are using Duo SSO with an Active Directory identity provider, we highly recommend that you run your Authentication Proxies in our suggested High Availability (HA) configuration. You can read more about those recommendations here. These recommendations will make your setup more resilient in times of high load and could reduce the impact on your users during a Duo service degradation event like this one.

What is Duo doing to prevent this in the future?

  • Duo is continuing to investigate and implement further algorithm changes to increase the performance of the Active Directory authentication path.
  • As a result of this incident, Duo has identified several leading indicators of this type of issue recurring. We will be improving our automated alerting to detect and notify us when these events are occurring.
  • Duo is looking to further expand its Authentication Proxy performance recommendations to cover the SSO use case.
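The leading indicators mentioned above are not named in this report. As one possible illustration of automated alerting, the sketch below assumes that the authentication failure rate over a sliding time window is one such indicator. The class name, window length, threshold, and minimum sample count are all hypothetical values chosen for the example.

```python
from collections import deque
from typing import Deque, Tuple


class FailureRateMonitor:
    """Tracks authentication outcomes in a sliding time window (hypothetical sketch)."""

    def __init__(self, window_seconds: float, threshold: float, min_samples: int = 20):
        self.window_seconds = window_seconds
        self.threshold = threshold      # alert when failure rate >= threshold
        self.min_samples = min_samples  # avoid alerting on very little traffic
        self._events: Deque[Tuple[float, bool]] = deque()  # (timestamp, succeeded)

    def record(self, timestamp: float, succeeded: bool) -> None:
        """Add one authentication outcome and drop outcomes older than the window."""
        self._events.append((timestamp, succeeded))
        cutoff = timestamp - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()

    def failure_rate(self) -> float:
        """Fraction of failed authentications currently in the window."""
        if not self._events:
            return 0.0
        failures = sum(1 for _, ok in self._events if not ok)
        return failures / len(self._events)

    def should_alert(self) -> bool:
        """True when there is enough traffic and the failure rate crosses the threshold."""
        return (
            len(self._events) >= self.min_samples
            and self.failure_rate() >= self.threshold
        )


if __name__ == "__main__":
    monitor = FailureRateMonitor(window_seconds=60, threshold=0.05, min_samples=10)
    for t in range(20):
        monitor.record(float(t), succeeded=t not in (3, 7))
    print(monitor.failure_rate(), monitor.should_alert())
```

A minimum sample count is included so that a single failure during a quiet period does not trigger an alert. A real deployment would likely combine several signals, such as request latency or queue depth, which this sketch does not attempt to model.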

Note: You can find your Duo deployment’s ID and sign up for updates via the StatusPage by following the instructions in this Duo Knowledge Base article.

Posted Jun 16, 2023 - 13:18 EDT

Resolved
We have confirmed that the issues with Duo SSO authentications through the Duo Authentication Proxy are now fully resolved, and will provide an RCA as soon as it is available.
Posted Jun 12, 2023 - 14:16 EDT
Monitoring
Action has been taken to alleviate issues with the Duo SSO authentications through the Authentication Proxy. We are currently monitoring the results.
Posted Jun 12, 2023 - 11:05 EDT
Investigating
We are investigating intermittent issues with Authentication Proxy connections that are needed to service Duo SSO authentications. More updates will be posted shortly.
Posted Jun 12, 2023 - 10:36 EDT
This incident affected: DUO9 (SSO), DUO17 (SSO), DUO22 (SSO), DUO39 (SSO), DUO40 (SSO), DUO42 (SSO), DUO49 (SSO), DUO50 (SSO), DUO52 (SSO), DUO55 (SSO), DUO56 (SSO), DUO58 (SSO), DUO62 (SSO), DUO63 (SSO), DUO64 (SSO), DUO65 (SSO), DUO72 (SSO), and DUO73 (SSO).