Admin functions impacted when accessing Users and AD Sync on certain deployments

Incident Report for Duo

Postmortem

Summary

A change in the D322 build caused high database load by running an unoptimized query. This disrupted several deployments. The system stabilized after recovery actions were taken.

Timeline of Events

08/26/2025 06:47:00 UTC - Alert fires that our service latency is above target. 

08/26/2025 06:58:00 UTC - On-call engineer confirms that autoscaling is in progress. 

08/26/2025 07:00:00 UTC - A separate alert fires for high CPU utilization on our service database, and the request backlog starts to grow.

08/26/2025 07:37:00 UTC - On-call engineer disables certain features of our service to alleviate load. 

08/26/2025 08:08:00 UTC - Status page is published. 

08/26/2025 08:45:00 UTC - Service database is scaled up. 

08/26/2025 08:50:00 UTC - On-call engineer monitors recovery of systems.

08/26/2025 10:40:00 UTC - All features of our service are re-enabled.

08/26/2025 11:44:00 UTC - Status page is updated to resolved. 

Details

Our D322 build started calling an unoptimized API endpoint on an internal system, which led to high database load, increased latency, and request timeouts. This degraded multiple deployments (DUO3, DUO47, DUO57).

To alleviate load, certain features were temporarily disabled. However, because Duo Directory depends on one of these features, customers received errors until it was re-enabled. At the same time, the database was scaled up vertically so that backlogged requests could be processed more quickly.
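
A dependency like the one described above can be surfaced by a pre-check before a feature is switched off. The following is a minimal sketch only, not Duo's actual tooling: the `dependents_of` helper, the `DEPENDS_ON` map, and every feature name in it are hypothetical placeholders chosen for illustration.

```python
from collections import deque


def dependents_of(feature, depends_on):
    """Return every feature that directly or transitively depends on `feature`.

    `depends_on` maps a feature name to the set of features it needs.
    """
    affected = set()
    queue = deque([feature])
    while queue:
        current = queue.popleft()
        for name, needs in depends_on.items():
            if current in needs and name not in affected:
                affected.add(name)
                queue.append(name)
    return affected


# Hypothetical dependency map. These names are placeholders, not real
# feature names from the service described in this report.
DEPENDS_ON = {
    "duo_directory": {"feature_x"},
    "admin_reports": {"duo_directory"},
    "feature_y": set(),
}

# Features that would start returning errors if "feature_x" were disabled.
impact = dependents_of("feature_x", DEPENDS_ON)
print(sorted(impact))
```

Running a check of this kind before disabling a feature would list the dependent features up front, so the trade-off between shedding load and breaking dependents can be weighed deliberately.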

Once the backlog was cleared, all features were re-enabled and the incident was resolved.

A detailed analysis was conducted to determine why the service database CPU utilization was higher than expected. A newly introduced unoptimized query was identified and promptly fixed.

In addition, our teams plan to implement more rigorous testing and review processes for new database queries added to our services.

Our teams will also work to improve our performance testing capabilities in the development environment so that production load can be simulated more accurately. This will help us detect unoptimized queries before they are released.
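
One way an automated check for unoptimized queries could look is sketched below. It is an illustration under stated assumptions, not a description of Duo's tooling: SQLite stands in for the production database (which this report does not name), and the `users` table, its columns, the index name, and the `full_scan_steps` helper are all hypothetical.

```python
import sqlite3


def full_scan_steps(conn, sql, params=()):
    """Return the query-plan steps that scan a whole table without an index."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row is (id, parent, notused, detail); `detail` is a text description.
    details = [row[3] for row in rows]
    # "SCAN ..." without "USING" means no index is involved in that step.
    return [d for d in details if d.startswith("SCAN") and "USING" not in d]


# Hypothetical schema, used only to demonstrate the check.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, directory_id INTEGER)"
)

query = "SELECT id FROM users WHERE directory_id = ?"

# Without an index on directory_id, the plan contains a full table scan.
before = full_scan_steps(conn, query, (1,))

# After adding an index, the same query is answered by an index search.
conn.execute("CREATE INDEX idx_users_directory ON users (directory_id)")
after = full_scan_steps(conn, query, (1,))

print("before:", before)
print("after:", after)
```

A test suite could fail the build whenever a new query yields a non-empty list from a check like this. Plan inspection complements load testing and does not replace it, since a plan that looks acceptable on a small development dataset can still behave differently at production scale.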

Posted Aug 29, 2025 - 14:51 EDT

Resolved

The issue affecting User Management and SSO logins on DUO3, DUO47, and DUO57 has been fully resolved, and all services have been restored to normal functionality.

We will publish an RCA as soon as it is available. You can check back here for the latest information or subscribe for updates.
Posted Aug 26, 2025 - 07:43 EDT

Monitoring

Following the implementation of the fix, we are seeing signs of recovery. We will continue to actively monitor the results to ensure the issue is fully resolved.

Please refer to the status page for further updates.
Posted Aug 26, 2025 - 05:29 EDT

Identified

A fix has been deployed for the issue affecting User Management and SSO logins on DUO3, DUO47, and DUO57. We are monitoring performance and stability. Please refer to the status page for further updates.
Posted Aug 26, 2025 - 05:06 EDT

Update

We are continuing to investigate this issue.
Posted Aug 26, 2025 - 04:09 EDT

Investigating

We are currently investigating an issue impacting user management and Active Directory sync on selected deployments. Please refer to the status page for further updates.
Posted Aug 26, 2025 - 04:08 EDT

This incident affected: DUO3 (Admin Panel, SSO), DUO47 (Admin Panel, SSO), and DUO57 (Admin Panel, SSO).