[Multiple Deployments] Admin Panel log and API log delivery and retrieval failures
Incident Report for Duo
Postmortem

Summary

On September 19, 2024, at around 14:08 ET, Duo's Engineering Team was alerted by monitoring that admins on select deployments were unable to retrieve their logs through the Admin Panel or Admin API. The root cause was identified as a configuration issue.

The issue was resolved on the same day by rolling back the configuration change and then releasing a permanent fix that addressed the error.

Timeline of Events (ET)

14:08 Monitoring alerts Duo Engineering to potential Admin Panel log delivery failures.

14:14 Duo Engineering creates an incident and escalates to the owning team.

14:15 Owning team begins to troubleshoot.

14:32 Root cause is identified.

15:03 Team decides to roll back the release to mitigate customer-facing log delays.

15:12 Impacted deployments begin to recover.

15:23 Team introduces the permanent fix and releases it to the impacted deployments.

Details

Our Engineering team was working on enhancing our identity threat insights and decided to pre-configure regions that are planned for future improvements. However, the pre-configuration was based on an incorrect assumption about the underlying infrastructure, which had not yet been built out.

As a short-term solution, Duo rolled back to the previous stable version of code to mitigate customer impact. After admins were able to view their logs again, Duo released a permanent fix to the affected deployments to ensure long-term stability.
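
A mismatch of this kind, where configuration references regions whose infrastructure has not been built yet, can in principle be caught by a check that runs before release. The following is a minimal illustrative sketch only, not Duo's actual tooling: the function name `find_unprovisioned_regions`, the region names, and the idea of comparing against an inventory of provisioned regions are all hypothetical assumptions made for the example.

```python
def find_unprovisioned_regions(configured, provisioned):
    """Return the configured regions that have no provisioned infrastructure.

    Both arguments are iterables of region names. The result is sorted so
    that the output is stable across runs.
    """
    return sorted(set(configured) - set(provisioned))


# Hypothetical example data: these region names are placeholders,
# not real deployments or real infrastructure.
configured_regions = ["region-a", "region-b", "region-c"]
provisioned_regions = ["region-a", "region-b"]

missing = find_unprovisioned_regions(configured_regions, provisioned_regions)
if missing:
    # A release pipeline could stop here instead of shipping the configuration.
    print("Blocking release; unprovisioned regions:", ", ".join(missing))
else:
    print("All configured regions are provisioned.")
```

In this sketch, a non-empty result would be treated as a reason to hold the release until either the configuration is corrected or the missing infrastructure exists.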

Posted Sep 25, 2024 - 14:12 EDT

Resolved
The issue with Reports and log API performance and consistency has been resolved. An RCA will be provided as soon as it is available.
Posted Sep 19, 2024 - 16:47 EDT
Update
We are continuing to monitor for any further issues.
Posted Sep 19, 2024 - 15:56 EDT
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Sep 19, 2024 - 15:38 EDT
Identified
The issue has been identified and a fix is being implemented.
Posted Sep 19, 2024 - 15:26 EDT
Investigating
We are investigating an issue causing some reports and/or log APIs to fail to load or show incorrect data across multiple deployments. Please check back for further updates.
Posted Sep 19, 2024 - 15:22 EDT
This incident affected: DUO5 (Admin Panel), DUO26 (Admin Panel), DUO34 (Admin Panel), DUO43 (Admin Panel), DUO51 (Admin Panel), DUO68 (Admin Panel), DUO69 (Admin Panel), and DUO81 (Admin Panel).