On September 19, 2024, at around 14:08 ET, Duo's Engineering Team was alerted by monitoring that admins on select deployments were unable to retrieve their logs through the Admin Panel or Admin API. The root cause was identified as a configuration issue.
The issue was resolved the same day by rolling back the configuration change and then releasing a permanent fix that addressed the error.
14:08 Duo Engineering is alerted by monitoring to potential Admin Panel log delivery failures.
14:14 Duo Engineering creates an incident and escalates to the owning team.
14:15 Owning team begins to troubleshoot.
14:32 Root cause is identified.
15:03 Team decides to roll back the release to mitigate customer-facing log delays.
15:12 Impacted deployments begin to recover.
15:23 Team introduces the permanent fix and releases it to the impacted deployments.
Our Engineering team was working on enhancing our identity threat insights and decided to pre-configure regions that are planned for future improvements. However, the pre-configuration was based on an incorrect assumption about the underlying infrastructure, which had not yet been built out.
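The failure mode described above, a configuration that names regions whose infrastructure does not yet exist, can be illustrated with a minimal sketch. This is not Duo's actual tooling or configuration format: the function name `find_unprovisioned_regions` and the region names are hypothetical, chosen only to show the kind of mismatch involved.

```python
def find_unprovisioned_regions(configured, provisioned):
    """Return the configured regions that have no provisioned infrastructure.

    Hypothetical helper for illustration; not part of any Duo product or API.
    """
    return sorted(set(configured) - set(provisioned))


# Hypothetical region names, for illustration only.
configured_regions = ["region-a", "region-b", "region-c"]
provisioned_regions = ["region-a", "region-b"]

missing = find_unprovisioned_regions(configured_regions, provisioned_regions)
if missing:
    # A mismatch like this is what the pre-configuration assumed away.
    print("Configured but not provisioned:", ", ".join(missing))
```

A comparison of this kind, run before a release, would surface regions that are configured ahead of their infrastructure.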
As a short-term solution, Duo rolled back to the previous stable version of code to mitigate customer impact. After admins were able to view their logs again, Duo released a permanent fix to the affected deployments to ensure long-term stability.