On December 6, 2023, at around 19:37 EST, Duo's Engineering Team was alerted by monitoring that DUO61 was experiencing network egress downtime. The root cause was scheduled maintenance that deployed an unintended change to the networking configuration.
The issue was resolved by reverting the networking change. Afterward, we observed that customer authentication downtime was minimal.
2023-12-06
19:21 Duo Site Reliability Engineering (SRE) completes the scheduled maintenance, which deploys an unexpected networking configuration change
19:22 Duo SRE detects that network traffic is down for DUO61
19:22 Duo SRE starts investigation, hampered by the network changes
20:02 Duo SRE rolls back changes to networking that caused the issue
20:18 Duo SRE starts to see recovery for DUO61
20:25 Duo SRE confirms full recovery in the deployment
Duo SRE performs maintenance work on a deployment during off-peak hours, with no downtime expected. DUO61 had off-peak maintenance scheduled to update the load balancer layer and harden the deployment's security posture. During the rollout, a configuration error was deployed, causing network egress traffic to drop.
By rolling back the configuration, Duo SRE restored egress traffic from the deployment. We then confirmed that the outage had minimal impact on authentication traffic.
Duo SRE is addressing the specific bug that caused the incorrect configuration to be rolled out to the egress layer. In addition, Duo SRE is re-architecting the egress layer to be more resilient during this type of event.
Note: You can find your Duo deployment’s ID and sign up for updates via the StatusPage by following the instructions in this knowledge base article.