Summary
On January 7, 2025, Duo's Engineering Team mitigated a log delay for certain admin panel users. The root cause was identified an incorrect software deployment.
The issue was resolved on the same day by deploying the correct version of software for activity and telephony logs.
Details
During our regular patch cycle, an incorrect version of the software was inadvertently pushed to the production instances. The issue was quickly identified, and the team immediately rectified it by deploying the correct version to the newly patched instances. The response aimed to minimize potential impact and ensure that the correct functionality was restored without significant downtime.
Multiple teams worked together to ensure the integrity of the deployments and restore system stability. We promptly triaged the environment to identify affected deployments and pushed the correct software versions to each one. After deploying the updates, we conducted thorough verification to ensure both the correct software version and the appropriate patch version were in place.
The patching process is currently in its final evolution phase following our recent transition to Auto Scaling Groups (ASGs). Moving forward, each team will be responsible for managing their deployments patching to minimize external interference with their regular release schedules, which occur at least every two weeks and thus still adhere to our compliance guidelines on vulnerability management. This approach will streamline operations and ensure greater autonomy for teams while maintaining system stability.