DUO67: Issues with Authentication and SSO Logs being displayed in the Admin Panel and via Admin API
Incident Report for Duo
Postmortem

Delay in retrieving Authentication and SSO logs

Incident Report - 2023-09-11

Summary

On September 11, 2023, at around 13:27 EDT, Duo's Engineering team was alerted by our internal monitoring systems that Duo's data pipeline was experiencing high latency in ingesting authentication and SSO logs. Soon after the alert, authentication and SSO logs were delayed in becoming available to query from the Duo Admin Panel and Duo Admin APIs. The root cause of the latency was identified as expired certificates on components of our internal data pipeline.

The issue was resolved the next day by renewing the certificates within our data pipeline, which allowed the system to recover fully. No logs were lost during this incident because our recovery mechanisms stored the unprocessed logs.

Deployments Impacted

  • DUO67

Timeline of Events (EDT)

2023-09-11 13:27 - Duo Engineering team is alerted to an issue with log ingestion in the data pipeline in DUO67.

2023-09-11 13:34 - Duo Engineering team begins troubleshooting

2023-09-11 16:46 - Duo Engineering team identifies the root cause and starts working on a fix

2023-09-12 04:29 - Duo Engineering team successfully renews all required certificates and begins restarting the service experiencing issues.  

2023-09-12 05:01 - The system is restarted successfully and logs begin flowing through the data pipeline as expected.

2023-09-12 05:30 - The StatusPage is updated to Monitoring.

2023-09-12 06:27 - The StatusPage is updated to Resolved. 

Details

Duo Engineering completed the scheduled patching of services that constitute our data pipeline which serves the authentication and SSO logs. Once patching was completed, our monitoring system alerted us that the system was unable to ingest logs at the expected pace and the ingestion latency was high. Shortly after the alert, we started investigating the issue.

This issue affected the retrieval of authentication and SSO logs from the Duo Admin Panel and Admin APIs. We then determined that the degraded state was due to expired certificates on components of our internal data pipeline. We renewed the required certificates for all components of the data pipeline to resolve the issue.

Once the certificates were renewed and the services were restarted, ingest latency returned to normal, expected levels. The alerts from our monitoring also stopped, and the incident was marked as resolved after careful consideration by our teams. Note that this incident did not affect authentications in any way.

What is Duo doing to prevent this in the future?

Duo Engineering has conducted a retrospective to determine how we can prevent certificate-related issues in the future. We will be automating our certificate renewal process and increasing observability and alerting for certificates that are close to expiration, giving us ample time to act on them. In addition, we have identified a number of measures to integrate into our response runbooks to decrease our incident response time.
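
As an illustration only, alerting on certificates that are close to expiration can be sketched as below. This is not Duo's actual tooling: the function names, the 30-day warning threshold, and the example dates are all hypothetical. It assumes the certificate's expiry is available as a `notAfter` string in the format returned by Python's `ssl.SSLSocket.getpeercert()`.

```python
import ssl
from datetime import datetime, timezone

# Hypothetical threshold: warn when fewer than this many days remain.
WARN_DAYS = 30


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return the days left before a certificate expires.

    `not_after` uses the notAfter format from getpeercert(),
    for example "Sep 11 17:27:00 2023 GMT".
    """
    expires_epoch = ssl.cert_time_to_seconds(not_after)
    expires = datetime.fromtimestamp(expires_epoch, tz=timezone.utc)
    return (expires - now).total_seconds() / 86400


def classify(not_after: str, now: datetime, warn_days: int = WARN_DAYS) -> str:
    """Map a certificate's expiry onto an alert state."""
    remaining = days_until_expiry(not_after, now)
    if remaining <= 0:
        return "expired"
    if remaining <= warn_days:
        return "expiring_soon"
    return "ok"


if __name__ == "__main__":
    # Hypothetical expiry date, checked against the current time.
    state = classify("Sep 11 17:27:00 2023 GMT", datetime.now(timezone.utc))
    print(state)
```

A check like this would typically run on a schedule against every certificate in the pipeline, so that an "expiring_soon" state raises an alert well before the "expired" state is ever reached.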

Note: You can find your Duo deployment’s ID and sign up for updates via the StatusPage by following the instructions in this knowledge base article.

Posted Sep 19, 2023 - 19:35 EDT

Resolved
The issue causing delayed authentication log and SSO log entries in the Duo Admin Panel, or when retrieved via the Admin API, on our deployments is fully resolved and all services are now fully functional.

We will be posting a root-cause analysis (RCA) here once our engineering team has finished its thorough investigation of the issue.

Please make sure to check back or subscribe to be notified when the RCA is posted.
Posted Sep 12, 2023 - 06:27 EDT
Monitoring
A fix has been found and we are currently testing it to ensure reliability.

We will continue to monitor the issue and will post any updates when the incident is considered fully resolved.

Please check back here or subscribe for further updates.
Posted Sep 12, 2023 - 05:30 EDT
Identified
We have identified the root cause of the delay in displaying Authentication logs and SSO logs, both in the Admin Panel and when pulled via the Admin API. We will update with more information once we know more.

Authentication continues to be unaffected at this time.

Please check back here or subscribe for updates.
Posted Sep 11, 2023 - 16:47 EDT
Investigating
We are currently investigating an issue that is affecting Authentication Log and SSO Log entries being displayed in the Duo Admin Panel, or when retrieved via the Admin API. We'll provide more updates as soon as we have more information. Authentication is unaffected at this time.
Posted Sep 11, 2023 - 14:26 EDT
This incident affected: DUO67 (Admin Panel).