Multiple Deployments: Admin Panel authentication log reporting delay
Incident Report for Duo
Postmortem

Update:

This message is a follow-up to the postmortem published August 10 regarding the delay in Authentication Log reporting that occurred on August 7 from 9:35 AM Eastern (1335 UTC) to 3:45 PM Eastern (1945 UTC).

The following information is for customers using a SIEM or other security tooling via the Duo Admin API. These customers will need to perform manual steps to backfill logs for the affected time period into their SIEM or other system.

If you are not moving Duo logs into an external system or do not wish to backfill your logs into your system, you may disregard this message.

To retrieve your logs, you will need to run a script to download the data from Duo and then import the records into your SIEM or other system.

Step 1: Download the log export script from Duo.

From our GitHub project page, go to Code > Download ZIP to download the entire project folder.

This folder contains both the export script, authlog_export.py, and a requirements.txt file that lists the dependencies the script needs to run.

Step 2: Run the script to export your logs.

  1. Run pip install -r requirements.txt to install the duo_client dependency.
  2. Execute the script with python authlog_export.py or python3 authlog_export.py, depending on how Python is installed on your system.

    1. If you plan to import the downloaded data to Splunk, add --splunk to your command line. This will add Splunk-specific fields to the downloaded data. By default, data will be retrieved using version 1 of the API, which is used by the Splunk integration and the legacy third-party Duo Log Grabber tool.
    2. To retrieve data from version 2 of the API, which is in a different format, please add --version=2 to your command line.
  3. Input your IKEY, SKEY, and host to connect to your Admin API integration.

  4. Specify the directory where you wish to write the logs.

  5. Provide the following start and end time values to fetch the logs for the affected time period. Both values are Unix timestamps in milliseconds (UTC) and correspond to the incident period of about 9:30 a.m. to 3:45 p.m. ET (13:30 to 19:45 UTC) on August 7, 2020.

    1. Start time value: 1596807000000
    2. End time value: 1596829500000
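
As a cross-check on the two values above, the following minimal sketch converts the incident window from UTC wall-clock time to milliseconds since the Unix epoch. It uses only the Python standard library; the helper name to_epoch_ms is illustrative and is not part of the Duo export script.

```python
from datetime import datetime, timezone


def to_epoch_ms(dt: datetime) -> int:
    """Convert a timezone-aware datetime to milliseconds since the Unix epoch."""
    return int(dt.timestamp() * 1000)


# Incident window on August 7, 2020, expressed in UTC
# (9:30 a.m. and 3:45 p.m. Eastern Daylight Time).
start = datetime(2020, 8, 7, 13, 30, tzinfo=timezone.utc)
end = datetime(2020, 8, 7, 19, 45, tzinfo=timezone.utc)

start_ms = to_epoch_ms(start)
end_ms = to_epoch_ms(end)

print(start_ms)  # 1596807000000
print(end_ms)    # 1596829500000
```

The same helper can be used to compute a different window if you want to export a wider or narrower range.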

Step 3: Import the downloaded data into your SIEM or other external system.

Follow your usual workflow for manually importing data into your system. Here is a sample set of instructions for Splunk, which uses version 1 of the API.

  1. Log into the Splunk administrative UI and go to Settings > Data > Source Types.
  2. Make sure you have JSON as a source type. If not, create a new JSON source type.
  3. Go to Settings > Add Data.
  4. Click on Upload and then drag and drop the exported Duo authlog_data.json file.
  5. Click on Next, and then expand the Timestamp settings in the left sidebar to make the following selections:

    1. Set the extraction mode to Advanced.
    2. Set the timezone to default.
    3. Set the timestamp format to %s and the timestamp field to timestamp.
  6. Click on Next and then from the index dropdown, select Duo as the index to import the data to.

  7. Review your settings and then submit to import the data.
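
Before importing, it can help to confirm that the exported records actually fall inside the incident window. The sketch below is one way to do that, under stated assumptions: it assumes each version 1 record has already been parsed into a dictionary and carries a timestamp field in epoch seconds (consistent with the %s timestamp format used in the Splunk steps above). How the export file is laid out on disk is not described here, so loading the file is left out, and the sample records are hypothetical.

```python
from typing import Iterable

# Incident window from Step 2, converted from milliseconds to seconds.
START_S = 1596807000000 // 1000
END_S = 1596829500000 // 1000


def count_in_window(records: Iterable[dict],
                    start_s: int = START_S,
                    end_s: int = END_S) -> int:
    """Count records whose 'timestamp' (epoch seconds) lies in [start_s, end_s]."""
    return sum(1 for r in records if start_s <= int(r["timestamp"]) <= end_s)


# Hypothetical sample records, for illustration only.
sample = [
    {"timestamp": 1596806999},  # one second before the window
    {"timestamp": 1596807000},  # exactly at the start
    {"timestamp": 1596820000},  # inside the window
    {"timestamp": 1596829501},  # one second after the window
]

print(count_in_window(sample))  # 2
```

If the count is zero for your real export, double-check the start and end values you supplied to the script before importing.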

Admin Panel Authentication Log reporting delay - DUO9, DUO17, DUO22, DUO39, DUO42, DUO45, DUO49, DUO55, DUO56, DUO58, DUO61, DUO62, DUO63, DUO64, DUO65

Incident Report - 2020/08/07

Summary:

From 9:35 AM Eastern (1335 UTC) to 3:45 PM Eastern (1945 UTC) on August 7, 2020, new authentication logs were unavailable in the Duo Admin Panel and to customer monitoring workflows, such as automated SIEM ingestion that relies on retrieving Authentication Logs in near-real-time from Duo’s APIs. Starting at 3:45 PM Eastern (1945 UTC), new logs became available again. By 8:49 PM Eastern (0049 UTC August 8, 2020), all data was available. This issue has been resolved; no data was lost due to this incident.

Details:

At 11:35 AM Eastern, Duo’s Engineering Team was notified of customer reports that new authentications had stopped appearing in the Authentication Log in the Admin Panel. The team immediately began investigating the issue.

At 12:39 PM, the team completed its initial investigation and determined that authentications were flowing normally but had not been fully processed through Duo’s logging platform since 9:35 AM Eastern. The team also determined that the log data itself had not been lost. The team then began troubleshooting the log ingestion process.

At 1:30 PM, Duo’s Engineering Team determined that one node in the logging cluster had zero free space remaining, which prevented all new writes to the cluster. The cluster had not automatically rebalanced its free space as expected, even though other nodes in the cluster had significant amounts of free space. No alerts had fired to inform the team that a single node had run out of disk space. The team then began working to add space to the affected node without affecting the rest of the cluster.

At 3:45 PM, the team successfully repaired the cluster and log ingestion resumed normally.

At 4:00 PM, the team began to identify a process to backfill missing logs from earlier in the day.

At 5:45 PM, the team began executing the backfill process and monitoring the logging infrastructure to ensure the process was working.

At 8:49 PM, the team confirmed that all logs had been backfilled and no data had been lost.

Customers who rely on automatic processes to populate a SIEM with Duo authentication logs will need to execute manual steps if they wish to backfill data into their SIEMs. Affected customers will receive those instructions via email before August 12.

In response to this incident, Duo’s Engineering Team is working with our partners to determine why the cluster’s configuration prevented it from automatically balancing free space across nodes. Since the incident, the team has improved monitoring of the log ingestion pipeline, including multiple alerts on free space, so that we can detect similar issues more quickly. The team has also improved our backfill automation so that a backfill can be started sooner if needed. Duo’s Engineering Team is committed to ensuring that authentication logs are highly available.

Posted Aug 11, 2020 - 17:36 EDT

Resolved
Re-ingestion of older logs in the Duo Admin Panel is complete and we can confirm no logs were lost. The platform is now performing normally. A full RCA will be published on Monday.
Posted Aug 08, 2020 - 07:43 EDT
Identified
We are currently working on the re-ingestion of older logs in the Duo Admin Panel and anticipate this to be complete by approximately 5 PM EDT tomorrow. No logs have been lost.

Please check back here or subscribe to updates for any changes.
Posted Aug 07, 2020 - 18:45 EDT
Update
As of 16:00 EDT, we have resolved the delay in displaying new authentication logs in the Duo Admin Panel. We are working to address re-ingestion of older logs. No logs have been lost. We will provide an update when we have a specific timeline.
Posted Aug 07, 2020 - 17:15 EDT
Update
We are continuing to investigate this issue.
Posted Aug 07, 2020 - 15:14 EDT
Update
We have identified the cause of the issue delaying the display of authentication logs in the Duo Admin Panel on multiple deployments. We are working to resume ingestion of new logs first before we address re-ingestion of older logs. No logs have been lost. We will provide an update when we have a specific timeline.
Posted Aug 07, 2020 - 15:12 EDT
Investigating
We are currently investigating an issue, which began at about 9:30 a.m. EDT, that is causing delays in the display of authentication logs in the Duo Admin Panel on multiple deployments. We are working to correct the issue as soon as possible.

Please check back here or subscribe to updates for any changes.
Posted Aug 07, 2020 - 13:06 EDT
This incident affected: DUO17 (Admin Panel), DUO22 (Admin Panel), DUO39 (Admin Panel), DUO42 (Admin Panel), DUO45 (Admin Panel), DUO9 (Admin Panel), DUO49 (Admin Panel), DUO55 (Admin Panel), DUO56 (Admin Panel), DUO58 (Admin Panel), DUO61 (Admin Panel), DUO62 (Admin Panel), DUO63 (Admin Panel), DUO64 (Admin Panel), and DUO65 (Admin Panel).