On December 26, 2024, OpenAI faced a series of service disruptions that affected ChatGPT, its API, and Sora, its text-to-video platform.
The outages occurred in several phases throughout the day, impacting video creation, the Sora editor, and API access.
The disruptions were linked to a power incident at a Microsoft data center, which affected both Microsoft’s and OpenAI’s infrastructure.
Sora Video Creation Errors
On the evening of December 26, users of OpenAI’s text-to-video platform, Sora, reported significant issues with video generation.
From 7:45 PM to 10:05 PM PST, high error rates caused failures when users tried to queue new videos. OpenAI rolled out a fix that resolved the issue.
Editor Issues on Sora
In addition to video generation errors, Sora’s editor also encountered problems around 9:20 PM PST on the same day. Users reported intermittent issues, which OpenAI acknowledged.
The company provided updates throughout the evening, assuring users that it was actively investigating and working to restore service.
ChatGPT and API Disruptions
Earlier in the day, around 10:40 AM PST, OpenAI faced a widespread service disruption that affected ChatGPT, its API, and Sora. Users were unable to access ChatGPT, with some reporting “internal server errors” when trying to interact with the chatbot.
Recovery efforts began promptly, and ChatGPT was fully restored by 8:16 PM PST (11:16 PM ET).
Microsoft’s Power Incident and Shared Infrastructure
OpenAI’s outages were linked to a power incident at a Microsoft data center in the South Central US region. The power issue began around the same time as OpenAI’s service disruptions and is the likely cause of the outage.
Microsoft’s services, including Microsoft 365, Azure, and Xbox cloud gaming, were also affected by the power incident.
Microsoft confirmed that the outage stemmed from power issues at its AZ03 data center, leading to storage latency, timeouts, and HTTP 500 errors across several services.
Microsoft restored power by 5:00 PM ET (2:00 PM PST), and its services gradually recovered thereafter.
Potential Link Between Microsoft and OpenAI Outages
Although OpenAI did not officially confirm that Microsoft’s data center issues directly caused its outage, the timing and shared infrastructure between the two companies suggest a connection.
This incident highlights the potential risks and complexities of relying on third-party cloud providers for critical infrastructure.
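For developers who call services hosted on a third-party cloud, one common way to soften brief disruptions like the HTTP 500 errors and timeouts described above is to retry with exponential backoff. The following Python sketch is illustrative only: `TransientError` and `call_with_backoff` are hypothetical names, not part of any OpenAI or Microsoft SDK, and the delay values are arbitrary assumptions.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for retryable failures such as HTTP 500s or timeouts."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.5,
                      max_delay=8.0, sleep=time.sleep):
    """Call operation(), retrying on TransientError with capped, jittered delays.

    The sleep function is injectable so the behavior can be checked
    without actually waiting.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            # Give up and re-raise once the final attempt has failed.
            if attempt == max_attempts - 1:
                raise
            # Exponential growth (0.5s, 1s, 2s, ...) capped at max_delay,
            # with random jitter so many clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

Retries of this kind only smooth over short blips. During a multi-hour outage they cannot restore service, which is why the caller still sees the error after the last attempt.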
OpenAI’s Response and Service Recovery
OpenAI was proactive in updating its status page and maintaining transparency throughout the disruptions. The key moments, in Eastern Time, were:
- 1:30 PM ET: Down Detector reported a significant spike in service issues, especially with ChatGPT becoming largely inaccessible.
- 2:00 PM ET: OpenAI acknowledged high error rates affecting ChatGPT, the API service, and Sora.
- 5:00 PM ET: OpenAI confirmed that Sora was fully operational, and API services were beginning to recover.
- 6:15 PM ET: A final update indicated that ChatGPT was largely recovered, with some minor issues still being addressed.
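Because this report gives some times in PST and others in ET, the timeline above can be put on the same clock as the earlier PST figures. Below is a minimal Python sketch, assuming fixed UTC offsets (valid in late December, when both zones are on standard time). The function and variable names are illustrative, and the labels paraphrase the list entries.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets are an assumption that holds for December 26 (standard time).
EST = timezone(timedelta(hours=-5), "EST")
PST = timezone(timedelta(hours=-8), "PST")

# Timeline entries from the list above, as (hour, minute, label) in ET.
TIMELINE_ET = [
    (13, 30, "Down Detector reports a spike in issues"),
    (14, 0, "OpenAI acknowledges high error rates"),
    (17, 0, "Sora operational, API beginning to recover"),
    (18, 15, "ChatGPT largely recovered"),
]


def et_to_pst(hour, minute):
    """Interpret a December 26, 2024 wall-clock time as ET and convert it to PST."""
    eastern = datetime(2024, 12, 26, hour, minute, tzinfo=EST)
    return eastern.astimezone(PST)


def timeline_in_pst():
    """Return the timeline as (formatted PST time, label) pairs."""
    return [
        (et_to_pst(h, m).strftime("%I:%M %p").lstrip("0"), label)
        for h, m, label in TIMELINE_ET
    ]


if __name__ == "__main__":
    for when, label in timeline_in_pst():
        print(f"{when} PST  {label}")
```

On this clock the four entries fall at 10:30 AM, 11:00 AM, 2:00 PM, and 3:15 PM PST.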
Previous Service Outages
This incident is not the first service disruption OpenAI has faced. In June 2024, a widespread outage prevented many users from accessing ChatGPT for several hours.
Earlier in December, just days after Sora was released to ChatGPT Plus subscribers, both ChatGPT and Sora experienced another prolonged outage.
Root Cause Analysis
In response to the disruption, OpenAI has committed to conducting a thorough root-cause analysis. The company plans to release the results of this investigation in future updates.
By identifying the underlying causes of these disruptions, OpenAI aims to take steps to prevent similar incidents from occurring in the future.
These outages serve as a reminder of the challenges and complexities of maintaining high-reliability services, particularly when dependent on external infrastructure.