Resolved
All impacted models and instances have recovered and we are operating normally. Thank you for your patience.
Monitoring
Most models have recovered. We are investigating the remaining models that are failing to start.
Monitoring
New model instances were unable to boot between 12:41 and 13:14 UTC. Existing models were able to serve traffic.
The underlying issue has been resolved.
Although the core of the incident has been resolved, we are waiting for affected instances to clear the error state. We will continue to monitor while the system works through the backlog of booting instances.