Delayed boots and scale out

Resolved · Degraded performance

The in-progress rollback of a node configuration change appears to have put enough pressure on autoscaling to cause increased delays for cold boots and scale-outs. Once we deleted many of the crashing pods, queues dropped rapidly.

Mon, May 12, 2025, 05:37 PM

May 12, 2025, 05:10 PM to 05:37 PM

Updates

Identified

We are seeing a large number of pods stuck in Pending while they wait for a node refresh to complete. Deleting the pods stuck in CrashLoopBackOff appears to have helped clear the blockage.

Mon, May 12, 2025, 05:10 PM
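
For readers who want to spot the same pattern in their own cluster, here is a minimal sketch, not the tooling used during this incident. It assumes a pod list in the shape produced by `kubectl get pods -o json` and separates pods with a container waiting in CrashLoopBackOff from pods whose phase is Pending. The function name `summarize_pods` and the sample pod names are hypothetical.

```python
import json


def summarize_pods(pod_list):
    """Return (crash_looping, pending) pod names from a Kubernetes PodList dict."""
    crash_looping, pending = [], []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status") or {}
        # Check both regular and init containers for a waiting reason.
        container_statuses = (status.get("containerStatuses") or []) + (
            status.get("initContainerStatuses") or []
        )
        reasons = {
            ((cs.get("state") or {}).get("waiting") or {}).get("reason")
            for cs in container_statuses
        }
        if "CrashLoopBackOff" in reasons:
            crash_looping.append(name)
        elif status.get("phase") == "Pending":
            pending.append(name)
    return crash_looping, pending


# Hypothetical sample data standing in for `kubectl get pods -o json` output.
sample = json.loads("""
{"items": [
  {"metadata": {"name": "app-a"},
   "status": {"phase": "Running",
              "containerStatuses": [{"state": {"waiting": {"reason": "CrashLoopBackOff"}}}]}},
  {"metadata": {"name": "app-b"}, "status": {"phase": "Pending"}},
  {"metadata": {"name": "app-c"},
   "status": {"phase": "Running", "containerStatuses": [{"state": {"running": {}}}]}}
]}
""")

crash_looping, pending = summarize_pods(sample)
print("CrashLoopBackOff:", crash_looping)
print("Pending:", pending)
```

A large CrashLoopBackOff list alongside a growing Pending list would be consistent with the blockage described above, though the sketch only reports state and does not delete anything.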