CA potential for skipped node template info when a node group contains only non-ready nodes

(this issue originated from a [discussion at the 2025-07-28 SIG Autoscaling office hours](https://docs.google.com/document/d/1RvhQAEIrVLHbyNnuaT99-6u9ZUMp7BfkPupT2LAZK7w/edit?tab=t.0#bookmark=id.7if3zqt4u07o))

**Which component are you using?**:

/area cluster-autoscaler

**What version of the component are you using?**:

Component version: 1.33

**What k8s version are you using (`kubectl version`)?**:

Server Version: v1.31.2

**What environment is this in?**:

cluster-api kubemark and aws providers

**What did you expect to happen?**:

with all the nodes in a node group cordoned, and no scale-from-zero information provided, i expect the autoscaler to use an unschedulable node as a template, as described in this code: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/processors/nodeinfosprovider/mixed_nodeinfos_processor.go#L160-L178

**What happened instead?**:

the autoscaler did not create new nodes, and it logged messages stating that no node could fit the workload.

**How to reproduce it (as minimally and precisely as possible)**:

1. create a cluster-api cluster with one MachineDeployment configured for autoscaling (do not add scale-from-zero information)
2. set the minimum node group size to 1 for the MachineDeployment
3. increase replicas to 1 for the MachineDeployment
4. cordon the node associated with the one Machine in the MachineDeployment, e.g. `kubectl cordon <node>`
5. create a workload that targets nodes from the MachineDeployment (e.g. using node selectors)

**Anything else we need to know?**:

it appears as though the autoscaler removes unschedulable nodes from the list of nodes to be processed during a scale-up loop. this means that, for a node group containing only unschedulable nodes, there is no node left whose taints and `spec.unschedulable` field could be sanitized to produce a template. this may be by design, but we should evaluate whether the unschedulable nodes should be removed from the list before processing.

ready nodes collected here, https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/core/static_autoscaler.go#L289
using this function, https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/core/static_autoscaler.go#L985-L1006

the ready nodes list is passed in to the `Process` function here, https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/core/static_autoscaler.go#L356
and an unschedulable node, if it were in that list, would be picked up by this clause, https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/processors/nodeinfosprovider/mixed_nodeinfos_processor.go#L161-L178
using this function to sanitize the node, https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/processors/nodeinfosprovider/mixed_nodeinfos_processor.go#L183-L195
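
to make the suspected flow concrete, here is a minimal sketch of it in go. this is not the autoscaler's code: `Node`, `filterReadyAndSchedulable`, `sanitize`, and `buildTemplates` are hypothetical, simplified stand-ins i made up for illustration, and the real processor has more steps (caching, cloud provider templates) that are left out here. it only shows why a "last resort" pass can never find a cordoned node if the input list was already filtered down to ready and schedulable nodes.

```go
package main

import "fmt"

// Node is a hypothetical, simplified stand-in for a Kubernetes node.
type Node struct {
	Name          string
	Group         string // node group (e.g. a MachineDeployment) the node belongs to
	Ready         bool
	Unschedulable bool     // mirrors spec.unschedulable, set by `kubectl cordon`
	Taints        []string // simplified taint representation
}

// filterReadyAndSchedulable models building the ready nodes list:
// not-ready and cordoned nodes are dropped.
func filterReadyAndSchedulable(nodes []Node) []Node {
	var out []Node
	for _, n := range nodes {
		if n.Ready && !n.Unschedulable {
			out = append(out, n)
		}
	}
	return out
}

// sanitize returns a copy of the node with taints and the unschedulable
// flag cleared, so it can serve as a template for new nodes.
func sanitize(n Node) Node {
	n.Name = "template-" + n.Group
	n.Unschedulable = false
	n.Taints = nil
	return n
}

// buildTemplates picks one template per node group from the given list.
func buildTemplates(nodes []Node) map[string]Node {
	templates := map[string]Node{}
	// first pass: prefer ready, schedulable nodes.
	for _, n := range nodes {
		if _, ok := templates[n.Group]; !ok && n.Ready && !n.Unschedulable {
			templates[n.Group] = sanitize(n)
		}
	}
	// last resort: any remaining node, sanitized.
	for _, n := range nodes {
		if _, ok := templates[n.Group]; !ok {
			templates[n.Group] = sanitize(n)
		}
	}
	return templates
}

func main() {
	// one node group whose only node is cordoned, as in the reproduction steps.
	all := []Node{{
		Name:          "node-a",
		Group:         "md-0",
		Ready:         true,
		Unschedulable: true,
		Taints:        []string{"node.kubernetes.io/unschedulable:NoSchedule"},
	}}

	fmt.Println(len(buildTemplates(filterReadyAndSchedulable(all)))) // prints 0
	fmt.Println(len(buildTemplates(all)))                            // prints 1
}
```

with the filtered list the node group gets no template at all, while passing the unfiltered list lets the last-resort pass produce a sanitized one. whether the real `Process` function should receive the unfiltered list is exactly the open question in this issue.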


it seems like we need to determine if, and when, this functionality changed, and then decide whether the node list passed to the `Process` function should be changed.

CA potential for skipped node template info when a node group contains only non-ready nodes #8380
