Consistently propagate down timeouts from MD => MS => Machines #10753

### Goal

The goal of this issue is to consistently propagate timeouts (NodeDrainTimeout, NodeDeletionTimeout, ...) from MDs to MSs to Machines. This is desirable so that users can still change timeouts even if a Machine is, for example, stuck in draining.

We had a first PR which ensures that a MachineSet propagates the timeouts to Machines that are being deleted: https://github.com/kubernetes-sigs/cluster-api/pull/10589

But there are a few other cases, as described here: https://github.com/kubernetes-sigs/cluster-api/pull/10589 (inlining below for convenience)

The following specifically focuses on cases where Machines are deleted by the MS controller.

### Case 1. MD is deleted

The following happens:
* MD goes away
* ownerRef triggers MS deletion
* MS goes away
* ownerRef triggers Machine deletion

=> The MS will already be gone when the deletionTimestamp is set on the Machines. Because the MS doesn't exist anymore, it's not possible to propagate timeouts from the MS to the Machines, so folks would have to modify the timeouts on each Machine individually.

### Case 2. MD is scaled down to 0

The following happens:
* MD scales down MS to 0
* MS deletes Machine

This use case was addressed by: https://github.com/kubernetes-sigs/cluster-api/pull/10589


### Case 3. MD rollout 

The following happens:
* Someone updates the MD (e.g. bump the Kubernetes version)
* MD creates a new MS and scales it up
* In parallel MD scales down the old MS to 0

=> In this scenario, the MD controller today does not propagate the timeouts from the MD to all MSs (only to the new/current one, not to the old ones). So the Machines of the old MSs won't pick up new timeouts set in the MD.

### Implementation


To address all scenarios, I propose to always propagate timeouts from MD => MS => Machine. To make that happen, we have to implement the following:
* [ ] Ensure during MD deletion, MD & MS objects stay around until all Machines are deleted: https://github.com/kubernetes-sigs/cluster-api/issues/10710
  * [ ] PR: https://github.com/kubernetes-sigs/cluster-api/pull/10791
* [ ] Ensure timeouts are always propagated from MD to all MachineSets to all Machines
  * Even if an MD, MS or Machine is being deleted (in both the regular reconcile and reconcileDelete)
  * Even if a MS is not the "current" MS

Follow-up:
* [ ] We should also check other objects like Cluster (topology), KCP, ...



