Consistently propagate down timeouts from MD => MS => Machines #10753

Open
@sbueringer

Goal

The goal of this issue is to consistently propagate timeouts (NodeDrainTimeout, NodeDeletionTimeout, ...) down from MDs to MSs to Machines. This is desirable so that users can still change timeouts even if a Machine is, for example, stuck in draining.

A first PR ensures that a MachineSet propagates timeouts down to Machines that are being deleted: #10589

But there are a few other cases, as described here: #10589 (inlining below for convenience)

The following specifically focuses on cases where Machines are deleted by the MS controller.

Case 1. MD is deleted

The following happens:

  • MD goes away
  • ownerRef triggers MS deletion
  • MS goes away
  • ownerRef triggers Machine deletion

=> The MS will already be gone when the deletionTimestamp is set on the Machines. Because the MS no longer exists, it is not possible to propagate timeouts from the MS to the Machines, so folks would have to modify the timeouts on each Machine individually.

Case 2. MD is scaled down to 0

The following happens:

  • MD scales down MS to 0
  • MS deletes Machine

This use case was addressed by: #10589

Case 3. MD rollout

The following happens:

  • Someone updates the MD (e.g. bump the Kubernetes version)
  • MD creates a new MS and scales it up
  • In parallel MD scales down the old MS to 0

=> In this scenario, the MD controller today does not propagate the timeouts from the MD to all MSs (only to the new/current one, not to the old ones). As a result, the Machines of the old MSs do not pick up new timeouts set on the MD.

Implementation

To address all scenarios I would propose to always propagate timeouts from MD => MS => Machine. To make that happen we have to implement the following:

Follow-up:

  • We should also check other objects like Cluster (topology), KCP, ...

Metadata

Labels

  • area/machinedeployment: Issues or PRs related to machinedeployments
  • area/machineset: Issues or PRs related to machinesets
  • help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
  • priority/important-longterm: Important over the long term, but may not be staffed and/or may need multiple releases to complete.
  • triage/accepted: Indicates an issue or PR is ready to be actively worked on.
