Description
What would you like to be added (User Story)?
I need a feature to restart one machine without restarting all nodes.
Currently, the machine-deployment controller only provides a cluster-wide rolling update operation.
We can recreate one machine by removing its machine resource, but that operation temporarily reduces the total computing capacity of the entire cluster.
Occasionally a single node becomes unstable, and cluster admins restart or recreate that node to resolve the problem.
We don’t want to restart/recreate all nodes at once because it takes more time to complete and makes application performance unstable.
Detailed Description
Add a way to create a replacement machine before the target machine is actually terminated.
We need a means to remove one machine only after a new machine of the same size is running.
Our idea is to define a new annotation like cluster.x-k8s.io/refresh that refreshes a machine when the annotation is added to its machine resource.
https://cluster-api.sigs.k8s.io/reference/labels_and_annotations
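To make the idea concrete, here is a minimal sketch of the ordering we have in mind. It is not Cluster API code: the annotation key is only the one proposed in this issue, and `plan_refresh` and the step strings are hypothetical names used for illustration.

```python
from __future__ import annotations

# Proposed in this issue only; this annotation does not exist in Cluster API today.
REFRESH_ANNOTATION = "cluster.x-k8s.io/refresh"


def plan_refresh(machine: dict) -> list[str]:
    """Return the ordered steps a controller might take for one Machine.

    `machine` is a plain dict shaped like a Kubernetes object
    (metadata.name, metadata.annotations). Hypothetical helper.
    """
    metadata = machine.get("metadata", {})
    annotations = metadata.get("annotations") or {}
    if REFRESH_ANNOTATION not in annotations:
        return []  # no refresh requested for this machine

    name = metadata["name"]
    # Surge first, delete last, so capacity never drops below the desired size.
    return [
        f"create replacement for {name}",
        "wait until replacement is Ready",
        f"delete {name}",
    ]
```

The point of the ordering is that the replacement exists before the original machine is deleted, so total capacity is never reduced.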
Anything else you would like to add?
We can also achieve the goal with the following steps on our side, without adding logic to Cluster API.
- Add one to replicas for the machine deployment resource.
- Stop the machineDeployment controller using the cluster.x-k8s.io/paused annotation.
- Delete the problematic machine with kubectl delete.
- Decrease replicas by one.
- Remove the cluster.x-k8s.io/paused annotation.
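The steps above can be sketched as a small helper that only builds the kubectl command lines, in the same order as the list, without running them. This is a rough sketch under assumptions: `workaround_commands`, the resource names, and the namespace are hypothetical, `cluster.x-k8s.io/paused` is applied with `kubectl annotate` because the Cluster API reference lists it as an annotation, and waiting for the new machine to become Ready after the first step is not shown.

```python
PAUSED = "cluster.x-k8s.io/paused"


def workaround_commands(md, machine, replicas, namespace="default"):
    """Build the kubectl invocations for the manual workaround, in order.

    md: MachineDeployment name, machine: Machine to replace,
    replicas: current replica count. Nothing is executed here.
    """
    ns = ["-n", namespace]
    return [
        # 1. Add one replica so a replacement machine is created first.
        ["kubectl", "scale", "machinedeployment", md, f"--replicas={replicas + 1}", *ns],
        # 2. Pause reconciliation of the MachineDeployment.
        ["kubectl", "annotate", "machinedeployment", md, f"{PAUSED}=true", *ns],
        # 3. Delete the problematic machine.
        ["kubectl", "delete", "machine", machine, *ns],
        # 4. Return to the original replica count.
        ["kubectl", "scale", "machinedeployment", md, f"--replicas={replicas}", *ns],
        # 5. Remove the paused annotation (trailing "-" deletes the key).
        ["kubectl", "annotate", "machinedeployment", md, f"{PAUSED}-", *ns],
    ]


if __name__ == "__main__":
    # Hypothetical names; print the commands for review instead of running them.
    for cmd in workaround_commands("worker-md", "worker-md-abc12", replicas=3):
        print(" ".join(cmd))
```

Printing the commands first keeps the sketch safe to try; each list could be passed to `subprocess.run` once the order has been checked against a test cluster.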
It may be related to this request: #1808
I’ll write an enhancement proposal if you think that is needed.
Label(s) to be applied
/kind feature
/area machine