Add flag to determine whether containerd config is overwritten #12278
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: mkjpryor. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing `/approve` in a comment.
Hi @mkjpryor. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/ok-to-test
Looks like this tested out OK. Is there anything else you need from me?
Thanks @mkjpryor, I appreciate the solution you provided. Personally, I think this parameter is reasonable, as I couldn't think of a better way to address this issue. However, there's one scenario that may need clarification.
/lgtm
Hmm, it seems to me that additional/multiple runtimes should be handled by configuring the container engine with the different runtimes and having the corresponding RuntimeClass object defined. See https://kubernetes.io/docs/concepts/containers/runtime-class/
IMO, the NVIDIA GPU stuff relying on overwriting the containerd config is a broken model, and I'm very much against introducing crutches to support broken models.
I haven't delved deep into this, so it's possible I'm missing a good reason, but it's gonna be a hard sell.
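For illustration, a minimal sketch of the RuntimeClass approach this comment refers to. It assumes a runtime handler named `nvidia` has already been configured in containerd on the relevant nodes; the handler name, pod, and image below are assumptions for illustration, not part of this PR:

```yaml
# Sketch only: assumes containerd on the node already defines a runtime
# handler called "nvidia" (e.g. pointing at nvidia-container-runtime).
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia            # must match the runtime name in the containerd config
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  runtimeClassName: nvidia # pods opt in to the alternative runtime
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04   # illustrative image only
```

Note that even with RuntimeClass objects in place, someone still has to put the `nvidia` handler into the containerd config itself, which is the same file kubespray templates out; that is the point of contention in the rest of this thread.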
I agree with you that this is not an ideal solution, but as @yankay says it is the only reasonable solution I can think of that I can implement right now to stop kubespray and the NVIDIA GPU operator fighting over my containerd config.
In the docs you pointed to, the runtime handlers that a RuntimeClass references still have to be configured in the containerd config on each node, so something still has to edit that file. The major problem here is actually containerd's lack of support for a config.d-style drop-in directory, which would let the GPU operator add its runtime without touching the main config.
I'm open to other solutions if you can think of any. I can't. The NVIDIA GPU operator is a very widely used piece of software, so I am surprised to be the only one hitting this. Maybe nobody else uses it with kubespray 🤷‍♂️
In any case, I think a flag whose default value results in no change in behaviour is a reasonable compromise?
You are correct that if a user sets this flag in their group vars and later really needs kubespray to update the containerd config, they will need to run with the flag explicitly overridden.
I agree with your opinion. This PR enhances the configurability of Kubespray without any negative impacts, so it can be merged. Additionally, we can continue to explore whether there are more elegant solutions in the future.
OK, I see the rationale. I'm still not a very big fan, especially because this opens the door for subtle bugs where the containerd config is not updated with new versions, that kind of thing, which would not be a breaking change with the normal behavior but could be if we don't update the containerd config.
What's the cri-o approach on this, a config.d/ scheme like you mentioned above?
Another question regarding the NVIDIA operator specifically: are the runtime and runtime class available to implement independently?
Also, following my previous message, could `containerd_additional_runtimes` currently be used for solving this?
https://github.com/kubernetes-sigs/kubespray/blob/b04ceba89b9094275cd913e24ec7f43eb0f17cf0/roles/container-engine/containerd/templates/config.toml.j2#L43
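For context, a sketch of how `containerd_additional_runtimes` is typically set in group_vars. The `kata` entry follows the example in the kubespray containerd docs; the `nvidia` entry is my assumption for illustration, and whether extra runtime options (such as a custom runtime binary path) can be expressed through this variable depends on the kubespray version:

```yaml
# group_vars sketch. The kata entry follows the containerd_additional_runtimes
# example in the kubespray docs; the nvidia entry is an assumption added for
# illustration and would still need to point at the NVIDIA runtime binary,
# which is exactly the kind of detail the GPU operator normally manages itself.
containerd_additional_runtimes:
  - name: kata
    type: "io.containerd.kata.v2"
    engine: ""
    root: ""
  - name: nvidia
    type: "io.containerd.runc.v2"
    engine: ""
    root: ""
```

As the next comment explains, pinning the NVIDIA runtime here means keeping it in sync by hand with whatever the GPU operator would have installed.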
In all honesty, I'd rather use the NVIDIA official approach, for similar reasons to the ones that you expressed as concerns for this patch. If the NVIDIA GPU operator changes its installation and I have fixed runtimes defined in `containerd_additional_runtimes`, then my hand-written runtime entries will drift out of sync with what the operator expects.
Unfortunately, containerd does not support a config.d style config merging properly right now, so it is not really an option for the NVIDIA GPU operator to use that. The NVIDIA runtime installation process reads the current default runtime and bases the NVIDIA runtimes off that with the required changes, e.g. to point at different runtime binaries.
A stronger decoupling of the installation of the container engine itself and the corresponding runtime config could work, but I'm not sure what that would look like that is much different to what this patch does. This patch introduces a new option that you specifically opt in to that basically says to kubespray: I, the operator, am taking responsibility for ensuring the containerd config is correct, because I know other things need to change it. I'm not sure how much better we can do with the current state of containerd config merging.
Of course, if a larger refactoring in the future, or changes to containerd, make a config.d approach feasible, we should revisit this workaround.
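As a rough illustration of the mechanism being discussed, a hypothetical sketch of how such an opt-out could be wired up. The variable name `containerd_overwrite_config` and the task shown are placeholders, not the actual names used in this PR:

```yaml
# Hypothetical sketch; the flag name containerd_overwrite_config is a
# placeholder, not necessarily the variable introduced by this PR.
# In group_vars the flag would default to true so behaviour is unchanged:
#   containerd_overwrite_config: true
#
# roles/container-engine/containerd/tasks/main.yml (excerpt, illustrative)
- name: containerd | Write containerd config
  ansible.builtin.template:
    src: config.toml.j2
    dest: /etc/containerd/config.toml
    mode: "0640"
  when: containerd_overwrite_config | bool
  notify: Restart containerd
```

A cluster that has opted out but needs kubespray to rewrite the config one more time could override the flag at run time, e.g. by passing it with `-e` on the ansible-playbook command line, which is the scenario discussed above.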
I haven't forgotten this, I'm just a bit swamped at work right now; I'll come back to this soonish.
What type of PR is this?
/kind feature
What this PR does / why we need it:
Adds a flag to opt out of overwriting the containerd configuration.
Which issue(s) this PR fixes:
Fixes #12277
Special notes for your reviewer:
Does this PR introduce a user-facing change?: