Kubespray overwrites NVIDIA GPU operator containerd configuration #12277

Open
@mkjpryor


What happened?

When the NVIDIA GPU operator is installed on a cluster, the container toolkit component modifies the containerd config to target the NVIDIA runtime.

If Kubespray is then run with a GPU host in the play, e.g. for an upgrade, the containerd config is overwritten and the NVIDIA runtime definitions are removed. This results in pods failing to schedule on the GPU nodes.

What did you expect to happen?

This is what I expected to happen, but it is not desirable behaviour IMHO.

How can we reproduce it (as minimally and precisely as possible)?

Deploy a Kubespray cluster with GPU nodes, install the NVIDIA GPU operator and then run Kubespray again.

OS

Ubuntu 22

Version of Ansible

ansible [core 2.16.14]
  config file = /Users/mattp/Projects/nks-region/k8s-infra-ndg-region/ansible.cfg
  configured module search path = ['/Users/mattp/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /Users/mattp/.pyenv/versions/3.12.9/envs/kubespray-test/lib/python3.12/site-packages/ansible
  ansible collection location = /Users/mattp/Projects/nks-region/k8s-infra-ndg-region/.ansible/collections
  executable location = /Users/mattp/.pyenv/versions/kubespray-test/bin/ansible
  python version = 3.12.9 (main, Apr 10 2025, 11:21:50) [Clang 17.0.0 (clang-1700.0.13.3)] (/Users/mattp/.pyenv/versions/3.12.9/envs/kubespray-test/bin/python)
  jinja version = 3.1.6
  libyaml = True

Version of Python

Python 3.12.9

Version of Kubespray (commit)

v2.28.0

Network plugin used

cilium

Full inventory with variables

N/A

Command used to invoke ansible

ansible-playbook kubernetes_sigs.kubespray.cluster

Output of ansible run

N/A

Anything else we need to know

No response

Labels

Ubuntu 22, kind/bug