Description
What happened?
When the NVIDIA GPU operator is installed on a cluster, the container toolkit component modifies the containerd config to target the NVIDIA runtime.
If Kubespray is subsequently run with a GPU host in the play (e.g. for an upgrade), the containerd config is overwritten and the NVIDIA runtime definitions are removed. This results in pods failing to schedule on the GPU nodes.
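For reference, this is roughly what the container toolkit adds to /etc/containerd/config.toml, and what is lost when Kubespray re-templates the file. This is an illustrative sketch only; the exact binary path and whether the default runtime is switched depend on the toolkit version and how the operator is configured:

```toml
# Illustrative example of the entries added by nvidia-container-toolkit
# (exact contents vary by version and operator settings)
[plugins."io.containerd.grpc.v1.cri".containerd]
  default_runtime_name = "nvidia"

  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
    runtime_type = "io.containerd.runc.v2"

    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
      BinaryName = "/usr/bin/nvidia-container-runtime"
```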
What did you expect to happen?
This is what I expected to happen, given that Kubespray re-templates the containerd config, but it is not desirable behaviour IMHO.
How can we reproduce it (as minimally and precisely as possible)?
Deploy a Kubespray cluster with GPU nodes, install the NVIDIA GPU operator and then run Kubespray again.
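A minimal sketch of the reproduction, assuming a standard Helm install of the GPU operator and the same playbook invocation as below (the inventory path is a placeholder):

```shell
# 1. Deploy the cluster with Kubespray (GPU nodes in the inventory)
ansible-playbook -i inventory/mycluster/hosts.yaml kubernetes_sigs.kubespray.cluster -b

# 2. Install the NVIDIA GPU operator; its container-toolkit component
#    rewrites /etc/containerd/config.toml on the GPU nodes
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
helm install --wait gpu-operator -n gpu-operator --create-namespace nvidia/gpu-operator

# 3. Run Kubespray again (e.g. an upgrade); the containerd config is
#    re-templated and the nvidia runtime entries are removed
ansible-playbook -i inventory/mycluster/hosts.yaml kubernetes_sigs.kubespray.cluster -b
```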
OS
Ubuntu 22
Version of Ansible
ansible [core 2.16.14]
config file = /Users/mattp/Projects/nks-region/k8s-infra-ndg-region/ansible.cfg
configured module search path = ['/Users/mattp/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /Users/mattp/.pyenv/versions/3.12.9/envs/kubespray-test/lib/python3.12/site-packages/ansible
ansible collection location = /Users/mattp/Projects/nks-region/k8s-infra-ndg-region/.ansible/collections
executable location = /Users/mattp/.pyenv/versions/kubespray-test/bin/ansible
python version = 3.12.9 (main, Apr 10 2025, 11:21:50) [Clang 17.0.0 (clang-1700.0.13.3)] (/Users/mattp/.pyenv/versions/3.12.9/envs/kubespray-test/bin/python)
jinja version = 3.1.6
libyaml = True
Version of Python
Python 3.12.9
Version of Kubespray (commit)
v2.28.0
Network plugin used
cilium
Full inventory with variables
N/A
Command used to invoke ansible
ansible-playbook kubernetes_sigs.kubespray.cluster
Output of ansible run
N/A
Anything else we need to know
No response