[VPA] Updater doesn't fallback to pod eviction in case it cannot decrease mem limits in-place #8434

@Art3mK

Description

Which component are you using?:

/area vertical-pod-autoscaler

What version of the component are you using?:

Component version: 1.4.1

What k8s version are you using (kubectl version)?:

Server Version: v1.33.1-eks-595af52

What environment is this in?:

AWS EKS

What did you expect to happen?:

The docs here state:

VPA will fall back to pod recreation in the following scenarios:
...

BTW, it looks like k8s 1.33 doesn’t recognize a policy called PreferNoRestart.
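(On 1.33, kubectl explain pod.spec.containers.resizePolicy.restartPolicy lists only NotRequired and RestartContainer as accepted values, so I'm assuming the docs meant NotRequired.)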

So, for a pod with resizePolicy set to NotRequired for memory, I expect the updater to fall back to pod eviction once the in-recommendation-bounds-eviction-lifetime-threshold period is over (1h in my case).
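For reference, that threshold is an updater flag, which I set roughly like this (excerpt from my vpa-updater Deployment; surrounding fields omitted, image tag assumed):

containers:
- name: updater
  image: registry.k8s.io/autoscaling/vpa-updater:1.4.1
  args:
  # evict pods older than 1h even when the recommendation is within bounds
  - --in-recommendation-bounds-eviction-lifetime-threshold=1h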

What happened instead?:

After 24 hours (with my lifetime threshold set to 1 hour), I only see this:

updater.go:286] "In-place update failed" error="Pod \"vpa-recommender-8449858858-dz4gq\" is invalid: spec.containers[0].resources.limits[memory]: Forbidden: memory limits cannot be decreased unless resizePolicy is RestartContainer" pod="vpa/vpa-recommender-8449858858-dz4gq"

Yes, I’m experimenting and have VPA for VPA pods 😁

My pod has this resizePolicy:

resizePolicy:
- resourceName: cpu
  restartPolicy: NotRequired
- resourceName: memory
  restartPolicy: NotRequired

How to reproduce it (as minimally and precisely as possible):

  1. Start an overprovisioned deployment with a pause pod, with an InPlaceOrRecreate VPA defined for it, while the VPA webhook is down. This way, VPA will want to reduce the resources on the pod. (See the manifest sketch after these steps.)

  2. The deployment must:

  • be overprovisioned in resources
  • have NotRequired as the restartPolicy for its containers

  3. Observe that the updater complains and does nothing.
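Something like this minimal pair of manifests should do it; the names (pause-overprovisioned, pause-vpa) are made up for illustration, and the resource values are just deliberately oversized:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pause-overprovisioned   # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pause-overprovisioned
  template:
    metadata:
      labels:
        app: pause-overprovisioned
    spec:
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9
        resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired
        - resourceName: memory
          restartPolicy: NotRequired
        resources:
          requests:
            cpu: 500m       # far more than pause needs, so VPA wants to scale down
            memory: 512Mi
          limits:
            memory: 512Mi   # the limit VPA will try (and fail) to decrease in place
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: pause-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: pause-overprovisioned
  updatePolicy:
    updateMode: InPlaceOrRecreate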

And yes, I understand this is an alpha feature. 😄

Use case:

I’m experimenting with VPA to see if it could help me optimize my dev cluster, where workloads spawn randomly in new namespaces. I’d like to adjust resources for these workloads fairly quickly, because they could be gone completely in 6-12 hours.

VPAs for these workloads are created by a separate controller, so recommendations for new VPAs are initially off. I’d like VPA to act on those workloads after 1–3 hours of collecting stats, scaling resources up in place if needed, but without constantly restarting containers when reducing memory (which is what happens if I set the RestartContainer policy for memory).
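For comparison, the variant I’m trying to avoid would look like this (sketch of the same container’s resizePolicy):

resizePolicy:
- resourceName: cpu
  restartPolicy: NotRequired
- resourceName: memory
  restartPolicy: RestartContainer   # lets the kubelet apply memory-limit decreases, at the cost of restarting the container each time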

If I understand correctly, having the PreferNoRestart/NotRequired policy on a container’s memory resource will, by design, cause VPA to evict the pod in order to lower the memory limit. However, the new pod would then be subject to the in-recommendation-bounds-eviction-lifetime-threshold period, which means VPA will not restart pods frequently.
