Which component are you using?:
/area vertical-pod-autoscaler
What version of the component are you using?:
Component version: 1.4.1
What k8s version are you using (kubectl version)?:
Server Version: v1.33.1-eks-595af52
What environment is this in?:
AWS EKS
What did you expect to happen?:
The docs here state:
VPA will fall back to pod recreation in the following scenarios:
...
- Update is in progress for more than 1 hour
- Memory limit downscaling is required with PreferNoRestart policy
BTW, it looks like k8s 1.33 doesn’t recognize a policy called PreferNoRestart.
So, for a pod with resizePolicy set to NotRequired for memory, I expect the updater to fall back to pod eviction after the in-recommendation-bounds-eviction-lifetime-threshold period is over (1h in my case).
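For context, this is roughly how my updater is configured. This is only a sketch: the flag name is the one from the docs, and the image tag and surrounding Deployment fields are illustrative, not copied verbatim from my cluster.

# Excerpt from the vpa-updater Deployment (sketch; other args and fields omitted).
# The 1h threshold below is what I expect to trigger the eviction fallback.
containers:
  - name: updater
    image: registry.k8s.io/autoscaling/vpa-updater:1.4.1
    args:
      - --in-recommendation-bounds-eviction-lifetime-threshold=1h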
What happened instead?:
After 24 hours (with my lifetime threshold set to 1 hour), I only see this:
updater.go:286] "In-place update failed" error="Pod \"vpa-recommender-8449858858-dz4gq\" is invalid: spec.containers[0].resources.limits[memory]: Forbidden: memory limits cannot be decreased unless resizePolicy is RestartContainer" pod="vpa/vpa-recommender-8449858858-dz4gq"
Yes, I’m experimenting and have VPA for VPA pods 😁
My pod has this resizePolicy:
resizePolicy:
  - resourceName: cpu
    restartPolicy: NotRequired
  - resourceName: memory
    restartPolicy: NotRequired
How to reproduce it (as minimally and precisely as possible):
- Start an overprovisioned deployment with a pause pod, with an InPlaceOrRecreate VPA defined for it, while the VPA webhook is down. This way, VPA will want to reduce the resources on the pod. (A minimal sketch of the manifests is just after this list.)
- The deployment must:
  - be overprovisioned in resources
  - have NotRequired as the restartPolicy for the container
- Observe that the updater complains and does nothing.
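A minimal sketch of the manifests I mean, roughly what I used: names, the image tag, and the resource values are illustrative, and the requests are deliberately overprovisioned so VPA wants to scale down. It assumes the InPlaceOrRecreate alpha feature is enabled on the VPA components.

# Overprovisioned pause Deployment with NotRequired resize policy (illustrative values).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pause-overprovisioned
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pause-overprovisioned
  template:
    metadata:
      labels:
        app: pause-overprovisioned
    spec:
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: "1"
              memory: 2Gi
          resizePolicy:
            - resourceName: cpu
              restartPolicy: NotRequired
            - resourceName: memory
              restartPolicy: NotRequired
---
# VPA targeting the deployment above, using the in-place update mode (alpha).
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: pause-overprovisioned
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: pause-overprovisioned
  updatePolicy:
    updateMode: InPlaceOrRecreate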
And yes, I understand this is an alpha feature. 😄
Use case:
I’m experimenting with VPA to see if it could help me optimize my dev cluster, where workloads spawn randomly in new namespaces. I'd like to adjust resources for these workloads fairly quickly, because they could be gone completely within 6-12 hours.
VPAs for these workloads are created by a separate controller, so recommendations for new VPAs are initially off. I’d like VPA to act on those workloads after 1–3 hours of collecting stats: scale resources up in place if needed, but avoid constantly restarting containers when reducing memory (which happens if I set the RestartContainer policy for memory).
If I understand correctly, having the PreferNoRestart/NotRequired policy on a container’s memory resource will, by design, cause VPA to evict the pod in order to lower the memory limit. However, the new pod would then be subject to the in-recommendation-bounds-eviction-lifetime-threshold period, which means VPA will not restart pods frequently.