Description
Which component are you using?:
/area vertical-pod-autoscaler
Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:
Similar to the in-place resource update metrics, which include a counter measuring the total number of failed attempts
autoscaler/vertical-pod-autoscaler/pkg/utils/metrics/updater/updater.go
Lines 119 to 124 in 0ec850d
failedInPlaceUpdateAttempts = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Namespace: metricsNamespace,
		Name:      "failed_in_place_update_attempts_total",
		Help:      "Number of failed attempts to update Pods in-place.",
	}, []string{"vpa_size_log2", "reason"},
the vpa-updater should also measure the total number of failed Pod eviction attempts.
Describe the solution you'd like.:
Add a new counter metric, failed_eviction_attempts_total, that is incremented at
autoscaler/vertical-pod-autoscaler/pkg/updater/logic/updater.go
Lines 302 to 305 in 0ec850d
evictErr := evictionLimiter.Evict(pod, vpa, u.eventRecorder)
if evictErr != nil {
	klog.V(0).InfoS("Eviction failed", "error", evictErr, "pod", klog.KObj(pod))
} else {
via a utility function from the updater package, identical to the approach used for in-place updates
autoscaler/vertical-pod-autoscaler/pkg/utils/metrics/updater/updater.go
Lines 207 to 211 in 0ec850d
// RecordFailedInPlaceUpdate increases the counter of failed in-place update attempts by given VPA size and reason
func RecordFailedInPlaceUpdate(vpaSize int, reason string) {
	log2 := metrics.GetVpaSizeLog2(vpaSize)
	failedInPlaceUpdateAttempts.WithLabelValues(strconv.Itoa(log2), reason).Inc()
}
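The proposed call-site pattern could look roughly like the sketch below. It is only an illustration under stated assumptions: a mutex-guarded map stands in for a prometheus.CounterVec so the example runs without dependencies, and the names failedEvictionCounter, RecordFailedEviction, evictAll, fakeEvict, and the "EvictionError" reason value are hypothetical, not taken from the VPA code base. The issue does not specify labels; the real metric might carry vpa_size_log2 and reason like the in-place counter, while this sketch uses reason only.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// failedEvictionCounter is a dependency-free stand-in for a Prometheus
// CounterVec keyed by a single "reason" label (the label set is an assumption).
type failedEvictionCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func newFailedEvictionCounter() *failedEvictionCounter {
	return &failedEvictionCounter{counts: make(map[string]int)}
}

// RecordFailedEviction is a hypothetical helper, modelled on
// RecordFailedInPlaceUpdate, that increments the counter for a reason.
func (c *failedEvictionCounter) RecordFailedEviction(reason string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[reason]++
}

// Total returns the number of failed attempts across all reasons.
func (c *failedEvictionCounter) Total() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	total := 0
	for _, n := range c.counts {
		total += n
	}
	return total
}

// errBlocked and fakeEvict simulate an eviction call that fails for pod "b".
var errBlocked = errors.New("eviction blocked")

func fakeEvict(pod string) error {
	if pod == "b" {
		return errBlocked
	}
	return nil
}

// evictAll tries to evict every pod, records each failure on the counter,
// and returns the number of successful evictions.
func evictAll(pods []string, evict func(string) error, failed *failedEvictionCounter) int {
	evicted := 0
	for _, pod := range pods {
		if err := evict(pod); err != nil {
			failed.RecordFailedEviction("EvictionError") // hypothetical reason value
			continue
		}
		evicted++
	}
	return evicted
}

func main() {
	failed := newFailedEvictionCounter()
	evicted := evictAll([]string{"a", "b", "c"}, fakeEvict, failed)
	fmt.Printf("evicted=%d failed=%d\n", evicted, failed.Total())
}
```

In the updater itself, the increment would presumably sit in the evictErr != nil branch shown above, next to the existing log line.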
Describe any alternative solutions you've considered.:
N/A
Additional context.:
The currently available Pod eviction metrics
- evictable_pods_total gauge: measures the current number of Pods to be evicted
- evicted_pods_total counter: measures the total number of Pods that have been successfully evicted by the vpa-updater
are not sufficient to measure the success rate of eviction operations. With the additional failed_eviction_attempts_total
counter, we'll be able to measure both outcomes and compute
evicted_pods_total / (evicted_pods_total + failed_eviction_attempts_total)
to instrument an observability dashboard.
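As a small worked example of the ratio above, the following sketch computes the success rate from two counter values. The function name evictionSuccessRate and the sample numbers are hypothetical; the one design point it illustrates is that the ratio is undefined when no eviction has been attempted yet, so the zero-denominator case is reported separately.

```go
package main

import "fmt"

// evictionSuccessRate returns evicted / (evicted + failed).
// ok is false when there were no attempts, since the ratio is undefined.
func evictionSuccessRate(evicted, failed float64) (rate float64, ok bool) {
	attempts := evicted + failed
	if attempts == 0 {
		return 0, false
	}
	return evicted / attempts, true
}

func main() {
	// Sample values: 90 successful evictions and 10 failed attempts.
	if rate, ok := evictionSuccessRate(90, 10); ok {
		fmt.Printf("success rate: %.2f\n", rate) // prints "success rate: 0.90"
	}
}
```

On a dashboard, both counters would typically be wrapped in a rate over a time window before dividing, so the result reflects recent behavior rather than totals since process start.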