metrics-server addon-resizer (nanny) experiences CPU throttling in EKS cluster #8409

@rajeevpnair

Which component are you using?: addon-resizer

What version of the component are you using?: autoscaling/addon-resizer:1.8.21

Component version:

What k8s version are you using (kubectl version)?: v1.32.5-eks-5d4a308

kubectl version Output
$ kubectl version

What environment is this in?: PRD

What did you expect to happen?: The metrics-server nanny container should smoothly auto-scale the metrics-server container based on the number of nodes in the cluster.
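
To make the expected behavior concrete, here is a rough sketch of node-based scaling: a linear `base + per_node * nodes` estimate, evaluated at a cluster size that is rounded up in steps. The growth factor of 1.5 and the base and per-node values are assumptions chosen for illustration; they are not taken from the addon-resizer source or from this cluster. Only the minimum size of 80 comes from the `--minClusterSize=80` flag.

```python
# Sketch of step-wise, node-based resource scaling (illustrative only).
GROWTH_FACTOR = 1.5  # assumption: not confirmed against addon-resizer


def stepped_cluster_size(nodes: int, min_cluster_size: int,
                         factor: float = GROWTH_FACTOR) -> int:
    """Round the node count up to the next step, starting at min_cluster_size."""
    size = min_cluster_size
    while size < nodes:
        size = int(size * factor + 0.5)
    return size


def linear_estimate(base: int, per_node: int, nodes: int) -> int:
    """Resource amount as a fixed base plus a per-node increment."""
    return base + per_node * nodes


MIN_CLUSTER_SIZE = 80      # from --minClusterSize=80
BASE_CPU_M = 100           # hypothetical base CPU, in millicores
CPU_PER_NODE_M = 1         # hypothetical per-node CPU, in millicores

for nodes in (50, 100, 150):
    size = stepped_cluster_size(nodes, MIN_CLUSTER_SIZE)
    cpu = linear_estimate(BASE_CPU_M, CPU_PER_NODE_M, size)
    print(f"{nodes} nodes -> sized for {size} nodes -> {cpu}m CPU")
```

With stepping like this, the target resources change only when the node count crosses a step boundary, so the metrics-server container should be resized rarely.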

What happened instead?: The nanny container is being CPU throttled.

How to reproduce it (as minimally and precisely as possible): Install metrics-server chart version 3.12.2 in an EKS cluster and enable addonResizer with the addon-resizer version mentioned above.

The configuration is shown below.

  - command:
    - /pod_nanny
    - --config-dir=/etc/config
    - --deployment=metrics-server
    - --container=metrics-server
    - --threshold=5
    - --poll-period=300000
    - --estimator=exponential
    - --minClusterSize=80
    - --use-metrics=true
    env:
    - name: GOMAXPROCS
      value: "1"
    - name: MY_POD_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    - name: MY_POD_NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    image: some_registry/registry.k8s.io/autoscaling/addon-resizer:1.8.21
    imagePullPolicy: IfNotPresent
    name: metrics-server-nanny
    resources:
      limits:
        cpu: 100m
        memory: 70Mi
      requests:
        cpu: 100m
        memory: 70Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      readOnlyRootFilesystem: true
      runAsNonRoot: true
      runAsUser: 1000
      seccompProfile:
        type: RuntimeDefault
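
For context on the `limits.cpu` value above: under the Linux CFS bandwidth controller, a CPU limit becomes a quota of CPU time per scheduling period, and the container is throttled once it uses up that quota within a period. A minimal sketch of the arithmetic, assuming the default 100 ms period; the 50 ms burst is a made-up figure, not a measurement from this cluster:

```python
# Sketch: translate a Kubernetes CPU limit into a CFS quota and estimate
# how many scheduling periods a CPU burst is spread across.
CFS_PERIOD_US = 100_000  # assumed default CFS period (100 ms)


def quota_us(cpu_limit_millicores: int, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time in microseconds the container may use per CFS period."""
    return cpu_limit_millicores * period_us // 1000


def periods_needed(burst_cpu_us: int, cpu_limit_millicores: int) -> int:
    """Number of CFS periods a burst of CPU work spans under the limit."""
    quota = quota_us(cpu_limit_millicores)
    return -(-burst_cpu_us // quota)  # ceiling division


LIMIT_M = 100        # limits.cpu from the container spec above
BURST_US = 50_000    # hypothetical 50 ms of CPU work in one poll

print(quota_us(LIMIT_M))                  # quota per 100 ms period, in microseconds
print(periods_needed(BURST_US, LIMIT_M))  # periods the burst is spread across
```

Under these assumptions a 100m limit allows 10 ms of CPU time per 100 ms period, so even a short burst of work is stretched over several periods and counted as throttled, regardless of how low the average usage is.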

Anything else we need to know?:

We are running metrics-server in our EKS cluster with addonResizer enabled to auto-scale resource requests/limits based on cluster size.
However, the nanny container (addon-resizer) experiences CPU throttling even when it is configured with 1000m CPU, much higher than what the official Helm chart default suggests (~40m).
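
One way to quantify the throttling is to look at the container's cgroup `cpu.stat` counters. A minimal sketch of computing the throttled share of periods, assuming cgroup v2 field names (`nr_periods`, `nr_throttled`, `throttled_usec`); the sample values are invented for illustration and are not from this cluster:

```python
# Sketch: compute the fraction of CFS periods in which a container was throttled.
def parse_cpu_stat(text: str) -> dict[str, int]:
    """Parse 'key value' lines from a cgroup cpu.stat file into a dict."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_ratio(stats: dict[str, int]) -> float:
    """Share of enforcement periods in which the cgroup hit its quota."""
    periods = stats.get("nr_periods", 0)
    return stats.get("nr_throttled", 0) / periods if periods else 0.0


# Invented sample in cgroup v2 format (assumption, for illustration only).
SAMPLE = """\
usage_usec 1200000
user_usec 900000
system_usec 300000
nr_periods 200
nr_throttled 50
throttled_usec 750000
"""

stats = parse_cpu_stat(SAMPLE)
print(f"throttled in {throttled_ratio(stats):.0%} of periods")
```

Comparing this ratio across the 100m and 1000m configurations would show whether raising the limit changes the throttling at all.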

Labels: kind/bug (Categorizes issue or PR as related to a bug.)
