-
Notifications
You must be signed in to change notification settings - Fork 376
manager/koordlet: add pod-level CPU burst strategies for fine-grained control #2557
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2557 +/- ##
==========================================
- Coverage 66.37% 66.36% -0.01%
==========================================
Files 485 485
Lines 57115 57140 +25
==========================================
+ Hits 37908 37922 +14
- Misses 16458 16463 +5
- Partials 2749 2755 +6
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
bcd9e4e
to
0a9e7ef
Compare
… control Signed-off-by: KT <[email protected]>
Hi @ZiMengSheng @saintube 👋 Just a gentle ping on this PR The feature has been fully verified in our test clusters — including dynamic pod-level burst control without restarts, and full backward compatibility with existing configurations. Let me know if anything needs to be adjusted. Thanks a lot for your time and review. |
@ktalg Thanks for your contributions! |
Thanks for the thoughtful suggestions! I've carefully considered both of the proposed alternatives, and would love to share some thoughts:
Compared to those options, the current implementation — which extends Happy to continue the discussion — open to adjustments if there's a direction the community prefers! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm convinced. Any comments @zwzhang0107 @songtao98
Ⅰ. Describe what this PR does
Adds Pod-level grey control for the CPU Burst feature without breaking the existing
ConfigMap ➜ koord-manager ➜ NodeSLO ➜ koordlet
pipeline.CPUBurstCfg
now contains an optionalpodStrategies
field that lets users target Pods by namespace and/or label selector and optionally override per-Pod annotations.slo-controller-config
toNodeSLO
, it copies the fullpodStrategies
list together with the node/cluster strategies.cpuburst
plugin mergespodStrategies
at runtime and applies the selected strategy during the normal pod scan; no extra grey-control plugin or additional watch channel is introduced.Ⅱ. Does this pull request fix one issue?
close #2553
Ⅲ. Describe how to verify it
slo-controller-config
ConfigMap and add apodStrategies
entryeg.
Ensure koord-manager reconciles the change into each node’s
NodeSLO
.Launch a matching Pod and observe on the node:
cpu.cfs_burst_us
/cpu.max.burst
set as expected.genPodBurstConfig
picked the pod-specific strategy.Non-matching Pods and nodes continue to follow node / cluster strategy.
Ⅳ. Special notes for reviews
V. Checklist
make test