-
Notifications
You must be signed in to change notification settings - Fork 27
fix: reject scale down request if there is already an in-process request in the same rollout group #254
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: reject scale down request if there is already an in-process request in the same rollout group #254
Conversation
…n the same rollout group
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, let's test further before merging. Approving to unblock.
pkg/admission/zone_tracker_test.go
Outdated
// mimic in progress downscaling request of rollout group ingester | ||
zt.rolloutGroupDownscalingInProgress.Store(rolloutGroupIngester, ingesterZoneA) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than reaching into the internals like this, what if we changed the test to something like:
- configure StatefulSet to call fake prepare downscale handler
- initiate downscale, rollout-operator calls fake handler and marks downscale as in progress
- fake handler does not return until signalled later
- initiate a second downscale: rollout-operator should reject the call immediately given the other is in progress
- release fake handler to allow first request to complete
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Charles, for the review! I have made the suggested change.
pkg/admission/zone_tracker_test.go
Outdated
ar = buildAdmissionRequest(rolloutGroupIngester, ingesterZoneB) | ||
admissionResponse = zt.prepareDownscale(context.Background(), logger, ar, api, f) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be racy - this request could arrive while the zone A request is still finishing. Should we wait for the zone A admission request to complete before sending the next request?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, sorry, I missed that. I have fixed it. Thank you!
…for the same rollout group
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM modulo suggestion below
Co-authored-by: Charles Korn <[email protected]>
If multiple simultaneous scale-down requests are made to prepare the downscale handler for the same rollout group, the code might inadvertently allow all the requests, even if
grafana.com/min-time-between-zones-downscale
is set to a non-zero value.The issue is that there is no synchronisation to prevent that from happening.
In this PR, I am fixing the issue by tracking ongoing prepare downscale requests by rollout group and rejecting any new downscale requests for the same rollout group until the previous request is finished serving.