-
Notifications
You must be signed in to change notification settings - Fork 2k
scheduler: basic node reconciler safety properties for system jobs #26216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
748d687
to
0dc2683
Compare
0dc2683
to
b6159da
Compare
b6159da
to
6fcb615
Compare
6fcb615
to
4739619
Compare
4739619
to
4983c70
Compare
4983c70
to
42902b5
Compare
42902b5
to
93d5f4f
Compare
tgross
added a commit
that referenced
this pull request
Jul 9, 2025
While working on property testing in #26216, I discovered we had unreachable code in the node reconciler. The `diffSystemAllocsForNode` function receives a set of non-terminal allocations, but then has branches where it assumes the allocations might be terminal. It's trivially provable that these allocs are always live, as the system scheduler splits the set of known allocs into live and terminal sets before passing them into the node reconciler. Eliminate the unreachable code and improve the variable names to make the known state of the allocs more clear in the reconciler code. Ref: #26216
tgross
commented
Jul 9, 2025
Comment on lines
120
to
128
// If we are a sysbatch job and terminal, ignore (or stop?) the alloc | ||
if job.Type == structs.JobTypeSysBatch && exist.TerminalStatus() { | ||
result.Ignore = append(result.Ignore, AllocTuple{ | ||
Name: name, | ||
TaskGroup: tg, | ||
Alloc: exist, | ||
}) | ||
continue | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pulled this out to #26236
tgross
added a commit
that referenced
this pull request
Jul 9, 2025
While working on property testing in #26216, I discovered we had unreachable code in the node reconciler. The `diffSystemAllocsForNode` function receives a set of non-terminal allocations, but then has branches where it assumes the allocations might be terminal. It's trivially provable that these allocs are always live, as the system scheduler splits the set of known allocs into live and terminal sets before passing them into the node reconciler. Eliminate the unreachable code and improve the variable names to make the known state of the allocs more clear in the reconciler code. Ref: #26216
78da383
to
723aed1
Compare
Property test assertions for the core safety properties of the node reconciler, for system jobs. Ref: https://hashicorp.atlassian.net/browse/NMD-814 Ref: #26167
723aed1
to
15f921c
Compare
pkazmierczak
approved these changes
Jul 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Property test assertions for the core safety properties of the node reconciler, for system and sysbatch jobs.
Ref: https://hashicorp.atlassian.net/browse/NMD-814
Ref: #26167