scheduler: basic node reconciler safety properties for system jobs #26216

Merged
tgross merged 1 commit into main from property-testing-node-reconciler
Jul 9, 2025

Conversation

tgross
Member

@tgross tgross commented Jul 7, 2025

Property test assertions for the core safety properties of the node reconciler, for system and sysbatch jobs.

Ref: https://hashicorp.atlassian.net/browse/NMD-814
Ref: #26167

@tgross tgross force-pushed the property-testing-node-reconciler branch from 748d687 to 0dc2683 Compare July 7, 2025 20:55
@tgross tgross force-pushed the property-testing-node-reconciler branch from 0dc2683 to b6159da Compare July 8, 2025 15:17
@tgross tgross force-pushed the property-testing-node-reconciler branch from b6159da to 6fcb615 Compare July 8, 2025 15:27
@tgross tgross force-pushed the property-testing-node-reconciler branch from 6fcb615 to 4739619 Compare July 8, 2025 18:27
@tgross tgross force-pushed the property-testing-node-reconciler branch from 4739619 to 4983c70 Compare July 8, 2025 18:46
@tgross tgross force-pushed the property-testing-node-reconciler branch from 4983c70 to 42902b5 Compare July 8, 2025 18:51
@tgross tgross force-pushed the property-testing-node-reconciler branch from 42902b5 to 93d5f4f Compare July 8, 2025 20:35
tgross added a commit that referenced this pull request Jul 9, 2025
While working on property testing in #26216, I discovered we had unreachable
code in the node reconciler. The `diffSystemAllocsForNode` function receives a
set of non-terminal allocations, but then has branches where it assumes the
allocations might be terminal. It's trivially provable that these allocs are
always live, as the system scheduler splits the set of known allocs into live
and terminal sets before passing them into the node reconciler.

Eliminate the unreachable code and improve the variable names to make the known
state of the allocs clearer in the reconciler code.

Ref: #26216
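
The reachability argument in the commit message above can be sketched as follows. This is a simplified illustration, not Nomad's code: the `Alloc` type and the `splitTerminal` and `countTerminal` helpers are hypothetical stand-ins. Once the caller has partitioned allocs into live and terminal sets, any terminal-status check applied to the live set can never match.

```go
package main

import "fmt"

// Alloc is a simplified, hypothetical stand-in for an allocation record.
type Alloc struct {
	ID       string
	Terminal bool // true once the alloc has finished, failed, or been stopped
}

// splitTerminal partitions allocs into live and terminal sets, mirroring the
// split the commit message says happens before the node reconciler runs.
func splitTerminal(allocs []Alloc) (live, terminal []Alloc) {
	for _, a := range allocs {
		if a.Terminal {
			terminal = append(terminal, a)
		} else {
			live = append(live, a)
		}
	}
	return live, terminal
}

// countTerminal stands in for a downstream terminal-status check.
// Applied to the live set, it can never find a match.
func countTerminal(allocs []Alloc) int {
	n := 0
	for _, a := range allocs {
		if a.Terminal {
			n++
		}
	}
	return n
}

func main() {
	allocs := []Alloc{{"a1", false}, {"a2", true}, {"a3", false}}
	live, terminal := splitTerminal(allocs)
	// Prints the live count, the terminal count, and the number of
	// terminal allocs found in the live set.
	fmt.Println(len(live), len(terminal), countTerminal(live)) // prints "2 1 0"
}
```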
Comment on lines 120 to 128
    // If we are a sysbatch job and terminal, ignore (or stop?) the alloc
    if job.Type == structs.JobTypeSysBatch && exist.TerminalStatus() {
        result.Ignore = append(result.Ignore, AllocTuple{
            Name:      name,
            TaskGroup: tg,
            Alloc:     exist,
        })
        continue
    }
Member Author

Pulled this out to #26236

@tgross tgross force-pushed the property-testing-node-reconciler branch from 78da383 to 723aed1 Compare July 9, 2025 17:44
Property test assertions for the core safety properties of the node reconciler,
for system jobs.

Ref: https://hashicorp.atlassian.net/browse/NMD-814
Ref: #26167
@tgross tgross force-pushed the property-testing-node-reconciler branch from 723aed1 to 15f921c Compare July 9, 2025 17:46
@tgross tgross added the theme/scheduling and theme/testing labels Jul 9, 2025
@tgross tgross added this to the 1.11.0 milestone Jul 9, 2025
@tgross tgross marked this pull request as ready for review July 9, 2025 18:06
@tgross tgross requested a review from a team as a code owner July 9, 2025 18:06
@tgross tgross requested a review from pkazmierczak July 9, 2025 18:07
Contributor

@pkazmierczak pkazmierczak left a comment

:shipit:

@tgross tgross merged commit 74f7a8f into main Jul 9, 2025
45 checks passed
@tgross tgross deleted the property-testing-node-reconciler branch July 9, 2025 18:44
2 participants