Add node_pool to blockedEval metric #26215

Open: wants to merge 4 commits into base: main

Conversation

allisonlarson
Member

@allisonlarson allisonlarson commented Jul 7, 2025

Description

Adds node_pool as a label on the blockedEval metrics emitted for
resource/cpu, alongside the dc and node class.
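
To illustrate what the extra label means for the emitted metrics, here is a minimal sketch, not the code in this PR. The types `blockedKey`, `blockedResources`, and `blockedStats`, and the label names `datacenter`, `node_class`, and `node_pool`, are assumptions made for the example. The point is that once node pool is part of the key, blocked resources from different pools no longer collapse into one series.

```go
package main

import "fmt"

// blockedKey is a hypothetical stand-in for the label set attached to a
// blocked-evaluation resource metric: datacenter, node class, and node pool.
type blockedKey struct {
	DC        string
	NodeClass string
	NodePool  string
}

// blockedResources is a hypothetical, simplified resource summary.
type blockedResources struct {
	CPU      int64
	MemoryMB int64
}

// blockedStats accumulates blocked resources per label set.
type blockedStats map[blockedKey]blockedResources

// add folds the resources of one blocked evaluation into the bucket for key.
func (s blockedStats) add(key blockedKey, r blockedResources) {
	cur := s[key]
	cur.CPU += r.CPU
	cur.MemoryMB += r.MemoryMB
	s[key] = cur
}

// labels renders the key as "name=value" pairs. The label names are
// assumptions for this sketch, not confirmed names from the PR.
func (k blockedKey) labels() []string {
	return []string{
		"datacenter=" + k.DC,
		"node_class=" + k.NodeClass,
		"node_pool=" + k.NodePool,
	}
}

func main() {
	gpu := blockedKey{DC: "dc1", NodeClass: "large", NodePool: "gpu"}
	def := blockedKey{DC: "dc1", NodeClass: "large", NodePool: "default"}

	stats := blockedStats{}
	stats.add(gpu, blockedResources{CPU: 500, MemoryMB: 256})
	stats.add(gpu, blockedResources{CPU: 250, MemoryMB: 128})
	stats.add(def, blockedResources{CPU: 100, MemoryMB: 64})

	// Same dc and node class, but the two pools are reported separately.
	for _, k := range []blockedKey{gpu, def} {
		fmt.Println(k.labels(), stats[k].CPU, stats[k].MemoryMB)
	}
}
```

Without the `NodePool` field in the key, the three `add` calls above would all land in the same bucket.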

Testing & Reproduction steps

The node_pool label is covered by the automated tests, and was also confirmed manually by observing the emitted metrics.

Links

Fixes #25933

Contributor Checklist

  • Changelog Entry: If this PR changes user-facing behavior, please generate and add a
    changelog entry using the make cl command.
  • Testing: Please add tests to cover any new functionality or to demonstrate bug fixes and
    ensure regressions will be caught.
  • Documentation: If the change impacts user-facing functionality such as the CLI, API, UI,
    and job configuration, please update the Nomad website documentation to reflect this. Refer to
    the website README for docs guidelines. Please also consider whether the
    change requires notes within the upgrade guide.

Reviewer Checklist

  • Backport Labels: Please add the correct backport labels as described by the internal
    backporting document.
  • Commit Type: Ensure the correct merge method is selected, which should be "squash and merge"
    in the majority of situations. The main exceptions are long-lived feature branches or merges where
    history should be preserved.
  • Enterprise PRs: If this is an enterprise-only PR, please add any required changelog entry
    within the public repository.

@@ -92,6 +94,9 @@ func generateResourceStats(eval *structs.Evaluation) *BlockedResourcesStats {
		for class := range allocMetrics.ClassExhausted {
			classes[class] = struct{}{}
		}

		nodepools[allocMetrics.NodePool] = struct{}{}
Member

@tgross tgross Jul 8, 2025

This is probably a minor thing, but all the FailedTGAllocs for a given evaluation will belong to the same node pool, because an evaluation is for a specific job and a job can only exist in a single node pool.

Rather than repeating the node pool name on the AllocMetrics structs, maybe we should just stick the node pool field on the Evaluation itself? If we wrote the field whenever we created the eval, we could potentially use that in the future for some of the zany ideas we've knocked around like per-pool scheduler workers.
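
As a rough sketch of that suggestion, under stated assumptions: the `Job`, `AllocMetric`, and `Evaluation` types below are trimmed-down stand-ins rather than the real Nomad structs, and `newEvalForJob` and `blockedLabels` are hypothetical helpers. The node pool is written once when the evaluation is created, and the stats code reads it from the evaluation instead of from each per-task-group metric.

```go
package main

import "fmt"

// Job is a simplified stand-in: a job lives in exactly one node pool.
type Job struct {
	ID       string
	NodePool string
}

// AllocMetric is a simplified stand-in that no longer carries the node pool.
type AllocMetric struct {
	ClassExhausted map[string]int
}

// Evaluation is a simplified stand-in. NodePool is the hypothetical
// field proposed in the review comment.
type Evaluation struct {
	JobID          string
	NodePool       string
	FailedTGAllocs map[string]*AllocMetric
}

// newEvalForJob (hypothetical) records the node pool once, at creation time.
func newEvalForJob(job *Job) *Evaluation {
	return &Evaluation{
		JobID:          job.ID,
		NodePool:       job.NodePool,
		FailedTGAllocs: make(map[string]*AllocMetric),
	}
}

// blockedLabels (hypothetical) gathers the exhausted node classes across all
// failed task groups into a set and takes the node pool from the evaluation.
func blockedLabels(eval *Evaluation) (string, map[string]struct{}) {
	classes := make(map[string]struct{})
	for _, m := range eval.FailedTGAllocs {
		for class := range m.ClassExhausted {
			classes[class] = struct{}{}
		}
	}
	return eval.NodePool, classes
}

func main() {
	eval := newEvalForJob(&Job{ID: "web", NodePool: "gpu"})
	eval.FailedTGAllocs["frontend"] = &AllocMetric{
		ClassExhausted: map[string]int{"large": 2},
	}
	eval.FailedTGAllocs["backend"] = &AllocMetric{
		ClassExhausted: map[string]int{"large": 1, "small": 3},
	}

	pool, classes := blockedLabels(eval)
	fmt.Println(pool, len(classes))
}
```

With this shape there is no per-evaluation set of node pools to build, since every failed task group shares the one pool stored on the evaluation.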

Member Author

Neat, yeah that sounds good to me! I wasn't sure that Evaluation would be the right place to add a field that was only going to be used in a metric label, but if there are possible future applications I'd be happy to put it there.

@aimeeu aimeeu added the theme/docs Documentation issues and enhancements label Jul 8, 2025
Contributor

aimeeu commented Jul 9, 2025

@allisonlarson The file website/content/docs/operations/metrics-reference.mdx has moved to website/content/docs/reference/metrics.mdx.
Please let me know if you want me to resolve the conflict and push a commit.

Development

Successfully merging this pull request may close these issues.

Feature Request: Support for node_pool Label in certain server metrics
4 participants