Proposal
Hi,
We are working on scaling our nodes per node_pool based on blocked job evaluations, in cases where there are insufficient resources.
Currently, if scaling is based solely on resource usage thresholds (e.g., memory or CPU usage), it may fail to trigger for jobs with large resource requirements. These jobs remain unschedulable even though the threshold metrics do not indicate a need to scale out.
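To illustrate the gap, here is a minimal sketch with made-up numbers. The node sizes, the 80% threshold, and the helper names are assumptions for illustration only, not values from a real cluster or from Nomad itself.

```python
from dataclasses import dataclass


@dataclass
class Node:
    memory_total_mb: int
    memory_used_mb: int


def utilization(nodes):
    """Aggregate memory utilization across the pool (0.0 to 1.0)."""
    used = sum(n.memory_used_mb for n in nodes)
    total = sum(n.memory_total_mb for n in nodes)
    return used / total


def fits_anywhere(nodes, required_mb):
    """True if at least one node has enough free memory for the allocation."""
    return any(n.memory_total_mb - n.memory_used_mb >= required_mb for n in nodes)


# Hypothetical pool: three 8 GiB nodes, each half full.
nodes = [Node(memory_total_mb=8192, memory_used_mb=4096) for _ in range(3)]

SCALE_OUT_THRESHOLD = 0.80  # assumed threshold for this example

threshold_triggers = utilization(nodes) >= SCALE_OUT_THRESHOLD
job_fits = fits_anywhere(nodes, required_mb=6144)

print(f"utilization={utilization(nodes):.0%}, scale-out triggered={threshold_triggers}")
print(f"6 GiB allocation placeable={job_fits}")
```

In this example the pool sits at 50% utilization, so the threshold never fires, yet no single node has 6 GiB free and the evaluation stays blocked.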
The metrics we would need the node_pool label attached to would be:
nomad.nomad.blocked_evals.cpu
nomad.nomad.blocked_evals.memory
Without the node_pool label we can't determine the correct node_pool where a scaling operation is needed. We noticed that the node_pool attribute is not presently queried in the evaluation data, so supporting this feature may need a bit more implementation work.
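As a rough sketch of how the labeled metrics could be consumed, assume the two metrics carried a `node_pool` label (which is what this issue requests; it is not available today). The sample structure, the pool names, and the functions `blocked_demand_by_pool` and `pools_to_scale` are all hypothetical.

```python
from collections import defaultdict

PREFIX = "nomad.nomad.blocked_evals."

# Hypothetical metric samples; the node_pool label is the requested addition.
samples = [
    {"name": PREFIX + "cpu", "labels": {"node_pool": "gpu"}, "value": 4000},
    {"name": PREFIX + "memory", "labels": {"node_pool": "gpu"}, "value": 16384},
    {"name": PREFIX + "cpu", "labels": {"node_pool": "default"}, "value": 0},
    {"name": PREFIX + "memory", "labels": {"node_pool": "default"}, "value": 0},
]


def blocked_demand_by_pool(samples):
    """Sum the blocked cpu/memory demand per node_pool label."""
    demand = defaultdict(lambda: {"cpu": 0, "memory": 0})
    for sample in samples:
        if not sample["name"].startswith(PREFIX):
            continue
        resource = sample["name"][len(PREFIX):]
        pool = sample["labels"].get("node_pool")
        # Skip unrelated resources and samples without the label.
        if resource not in ("cpu", "memory") or pool is None:
            continue
        demand[pool][resource] += sample["value"]
    return dict(demand)


def pools_to_scale(samples):
    """Return the pools that have any blocked cpu or memory demand."""
    demand = blocked_demand_by_pool(samples)
    return sorted(p for p, d in demand.items() if d["cpu"] > 0 or d["memory"] > 0)


print(pools_to_scale(samples))
```

With the label in place, a scaling policy could target only the pool that actually has blocked evaluations instead of guessing from cluster-wide totals.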
Status: Needs Roadmapping