Skip to content

[chore] Non-normative guidance for status metrics #2472

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

braydonk
Copy link
Contributor

@braydonk braydonk commented Jul 3, 2025

Fixes #1554

Changes

This PR adds non-normative guidance for designing status metrics. This is a common metric pattern that has confused contributors in the past, so this document exists to clear up some misconceptions and outline a unified design plan.

Merge requirement checklist

This PR adds non-normative guidance for designing status metrics. This
is a common metric pattern that has confused contributors in the past,
so this document exists to clear up some misconceptions and outline a
unified design plan.
Copy link
Member

@joaopgrassi joaopgrassi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for putting this together!


### Instrument

The metric is instrumented as an `UpDownCounter` rather than a `Gauge`. This is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably link to the recommendation on how to record consistent UpDownCounters: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/metrics.md#consistent-updowncounter-timeseries

be in Kubernetes, where the word "phase" equally represents both and there
is no acceptable alternative. In this case, using a metric name suffix is
recommended to avoid the naming clash, i.e. naming the attribute `k8s.phase` and
the metric `k8s.phase.current`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking, I see there's some metrics where the name + attributes are the same, thus already going against the recommendation here. For ex:

metric_name: k8s.container.status.state
,
- id: metric.k8s.container.status.reason

Could those be somehow adapted to not clash like it is proposed here? Would be unfortunate that we publish the guidance while we itself already do not follow it, unless we can give reasons.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree! We have several metrics of this type in K8s already:

I wonder if we should not enforce the wording here but only the modeling instead. Maybe using .state can just come as a generic suggestion rather than "stricter" guidance?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can loosen the guidance here. But isn't it pretty problematic to have an attribute that is the exact same name as a metric? They are un-clashed in tooling by their Weaver identifiers, but couldn't this cause problems or at least confusion for people interpreting the conventions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't seen this confusing people so far (in PRs' reviews). But if people in general think it's an issue I'd fine changing it.

Copy link
Member

@TylerHelmuth TylerHelmuth Jul 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was confused when reviewing some of the new k8s metrics that include a required attribute that exactly matches the metric name. I prefer the guidance suggested in the PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll propose it also makes translating to/from OTLP just a tiny bit harder.

When converting from the highly-structured OTLP format to other, non-otlp formats, being able to distinguish between a metric's name and the metric's datapoint's attributes is helpful. In OTLP we have the structure to distinguish between a metric's name and a datapoint attribute name, but in other formats we do have that structure.

Specifically, having the values match makes it hard to use the metric's name as a key and the datapoint attribute name as a key in a flat structure.

OTLP of course allows metric names and datapoint attributes to match, and we would never change that, but in our semantic conventions I like requiring metric name and datapoint attribute name to be different. I think it provides the most flexibility for to/from translations without putting a big restriction on ourselves.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have opened an issue to get clarification around this in general here: #2476

@joaopgrassi
Copy link
Member

This can also close this old issue #199. CC @riverar

aliases: [status-metrics]
--->

# State Metrics
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wdyt about moving it to https://github.com/open-telemetry/semantic-conventions/tree/main/docs/non-normative/how-to-write-conventions ?

I believe @jsuereth wanted to consolidate all the guidance there and maybe we'll move it altogether to a different place (some of the meta-guidance is or could be normative)

aliases: [status-metrics]
--->

# State Metrics
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- [Design](#design)
- [Naming](#naming)
- [Instrument](#instrument)
- [Why not Resource Attributes?](#why-not-resource-attributes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- [Why not Resource Attributes?](#why-not-resource-attributes)
- [Why not Entity Attributes?](#why-not-resource-attributes)

?

- ref: example.state
```

### Naming
Copy link
Contributor

@lmolkova lmolkova Jul 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you summarize it to one sentence and add it (with link to this doc) to https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/naming.md#metrics ?

It is a common mistake to attach a "state" as a Resource Attribute. This is
not recommended for two reasons:

* Resource is intended to be immutable, thus adding an attribute like "state"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it still apply considering entities and descriptive attributes being mutable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I was also thinking the same when I reviewed but wasn't sure. 🤔

@riverar
Copy link

riverar commented Jul 15, 2025

This can also close this old issue #199. CC @riverar

Can you elaborate on this? I read the document and didn't really come away with any clarity with regards to my very specific desktop app monitoring scenario. Maybe I missed something?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

Create guideline for modeling state and phase as metrics
6 participants