-
Notifications
You must be signed in to change notification settings - Fork 253
[chore] Non-normative guidance for status metrics #2472
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This PR adds non-normative guidance for designing status metrics. This is a common metric pattern that has confused contributors in the past, so this document exists to clear up some misconceptions and outline a unified design plan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for putting this together!
|
||
### Instrument | ||
|
||
The metric is instrumented as an `UpDownCounter` rather than a `Gauge`. This is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably link to the recommendation on how to record consistent UpDownCounters: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/metrics.md#consistent-updowncounter-timeseries
be in Kubernetes, where the word "phase" equally represents both and there | ||
is no acceptable alternative. In this case, using a metric name suffix is | ||
recommended to avoid the naming clash, i.e. naming the attribute `k8s.phase` and | ||
the metric `k8s.phase.current`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking, I see there's some metrics where the name + attributes are the same, thus already going against the recommendation here. For ex:
metric_name: k8s.container.status.state |
- id: metric.k8s.container.status.reason |
Could those be somehow adapted to not clash like it is proposed here? Would be unfortunate that we publish the guidance while we itself already do not follow it, unless we can give reasons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree! We have several metrics of this type in K8s already:
semantic-conventions/model/k8s/metrics.yaml
Line 159 in 83b5544
- id: metric.k8s.node.condition.status semantic-conventions/model/k8s/metrics.yaml
Line 604 in 83b5544
- id: metric.k8s.namespace.phase metric_name: k8s.container.status.state - id: metric.k8s.container.status.reason
I wonder if we should not enforce the wording here but only the modeling instead. Maybe using .state
can just come as a generic suggestion rather than "stricter" guidance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can loosen the guidance here. But isn't it pretty problematic to have an attribute that is the exact same name as a metric? They are un-clashed in tooling by their Weaver identifiers, but couldn't this cause problems or at least confusion for people interpreting the conventions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't seen this confusing people so far (in PRs' reviews). But if people in general think it's an issue I'd fine changing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was confused when reviewing some of the new k8s metrics that include a required attribute that exactly matches the metric name. I prefer the guidance suggested in the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll propose it also makes translating to/from OTLP just a tiny bit harder.
When converting from the highly-structured OTLP format to other, non-otlp formats, being able to distinguish between a metric's name and the metric's datapoint's attributes is helpful. In OTLP we have the structure to distinguish between a metric's name and a datapoint attribute name, but in other formats we do have that structure.
Specifically, having the values match makes it hard to use the metric's name as a key and the datapoint attribute name as a key in a flat structure.
OTLP of course allows metric names and datapoint attributes to match, and we would never change that, but in our semantic conventions I like requiring metric name and datapoint attribute name to be different. I think it provides the most flexibility for to/from translations without putting a big restriction on ourselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have opened an issue to get clarification around this in general here: #2476
aliases: [status-metrics] | ||
---> | ||
|
||
# State Metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wdyt about moving it to https://github.com/open-telemetry/semantic-conventions/tree/main/docs/non-normative/how-to-write-conventions ?
I believe @jsuereth wanted to consolidate all the guidance there and maybe we'll move it altogether to a different place (some of the meta-guidance is or could be normative)
aliases: [status-metrics] | ||
---> | ||
|
||
# State Metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you please add a link to it from https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/how-to-define-semantic-conventions.md#defining-metrics ?
- [Design](#design) | ||
- [Naming](#naming) | ||
- [Instrument](#instrument) | ||
- [Why not Resource Attributes?](#why-not-resource-attributes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- [Why not Resource Attributes?](#why-not-resource-attributes) | |
- [Why not Entity Attributes?](#why-not-resource-attributes) |
?
- ref: example.state | ||
``` | ||
|
||
### Naming |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you summarize it to one sentence and add it (with link to this doc) to https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/naming.md#metrics ?
It is a common mistake to attach a "state" as a Resource Attribute. This is | ||
not recommended for two reasons: | ||
|
||
* Resource is intended to be immutable, thus adding an attribute like "state" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it still apply considering entities and descriptive attributes being mutable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I was also thinking the same when I reviewed but wasn't sure. 🤔
Fixes #1554
Changes
This PR adds non-normative guidance for designing status metrics. This is a common metric pattern that has confused contributors in the past, so this document exists to clear up some misconceptions and outline a unified design plan.
Merge requirement checklist
[chore]