-
Notifications
You must be signed in to change notification settings - Fork 253
[chore] Non-normative guidance for status metrics #2472
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,95 @@ | ||||||||||||||
<!--- Hugo front matter used to generate the website version of this page: | ||||||||||||||
linkTitle: Status Metrics | ||||||||||||||
aliases: [status-metrics] | ||||||||||||||
---> | ||||||||||||||
|
||||||||||||||
# State Metrics | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. could you please add a link to it from https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/how-to-define-semantic-conventions.md#defining-metrics ? |
||||||||||||||
|
||||||||||||||
**Status**: [Development][DocumentStatus] | ||||||||||||||
|
||||||||||||||
<!-- toc --> | ||||||||||||||
|
||||||||||||||
- [Definition](#definition) | ||||||||||||||
- [Design](#design) | ||||||||||||||
- [Naming](#naming) | ||||||||||||||
- [Instrument](#instrument) | ||||||||||||||
- [Why not Resource Attributes?](#why-not-resource-attributes) | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
? |
||||||||||||||
|
||||||||||||||
<!-- tocstop --> | ||||||||||||||
|
||||||||||||||
## Definition | ||||||||||||||
|
||||||||||||||
For the purposes of this document, a "status metric" is a metric used to | ||||||||||||||
represent something being in a particular state at a given time from a closed | ||||||||||||||
set of distinct possible state values. | ||||||||||||||
|
||||||||||||||
## Design | ||||||||||||||
|
||||||||||||||
To represent a metric of this nature, the design will look like the following: | ||||||||||||||
|
||||||||||||||
```yaml | ||||||||||||||
id: metric.example.status | ||||||||||||||
type: metric | ||||||||||||||
metric_name: example.status | ||||||||||||||
stability: development | ||||||||||||||
brief: "The current state." | ||||||||||||||
note: | | ||||||||||||||
A timeseries is produced for every possible value of `example.state`. The | ||||||||||||||
value of this metric is 1 for a state if it is currently in said state, and | ||||||||||||||
is 0 for all other states. | ||||||||||||||
instrument: updowncounter | ||||||||||||||
unit: "1" | ||||||||||||||
attributes: | ||||||||||||||
- ref: example.state | ||||||||||||||
``` | ||||||||||||||
|
||||||||||||||
### Naming | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. could you summarize it to one sentence and add it (with link to this doc) to https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/naming.md#metrics ? |
||||||||||||||
|
||||||||||||||
This pattern represents a unique naming challenge. You could reasonably call the | ||||||||||||||
metric and attribute both "status" or "state" as the two words are relatively | ||||||||||||||
interchangeable in common parlance. | ||||||||||||||
|
||||||||||||||
This problem necessitates choosing words to represent two things: | ||||||||||||||
|
||||||||||||||
* The noun representing one of the set of possible values (the attribute name) | ||||||||||||||
* The adjective describing which of the values is currently active (the metric | ||||||||||||||
name) | ||||||||||||||
|
||||||||||||||
The general recommendation is to use the word "state" for the attribute and | ||||||||||||||
"status" for the metric. This is derived from common turns of phrase for each | ||||||||||||||
word, respectively: | ||||||||||||||
"What **state** is X in?" | ||||||||||||||
"What is the **current status** of X?" | ||||||||||||||
|
||||||||||||||
The english-language semantics of this are heavily debatable, thus it is | ||||||||||||||
recommended for the sake of consistency that this pattern be adopted to avoid | ||||||||||||||
inconsistent naming across similarly designed metrics. | ||||||||||||||
|
||||||||||||||
The exception to this would be if a particular system has only one word that | ||||||||||||||
would be used for both the noun and the adjective in question. An example would | ||||||||||||||
be in Kubernetes, where the word "phase" equally represents both and there | ||||||||||||||
is no acceptable alternative. In this case, using a metric name suffix is | ||||||||||||||
recommended to avoid the naming clash, i.e. naming the attribute `k8s.phase` and | ||||||||||||||
the metric `k8s.phase.current`. | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looking, I see there's some metrics where the name + attributes are the same, thus already going against the recommendation here. For ex:
Could those be somehow adapted to not clash like it is proposed here? Would be unfortunate that we publish the guidance while we itself already do not follow it, unless we can give reasons. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree! We have several metrics of this type in K8s already:
I wonder if we should not enforce the wording here but only the modeling instead. Maybe using There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I can loosen the guidance here. But isn't it pretty problematic to have an attribute that is the exact same name as a metric? They are un-clashed in tooling by their Weaver identifiers, but couldn't this cause problems or at least confusion for people interpreting the conventions? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I haven't seen this confusing people so far (in PRs' reviews). But if people in general think it's an issue I'd fine changing it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was confused when reviewing some of the new k8s metrics that include a required attribute that exactly matches the metric name. I prefer the guidance suggested in the PR. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'll propose it also makes translating to/from OTLP just a tiny bit harder. When converting from the highly-structured OTLP format to other, non-otlp formats, being able to distinguish between a metric's name and the metric's datapoint's attributes is helpful. In OTLP we have the structure to distinguish between a metric's name and a datapoint attribute name, but in other formats we do have that structure. Specifically, having the values match makes it hard to use the metric's name as a key and the datapoint attribute name as a key in a flat structure. OTLP of course allows metric names and datapoint attributes to match, and we would never change that, but in our semantic conventions I like requiring metric name and datapoint attribute name to be different. I think it provides the most flexibility for to/from translations without putting a big restriction on ourselves. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have opened an issue to get clarification around this in general here: #2476 |
||||||||||||||
|
||||||||||||||
### Instrument | ||||||||||||||
|
||||||||||||||
The metric is instrumented as an `UpDownCounter` rather than a `Gauge`. This is | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We should probably link to the recommendation on how to record consistent UpDownCounters: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/metrics.md#consistent-updowncounter-timeseries |
||||||||||||||
a deliberate choice, as it is a reasonable use case to count objects that are | ||||||||||||||
in a particular state. Since the metric value is either `0` or `1` for a given | ||||||||||||||
state attribute value, this means you can do a simple sum aggregation to count | ||||||||||||||
instances of particular states. | ||||||||||||||
|
||||||||||||||
### Why not Resource Attributes? | ||||||||||||||
|
||||||||||||||
It is a common mistake to attach a "state" as a Resource Attribute. This is | ||||||||||||||
not recommended for two reasons: | ||||||||||||||
|
||||||||||||||
* Resource is intended to be immutable, thus adding an attribute like "state" | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. does it still apply considering entities and descriptive attributes being mutable? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes I was also thinking the same when I reviewed but wasn't sure. 🤔 |
||||||||||||||
that will almost certainly change during the resource's lifetime breaks the | ||||||||||||||
immutability and will create what are essentially new timeseries each time the | ||||||||||||||
attribute would update. | ||||||||||||||
* Making the "state" an attribute on Resource would make it impossible to track | ||||||||||||||
changes in state over time. | ||||||||||||||
|
||||||||||||||
[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wdyt about moving it to https://github.com/open-telemetry/semantic-conventions/tree/main/docs/non-normative/how-to-write-conventions ?
I believe @jsuereth wanted to consolidate all the guidance there and maybe we'll move it altogether to a different place (some of the meta-guidance is or could be normative)