Skip to content

[chore] Non-normative guidance for status metrics #2472

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
95 changes: 95 additions & 0 deletions docs/non-normative/status-metrics.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,95 @@
<!--- Hugo front matter used to generate the website version of this page:
linkTitle: Status Metrics
aliases: [status-metrics]
--->

# State Metrics
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wdyt about moving it to https://github.com/open-telemetry/semantic-conventions/tree/main/docs/non-normative/how-to-write-conventions ?

I believe @jsuereth wanted to consolidate all the guidance there and maybe we'll move it altogether to a different place (some of the meta-guidance is or could be normative)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


**Status**: [Development][DocumentStatus]

<!-- toc -->

- [Definition](#definition)
- [Design](#design)
- [Naming](#naming)
- [Instrument](#instrument)
- [Why not Resource Attributes?](#why-not-resource-attributes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- [Why not Resource Attributes?](#why-not-resource-attributes)
- [Why not Entity Attributes?](#why-not-resource-attributes)

?


<!-- tocstop -->

## Definition

For the purposes of this document, a "status metric" is a metric used to
represent something being in a particular state at a given time from a closed
set of distinct possible state values.

## Design

To represent a metric of this nature, the design will look like the following:

```yaml
id: metric.example.status
type: metric
metric_name: example.status
stability: development
brief: "The current state."
note: |
A timeseries is produced for every possible value of `example.state`. The
value of this metric is 1 for a state if it is currently in said state, and
is 0 for all other states.
instrument: updowncounter
unit: "1"
attributes:
- ref: example.state
```

### Naming
Copy link
Contributor

@lmolkova lmolkova Jul 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you summarize it to one sentence and add it (with link to this doc) to https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/naming.md#metrics ?


This pattern represents a unique naming challenge. You could reasonably call the
metric and attribute both "status" or "state" as the two words are relatively
interchangeable in common parlance.

This problem necessitates choosing words to represent two things:

* The noun representing one of the set of possible values (the attribute name)
* The adjective describing which of the values is currently active (the metric
name)

The general recommendation is to use the word "state" for the attribute and
"status" for the metric. This is derived from common turns of phrase for each
word, respectively:
"What **state** is X in?"
"What is the **current status** of X?"

The english-language semantics of this are heavily debatable, thus it is
recommended for the sake of consistency that this pattern be adopted to avoid
inconsistent naming across similarly designed metrics.

The exception to this would be if a particular system has only one word that
would be used for both the noun and the adjective in question. An example would
be in Kubernetes, where the word "phase" equally represents both and there
is no acceptable alternative. In this case, using a metric name suffix is
recommended to avoid the naming clash, i.e. naming the attribute `k8s.phase` and
the metric `k8s.phase.current`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking, I see there's some metrics where the name + attributes are the same, thus already going against the recommendation here. For ex:

metric_name: k8s.container.status.state
,
- id: metric.k8s.container.status.reason

Could those be somehow adapted to not clash like it is proposed here? Would be unfortunate that we publish the guidance while we itself already do not follow it, unless we can give reasons.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree! We have several metrics of this type in K8s already:

I wonder if we should not enforce the wording here but only the modeling instead. Maybe using .state can just come as a generic suggestion rather than "stricter" guidance?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can loosen the guidance here. But isn't it pretty problematic to have an attribute that is the exact same name as a metric? They are un-clashed in tooling by their Weaver identifiers, but couldn't this cause problems or at least confusion for people interpreting the conventions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't seen this confusing people so far (in PRs' reviews). But if people in general think it's an issue I'd fine changing it.

Copy link
Member

@TylerHelmuth TylerHelmuth Jul 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was confused when reviewing some of the new k8s metrics that include a required attribute that exactly matches the metric name. I prefer the guidance suggested in the PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll propose it also makes translating to/from OTLP just a tiny bit harder.

When converting from the highly-structured OTLP format to other, non-otlp formats, being able to distinguish between a metric's name and the metric's datapoint's attributes is helpful. In OTLP we have the structure to distinguish between a metric's name and a datapoint attribute name, but in other formats we do have that structure.

Specifically, having the values match makes it hard to use the metric's name as a key and the datapoint attribute name as a key in a flat structure.

OTLP of course allows metric names and datapoint attributes to match, and we would never change that, but in our semantic conventions I like requiring metric name and datapoint attribute name to be different. I think it provides the most flexibility for to/from translations without putting a big restriction on ourselves.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have opened an issue to get clarification around this in general here: #2476


### Instrument

The metric is instrumented as an `UpDownCounter` rather than a `Gauge`. This is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably link to the recommendation on how to record consistent UpDownCounters: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/metrics.md#consistent-updowncounter-timeseries

a deliberate choice, as it is a reasonable use case to count objects that are
in a particular state. Since the metric value is either `0` or `1` for a given
state attribute value, this means you can do a simple sum aggregation to count
instances of particular states.

### Why not Resource Attributes?

It is a common mistake to attach a "state" as a Resource Attribute. This is
not recommended for two reasons:

* Resource is intended to be immutable, thus adding an attribute like "state"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it still apply considering entities and descriptive attributes being mutable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I was also thinking the same when I reviewed but wasn't sure. 🤔

that will almost certainly change during the resource's lifetime breaks the
immutability and will create what are essentially new timeseries each time the
attribute would update.
* Making the "state" an attribute on Resource would make it impossible to track
changes in state over time.

[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status
Loading