Skip to content

OTEP: Extending attributes to support complex values #4485

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 63 commits into
base: main
Choose a base branch
from

Conversation

lmolkova
Copy link
Contributor

@lmolkova lmolkova commented Apr 22, 2025

Related to #4468, #4414

This OTEP suggests a path forward on how to support complex attributes. It proposes to:

  • allow complex attributes on all signals by default on the API and SDK level
  • discourage them on metrics, resources, instrumentation scope, and identifying attributes on entities. If complex attribute is provided, it's probably by mistake, but we let backend decide.
  • provide guidance in semconv to default to flat attributes whenever possible and develop conventions with an assumption that complex attributes are not indexed, not queryable and not efficient.
  • documents how limits can apply to complex attributes (leaf nodes are counted towards the attribute count limit and are truncated based on the value limit) and suggests other ways to protect from users misusing complex attributes.

Having experimental phase for complex attributes limits the impact of this change and gives backends a graceful period to update.

lmolkova and others added 4 commits April 20, 2025 12:33

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Complex attributes: first round of feedback and clean up
lmolkova and others added 2 commits April 22, 2025 07:58

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Clarify metrics and cleanup
trask and others added 5 commits April 22, 2025 13:15

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Simplify, move options to the future
@lmolkova lmolkova marked this pull request as ready for review April 22, 2025 22:45
@lmolkova lmolkova requested review from a team as code owners April 22, 2025 22:45
trask added 2 commits April 22, 2025 20:40
Copy link
Member

@pellared pellared left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, LGTM. There are still some things that would be good to have clarified.

lmolkova and others added 2 commits April 23, 2025 16:09

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Co-authored-by: Trask Stalnaker <[email protected]>
toc
trask added 2 commits June 18, 2025 12:39
Copy link
Member

@cijothomas cijothomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left couple of small comments around Exemplars wording. looks good otherwise.

trask added 3 commits June 19, 2025 08:51

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
instrumentation scope, or metric attribute where equality may be
extensively used to identify tracer/meter/logger or the time series.

Equality of `KeyValueList`s MUST be unaffected by the ordering of their `KeyValue` pairs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems odd. I'm assuming KeyValueList represents a heterogeneous array as defined in the glossary. Why wouldn't order be used in computation of equality? What is the difference between the data model of a heterogeneous array and a map at that point?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking back at the proto definitions, I likely am mistaken. The KeyValueList represents a map not an heterogeneous array.

Can we either add proto definitions to the glossary or only use definitions defined here in the OTEP to avoid this type of confusion?

Copy link
Contributor Author

@lmolkova lmolkova Jul 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to

When AnyValue contains one or more lists of key-value pairs:

  • the equality of AnyValue instances MUST NOT be affected by the ordering of
    the key-value pairs within the list.
  • the equality behavior is unspecified if duplicate keys are present.

Still working on the AnyValue definition added it

to updating comments like [this one](https://github.com/open-telemetry/opentelemetry-proto/blob/be5d58470429d0255ffdd49491f0815a3a63d6ef/opentelemetry/proto/trace/v1/trace.proto#L209-L213)
and adding changelog record.

## Trade-offs and mitigations
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What seems to be lacking is a considered alternative ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hdost could you elaborate? what's the alternative?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point here is that the solution seems to be leading the problem here. I am sure that the intentions are good, but I'd argue that adding this first class support for these complex attributes risks increasing the burden of comprehensability as well as introducing inefficiencies unnecessarily to the 99% case.
We're now making attributes not as universal creating caveats to when you should use certain types of attributes.
My alternative would be don't add this, and if desired add extensions to improve their support without altering the standard interface.
The prescient thing here is that there is no mentioned alternative meaning to any reader that there really hasn't been any alternatives considered.

Copy link
Contributor Author

@lmolkova lmolkova Jul 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hdost the motivation is mentioned in the Why section.

The data we want to express is sometimes complex and flattening is not an option. Complex attributes are already in the spec, proto and are allowed on the logs API. Users already put text blobs serialized in all possible (sometimes invalid) ways.

If you can propose a reasonable alternative that allows to express complex data, I'd be happy to mention it here and consider changing the direction.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So to your point Logs already carry and allow for this burden. Logs already have associated trace identifiers, so why then is it not sufficient to have users Log with these attributes? What is the advantage vs using logging?

lmolkova added 7 commits July 7, 2025 12:34
span links, and as descriptive entity attributes.

OTel API MAY support setting complex attributes on metrics, resources,
instrumentation scope, span events, and as identifying entity attributes.
Copy link
Contributor Author

@lmolkova lmolkova Jul 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the last week Spec call, we still have some concerns on this line

(along with "Exposing multiple types of attribute sets is NOT RECOMMENDED, such as having "ExtendedAttributes" in addition to "Attributes".).

@dyladan please correct me if I'm wrong, but the summary is that:

  • we don't have a real use-case for complex attributes on entities (less important)
  • it feels wrong to allow it and immediately tell users not to use it (more important)

I wanted to discuss a couple of points on the spec call tomorrow:

While we don't have a use-case in semantic conventions or instrumentations, it does not mean that end users don't have it or won't have it.

There are at least a few hypothetical use-cases where complex and structured data naturally appears out there in the wild and can be useful on metrics/entities

Complex data can appear on any signal and is discouraged in the same way on each of them. We're adding it to spans not because it's awesome, but because we need a way to record complex data that appears out there in the wild.

Complex data, especially if it's small, does not create much overhead over the same properties recorded as flat ones, but the bonus is that data on telemetry could match something well known defined outside of the telemetry.

lmolkova added 2 commits July 8, 2025 12:39
…ents to implementations that allow using complex attrs on metrics, resources, entities, instr scope
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

None yet