-
Notifications
You must be signed in to change notification settings - Fork 965
Remove thread.name
from metrics
#14061
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Remove thread.name
from metrics
#14061
Conversation
// Use an access-ordered LinkedHashMap so we get a bounded LRU cache | ||
private final Map<String, Consumer<RecordedEvent>> perThread = | ||
new LinkedHashMap<String, Consumer<RecordedEvent>>(16, 0.75F, true) { | ||
@Override | ||
protected boolean removeEldestEntry(Map.Entry<String, Consumer<RecordedEvent>> eldest) { | ||
// Bound this map to prevent memory leaks with fast-cycling thread frameworks | ||
return size() > 512; | ||
} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this map needed now that the thread name isn't being on the metrics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Certainly not for correctness reasons. I didn't see any explicit documentation around this map, but after reading through the code my impression was this was mostly a performance optimization to reduce allocations of Consumer<RecordedEvent>
instances.
Since this was previously using an unsynchronized hashmap, it does appear to me that the invocation of these consumers is all single-threaded (I haven't worked directly with JFR before, so maybe that's not true?); it smells to me like there's no contention or throughput reasons to have this cache other than to reduce allocations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that jives with your understanding, I could simply remove the cache. While all the little allocations of the PerThread*Handler
inner classes probably aren't a problem for the use-cases I'm coming from (high-scale ecommerce), I imagine there are definitely existing OTEL use-cases where it would be, especially on older and/or smaller JVMs.
With that in mind, I'd actually prefer to jump all the way to inlining the PerThread*Handler
-based logic directly into the AbstractThreadDispatchingHandler
-subclasses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With that in mind, I'd actually prefer to jump all the way to inlining the PerThread*Handler-based logic directly into the AbstractThreadDispatchingHandler-subclasses.
I've gone ahead and done this, should be easy to revert if needed.
|
||
// FIXME doesn't actually do any grouping, but should be safe for now | ||
// FIXME only handles substrings of contiguous digits -> a single `x`, but should be good | ||
// enough for now | ||
@Nullable | ||
public String groupedName(RecordedEvent ev) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this used anywhere now that AbstractThreadDispatchingHandler
was deleted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, addressed.
@jhayes2-chwy you need to sign the CLA in order to get the PR merged |
Indeed; I've been working with my company to determine if we have a Corporate CLA, so I'm still waiting on that. |
public Consumer<RecordedEvent> createPerThreadSummarizer(String threadName) { | ||
return new PerThreadLongLockHandler(histogram, threadName); | ||
public void accept(RecordedEvent recordedEvent) { | ||
if (recordedEvent.hasField(EVENT_THREAD)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this check needed? I think that previously this check was always true because ThreadGrouper
would have returned null
if the thread was missing. I'd remove this check unless anybody sees a reason why to keep it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like it may have been needed in order to add an attribute for the monitor's className, but I'd suspect that would have the same high-cardinality concerns as threadName, presuming it was used on Metrics.
Though I can personally see where that could provide a lot of value without cardinality issues, e.g. as an attribute on a Span Event or Profile. But unless there's active plans to build out that logic super soon, IMHO it'd be a premature optimization to keep the check in, at least for this specific reason.
🔧 The result from spotlessApply was committed to the PR branch. |
Is there a place I can see when the next release is and what'll be in it? This fix is something we're banking on having relatively soon (within roughly a calendar month or so). If the next full stable release is quite a ways away, I think we'd be okay even with an RC or pre-release to get us through the next few months. |
this repo releases monthly: https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/RELEASING.md#release-cadence there are snapshots published with every merge to |
Since these metrics are not enabled in the default configuration you could consider not enabling them as a workaround. |
As outlined in #13407 and #14047, the
thread.name
attribute can create very high cardinality in many cases, and also contributes to a memory leak in the collection mechanism of those metrics. This PR is aimed at fixing both issues.The affected metrics are:
jvm.memory.allocation
jvm.cpu.longlock
jvm.network.io
jvm.network.time