Description
Is your feature request related to a problem? Please describe.
Related to performance issues such as:
#13720
Describe the solution you'd like
Self Protection of Java Agent
Background
The Java Agent's code is woven into the user's code, so in some scenarios it inevitably affects the user's business logic. To control and minimise that impact, we need to measure the resources the Java Agent occupies and limit the resources it consumes.
Self-monitoring of resource consumption
CPU usage of instrumentation code
- Collect the CPU time consumed by the instrumenter start/end method:
private final ThreadMXBean threadMXBean = ManagementFactory.getThreadMXBean();
long t1 = threadMXBean.getCurrentThreadCpuTime();
// instrumenter start/end
long t2 = threadMXBean.getCurrentThreadCpuTime();
// t2 - t1 is the CPU time that is consumed
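- Collect the CPU time consumed by the JVM:
// getProcessCpuTime() is declared on com.sun.management.OperatingSystemMXBean, not on the java.lang.management interface
private final com.sun.management.OperatingSystemMXBean osBean = (com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
long ot1 = osBean.getProcessCpuTime();
// measurement window
long ot2 = osBean.getProcessCpuTime();
// ot2 - ot1 is the CPU time consumed by the JVM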
- Get the CPU usage of the Java Agent as its share of the JVM's CPU time, with both deltas taken over the same window:
(t2 - t1) / (ot2 - ot1)
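A minimal sketch of how these measurements could be combined, assuming the agent's share is the accumulated agent CPU time divided by the process CPU time over the same window. The class `AgentCpuMeter` and its method names are hypothetical and not part of the Java Agent.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical helper: accumulates CPU time spent in agent code on any thread. */
final class AgentCpuMeter {
  private final ThreadMXBean threads = ManagementFactory.getThreadMXBean();
  private final LongAdder agentCpuNanos = new LongAdder();

  /** Runs a piece of agent work and adds the current thread's CPU time delta to the total. */
  void measure(Runnable agentWork) {
    boolean supported = threads.isCurrentThreadCpuTimeSupported();
    long t1 = supported ? threads.getCurrentThreadCpuTime() : -1;
    try {
      agentWork.run();
    } finally {
      long t2 = supported ? threads.getCurrentThreadCpuTime() : -1;
      // getCurrentThreadCpuTime() returns -1 when measurement is disabled.
      if (t1 >= 0 && t2 >= t1) {
        agentCpuNanos.add(t2 - t1);
      }
    }
  }

  long agentCpuNanos() {
    return agentCpuNanos.sum();
  }

  /** Process CPU time in nanoseconds, or -1 if the platform bean does not expose it. */
  static long processCpuNanos() {
    java.lang.management.OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
    if (os instanceof com.sun.management.OperatingSystemMXBean) {
      return ((com.sun.management.OperatingSystemMXBean) os).getProcessCpuTime();
    }
    return -1;
  }

  /** Agent share of process CPU time, clamped to [0, 1]; 0 when the inputs are unusable. */
  static double share(long agentNanos, long processNanos) {
    if (agentNanos < 0 || processNanos <= 0) {
      return 0.0;
    }
    return Math.min(1.0, (double) agentNanos / processNanos);
  }
}
```

A caller would sample `agentCpuNanos()` and `processCpuNanos()` at the start and end of a window and pass the two deltas to `share`.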
Number of instrumentation code executions
This is a simple count of how many times the instrumenter's start and end methods are executed.
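One possible way to keep these counts with low contention on hot paths is a pair of `LongAdder` counters. The class `ExecutionCounter` below is a hypothetical sketch, not existing agent code.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical counter for instrumenter start/end invocations. */
final class ExecutionCounter {
  private final LongAdder starts = new LongAdder();
  private final LongAdder ends = new LongAdder();

  // Called from the instrumenter's start method.
  void onStart() {
    starts.increment();
  }

  // Called from the instrumenter's end method.
  void onEnd() {
    ends.increment();
  }

  long startCount() {
    return starts.sum();
  }

  long endCount() {
    return ends.sum();
  }
}
```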
Memory
We approximate the memory occupied by the Java Agent as the memory occupied by the Context (which can be thought of as the root of the request) and the objects it references, plus the memory occupied by the various Caches.
Calculate the memory consumed by a Java Object
We use a breadth-first search (BFS) over the object graph rooted at the Context to calculate the memory it consumes.
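A minimal sketch of such a BFS, assuming reflection is used to follow reference fields and that a per-object shallow size is supplied by the caller (inside an agent this could be `Instrumentation.getObjectSize`, which is an assumption here, not something the proposal states). `ObjectGraphSizer` and `DemoContext` are hypothetical names.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.function.ToLongFunction;

/** Hypothetical stand-in for a request Context that references other objects. */
final class DemoContext {
  Object parent;
  Object[] attributes;
}

/** Sums shallow sizes over every object reachable from a root, visiting each object once. */
final class ObjectGraphSizer {
  private final ToLongFunction<Object> shallowSize;

  ObjectGraphSizer(ToLongFunction<Object> shallowSize) {
    this.shallowSize = shallowSize;
  }

  long deepSize(Object root) {
    if (root == null) {
      return 0;
    }
    // Identity-based set so that equal-but-distinct objects are both counted.
    Set<Object> visited = Collections.newSetFromMap(new IdentityHashMap<>());
    ArrayDeque<Object> queue = new ArrayDeque<>();
    visited.add(root);
    queue.add(root);
    long total = 0;
    while (!queue.isEmpty()) {
      Object current = queue.poll();
      total += shallowSize.applyAsLong(current);
      Class<?> type = current.getClass();
      if (type.isArray()) {
        if (!type.getComponentType().isPrimitive()) {
          for (Object element : (Object[]) current) {
            enqueue(element, visited, queue);
          }
        }
        continue;
      }
      for (Class<?> c = type; c != null; c = c.getSuperclass()) {
        for (Field field : c.getDeclaredFields()) {
          if (Modifier.isStatic(field.getModifiers()) || field.getType().isPrimitive()) {
            continue;
          }
          try {
            field.setAccessible(true);
            enqueue(field.get(current), visited, queue);
          } catch (RuntimeException | IllegalAccessException e) {
            // Field is not reachable (for example module encapsulation on JDK 9+): skip it.
          }
        }
      }
    }
    return total;
  }

  private static void enqueue(Object o, Set<Object> visited, ArrayDeque<Object> queue) {
    if (o != null && visited.add(o)) {
      queue.add(o);
    }
  }
}
```

The visited set keeps cycles and shared references from being counted twice. Fields that reflection cannot open are skipped, so the result is a lower bound.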
Update the memory size when the object is destroyed
Use a ReferenceQueue with a WeakReference to update the memory size when the object is destroyed. (We do not use the JDK Cleaner or PhantomReference because they have a bug in JDK 8 that can affect GC timing and may lead to a memory leak.)
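A rough sketch of this bookkeeping, assuming each tracked object is registered together with its estimated size and that the queue is drained periodically. `TrackedMemory` and `SizedRef` are hypothetical names.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical tracker: adds a size on registration, subtracts it once the object is collected. */
final class TrackedMemory {
  /** Weak reference that remembers the estimated size of its referent. */
  private static final class SizedRef extends WeakReference<Object> {
    final long bytes;

    SizedRef(Object referent, long bytes, ReferenceQueue<Object> queue) {
      super(referent, queue);
      this.bytes = bytes;
    }
  }

  private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
  // Holds the reference objects themselves so they stay reachable until enqueued.
  private final Set<SizedRef> live = ConcurrentHashMap.newKeySet();
  private final AtomicLong totalBytes = new AtomicLong();

  Reference<Object> track(Object target, long bytes) {
    SizedRef ref = new SizedRef(target, bytes, queue);
    live.add(ref);
    totalBytes.addAndGet(bytes);
    return ref;
  }

  /** Processes references enqueued by the GC and returns the current estimated total. */
  long drainAndGet() {
    Reference<?> ref;
    while ((ref = queue.poll()) != null) {
      SizedRef sized = (SizedRef) ref;
      if (live.remove(sized)) {
        totalBytes.addAndGet(-sized.bytes);
      }
    }
    return totalBytes.get();
  }
}
```

The total only drops after the GC has enqueued a reference and `drainAndGet` has run, so the estimate lags behind actual reclamation.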
Self Protection Approach
With the data collected above, we can implement self-protection by changing the behavior of instrumenter#shouldStart:
- Limit the maximum CPU usage percentage of the Java Agent
- Limit the maximum memory usage percentage of the Java Agent
- Limit the maximum QPS of the Java Agent (we can limit the rate of Span generation, either overall or at the entrance (PropagatingFromUpstreamInstrumenter))
When a self-protection condition is triggered, data may be lost, but the user's business will not be greatly affected.
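The three limits could be combined into a single check along the following lines. This is only a sketch under stated assumptions: `SelfProtectionGate`, its constructor parameters, the fixed one-second window, and the no-argument `shouldStart()` are all hypothetical and do not reflect the real instrumenter API.

```java
import java.util.function.DoubleSupplier;
import java.util.function.LongSupplier;

/** Hypothetical gate that a shouldStart implementation could consult before creating a Span. */
final class SelfProtectionGate {
  private static final long WINDOW_NANOS = 1_000_000_000L;

  private final double maxCpuShare;
  private final double maxMemoryShare;
  private final long maxStartsPerSecond;
  private final DoubleSupplier cpuShare;     // current agent CPU share, 0..1
  private final DoubleSupplier memoryShare;  // current agent memory share, 0..1
  private final LongSupplier nanoClock;      // injectable clock, e.g. System::nanoTime

  private long windowStartNanos;
  private long startsInWindow;

  SelfProtectionGate(
      double maxCpuShare,
      double maxMemoryShare,
      long maxStartsPerSecond,
      DoubleSupplier cpuShare,
      DoubleSupplier memoryShare,
      LongSupplier nanoClock) {
    this.maxCpuShare = maxCpuShare;
    this.maxMemoryShare = maxMemoryShare;
    this.maxStartsPerSecond = maxStartsPerSecond;
    this.cpuShare = cpuShare;
    this.memoryShare = memoryShare;
    this.nanoClock = nanoClock;
    this.windowStartNanos = nanoClock.getAsLong();
  }

  /** Returns false when any limit is exceeded, meaning the Span should be dropped. */
  synchronized boolean shouldStart() {
    if (cpuShare.getAsDouble() > maxCpuShare) {
      return false;
    }
    if (memoryShare.getAsDouble() > maxMemoryShare) {
      return false;
    }
    long now = nanoClock.getAsLong();
    if (now - windowStartNanos >= WINDOW_NANOS) {
      // Start a new fixed one-second window.
      windowStartNanos = now;
      startsInWindow = 0;
    }
    if (startsInWindow >= maxStartsPerSecond) {
      return false;
    }
    startsInWindow++;
    return true;
  }
}
```

A fixed window is the simplest rate limiter to reason about. A token bucket would smooth bursts at window boundaries, and a lock-free variant would suit hot paths better than `synchronized`.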
Describe alternatives you've considered
No response
Additional context
This issue was discussed with @trask on May 15, 2025 (UTC) - APAC