Description
This issue acts as a point of reference for investigating and optimising memory consumption in our GitHub Actions log event handling processes.
Following the reconciliation of orphaned log lines in #278, we have observed the sending queue filling more quickly and excessive memory consumption, resulting in pods being OOM killed.
As an initial fix we have configured a larger sending queue in both the dev and ops environments; at the time of writing this is set to a queue size of 50k with 50 consumers.
This has resolved the queueing bottleneck and the `sending queue is full` errors in both environments; however, ops is still consuming a large amount of memory during high-traffic periods for GHA.
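For reference, a minimal sketch of how that tuning maps onto the OpenTelemetry Collector `exporterhelper` queue settings. This is illustrative only, not our actual configuration, and the constructor/field names follow the `exporterhelper` Go API, which has shifted between collector versions:

```go
package main

import (
	"fmt"

	"go.opentelemetry.io/collector/exporter/exporterhelper"
)

func main() {
	// Start from the exporterhelper defaults, then enlarge the queue to
	// absorb GHA traffic bursts: 50k queued items drained by 50 consumers,
	// matching the values currently set in dev and ops.
	qs := exporterhelper.NewDefaultQueueSettings()
	qs.QueueSize = 50_000
	qs.NumConsumers = 50

	fmt.Printf("sending_queue settings: %+v\n", qs)
}
```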
Actions taken:
- Memory has been increased initially for the cicd-o11y ops pods, with a further increase staged
- pprof enabled in ops to match dev (see the sketch after this list)
- Profiling data ingested into Pyroscope: alloc profile
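pprof in a Go service is exposed over HTTP; the collector does this through its pprof extension, but the underlying mechanism is equivalent to the sketch below. Port 1777 is the pprof extension's conventional default and an assumption about our setup:

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on the default mux
)

func main() {
	// Exposes profiles such as /debug/pprof/allocs and /debug/pprof/heap,
	// which can be pulled with `go tool pprof` or ingested into Pyroscope.
	log.Fatal(http.ListenAndServe("localhost:1777", nil))
}
```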
Investigation:
- The memory profiling data shows high allocation from repeated construction of log entry data structures, which we can potentially improve through pooling, batching and reducing redundant copies
- `plog.LogRecordSlice.AppendEmpty` and `pcommon.Value.SetStr` are flagged as the largest allocation sites.
- Our profiling review engine flags that each incoming log entry results in the creation of new map, string, and slice structures, i.e. allocations per log item (see the illustrative snippet after this list).
- See flame graph analysis for additional information
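To illustrate the pattern the profile points at (not a verbatim copy of our receiver code), building each record through the pdata API allocates a fresh log record, body string and attribute map per incoming GHA log line; the attribute key/value below are made up for the example:

```go
package main

import (
	"fmt"
	"time"

	"go.opentelemetry.io/collector/pdata/pcommon"
	"go.opentelemetry.io/collector/pdata/plog"
)

// buildLogs converts a batch of raw GHA log lines into a plog.Logs payload.
// Each AppendEmpty / SetStr / PutStr call allocates new backing structures,
// which is the per-log-item allocation the profile highlights.
func buildLogs(lines []string) plog.Logs {
	logs := plog.NewLogs()
	sl := logs.ResourceLogs().AppendEmpty().ScopeLogs().AppendEmpty()
	for _, line := range lines {
		lr := sl.LogRecords().AppendEmpty() // plog.LogRecordSlice.AppendEmpty
		lr.Body().SetStr(line)              // pcommon.Value.SetStr
		lr.Attributes().PutStr("source", "github-actions")
		lr.SetTimestamp(pcommon.NewTimestampFromTime(time.Now()))
	}
	return logs
}

func main() {
	logs := buildLogs([]string{"step started", "step finished"})
	fmt.Println("log records:", logs.LogRecordCount())
}
```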
As a first step we aim to introduce pooling in the log-writing path and then review memory consumption; a rough sketch of the direction is below.
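A minimal sketch of the pooling idea, assuming a `sync.Pool` of reusable scratch structures on the hot path; the type and function names here are hypothetical and do not exist in the repo:

```go
package main

import (
	"fmt"
	"sync"
)

// entryBuffer is a hypothetical reusable scratch structure used while building
// a log record, so we stop allocating a fresh one per incoming log line.
type entryBuffer struct {
	attrs map[string]string
	body  []byte
}

var entryPool = sync.Pool{
	New: func() any { return &entryBuffer{attrs: make(map[string]string, 8)} },
}

// writeLogLine sketches the pooled hot path: borrow a buffer, fill it, hand the
// data to the exporter, then reset and return the buffer for reuse.
func writeLogLine(line string) {
	buf := entryPool.Get().(*entryBuffer)
	defer func() {
		// Reset before returning so the next borrower starts clean.
		buf.body = buf.body[:0]
		for k := range buf.attrs {
			delete(buf.attrs, k)
		}
		entryPool.Put(buf)
	}()

	buf.body = append(buf.body, line...)
	buf.attrs["source"] = "github-actions"
	fmt.Printf("emitting %d bytes with %d attributes\n", len(buf.body), len(buf.attrs))
}

func main() {
	writeLogLine("step started")
	writeLogLine("step finished")
}
```

Re-running the alloc profile after this change should tell us whether the per-log allocations flagged above actually drop.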