Optimise GHA log ingestion memory consumption #342

@phlope

This issue is a point of reference for investigating and optimising the memory consumption of our GitHub Actions (GHA) log event handling processes.

Following the reconciliation of orphaned log lines in #278, we have observed the sending queue filling more quickly and excessive memory consumption, resulting in pods being OOM-killed.

As an initial fix we have configured a larger sending queue in both the dev and ops environments; at the time of writing it is set to a queue size of 50k with 50 consumers.
This has resolved the queueing bottleneck and the "sending queue is full" error in both environments. However, ops is consuming a large amount of memory during periods of high GHA traffic.

Actions taken:

Investigation:

  • The memory profiling data shows high allocation from the repeated construction of log entry data structures, which we can potentially reduce through pooling, batching and removing redundant copies.
  • plog.LogRecordSlice.AppendEmpty and pcommon.Value.SetStr are flagged as the largest allocation sites.
  • Our profiling review engine flags that each incoming log entry creates new map, string and slice structures, resulting in allocations for every log item.
  • See the flame graph analysis for additional information.
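
The pooling idea from the list above could look roughly like the following sketch, which uses `sync.Pool` from the Go standard library. The `logEntry` type, its fields, and the `acquireEntry` / `releaseEntry` helpers are hypothetical; they stand in for whatever intermediate structure is built per log line and are not taken from our code or from the pdata API.

```go
package main

import (
	"fmt"
	"sync"
)

// logEntry is a hypothetical intermediate structure; the fields are illustrative only.
type logEntry struct {
	body  []byte
	attrs map[string]string
}

// entryPool hands out reusable entries so the map and byte slice
// are not reallocated for every log line.
var entryPool = sync.Pool{
	New: func() any {
		return &logEntry{
			body:  make([]byte, 0, 256),
			attrs: make(map[string]string, 8),
		}
	},
}

// acquireEntry takes an entry from the pool and fills it with one log line.
func acquireEntry(line string, attrs map[string]string) *logEntry {
	e := entryPool.Get().(*logEntry)
	e.body = append(e.body[:0], line...) // reuse the existing backing array
	for k, v := range attrs {
		e.attrs[k] = v
	}
	return e
}

// releaseEntry clears the entry and returns it to the pool.
// The caller must not touch the entry after this call.
func releaseEntry(e *logEntry) {
	e.body = e.body[:0]
	for k := range e.attrs {
		delete(e.attrs, k)
	}
	entryPool.Put(e)
}

func main() {
	lines := []string{"step 1 started", "step 1 finished"}
	for _, line := range lines {
		e := acquireEntry(line, map[string]string{"job": "build"})
		fmt.Printf("%s %v\n", e.body, e.attrs)
		releaseEntry(e)
	}
}
```

Pooling only helps for structures whose lifetime we control end to end. Anything handed on to the next pipeline stage cannot safely be returned to a pool, so this applies to scratch structures built before the data is written into the log records.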

As a first step we aim to introduce pooling in the log writing processes and then review memory consumption.
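
One way to review the effect of such a change before deploying it is to compare allocations per operation in isolation. The sketch below uses `testing.AllocsPerRun` from the Go standard library; `buildFresh`, `buildReused`, `sink`, `reused` and the attribute keys and values are hypothetical and only mimic the per-entry map construction described above.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps results reachable so the maps are heap-allocated rather than optimised away.
var sink map[string]string

// reused is a single long-lived map that buildReused clears and refills.
var reused = make(map[string]string, 4)

// buildFresh allocates a new attribute map for every log line.
func buildFresh() {
	m := make(map[string]string, 4)
	m["repo"] = "example/repo" // hypothetical attribute values
	m["job"] = "build"
	m["step"] = "test"
	sink = m
}

// buildReused clears and refills the long-lived map instead of allocating a new one.
func buildReused() {
	for k := range reused {
		delete(reused, k)
	}
	reused["repo"] = "example/repo"
	reused["job"] = "build"
	reused["step"] = "test"
	sink = reused
}

func main() {
	fmt.Printf("fresh:  %.0f allocs/op\n", testing.AllocsPerRun(1000, buildFresh))
	fmt.Printf("reused: %.0f allocs/op\n", testing.AllocsPerRun(1000, buildReused))
}
```

A micro-measurement like this only indicates whether a change reduces allocations on the hot path. The pod-level memory profile under real GHA traffic remains the measure that matters.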
