chore(dataobj): add ability to buffer pending sections to disk #18780

Open: wants to merge 6 commits into main from dataobj-spill-encoded-sections-to-disk

Conversation

@rfratto (Member) commented Aug 8, 2025

While a data object is being constructed, completed sections accumulate in memory until the overall object flushes. When building large objects, a flush can take a long time to happen, leading to sustained memory usage in the builder.

Since completed sections aren't read again until that flush, accumulating them in memory is wasteful.

This PR adds an abstraction that lets users optionally provide a path on disk for storing these pending sections. When specified, the path must already exist. The path is intended to be ephemeral storage: existing files in it are ignored across restarts of Loki.

Completed sections are placed on disk using their SHA-224 hash as the filename (with a .data or .metadata extension, depending on the region being stored). If a section fails to write to disk for some reason (such as the disk being full), in-memory buffers are used as a fallback.

When flushing a builder, the returned object reads from these section files on demand, allowing callers to upload objects to object storage without buffering the entire object in memory. The section files are deleted once the io.Closer returned from dataobj.Builder.Flush is closed.
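
As a hedged illustration of the spill-with-fallback behaviour described above, the write path might look roughly like this in Go. All names here are illustrative; only the SHA-224 naming scheme, the .data/.metadata extensions, and the in-memory fallback come from the PR description.

```go
package scratch

import (
	"crypto/sha256"
	"encoding/hex"
	"os"
	"path/filepath"
)

// region points at one stored region of a section: a file on disk, or an
// in-memory buffer if the disk write failed.
type region struct {
	path string // set when the region was written to disk
	buf  []byte // fallback copy kept when the write failed (e.g. disk full)
}

// writeRegion stores one region of a completed section in the scratch
// directory, named by the SHA-224 of its contents plus ext (".data" or
// ".metadata"). On a write error it falls back to an in-memory buffer so
// the builder can keep making progress.
func writeRegion(dir string, data []byte, ext string) region {
	sum := sha256.Sum224(data)
	name := hex.EncodeToString(sum[:]) + ext
	path := filepath.Join(dir, name)

	if err := os.WriteFile(path, data, 0o644); err != nil {
		buf := make([]byte, len(data))
		copy(buf, data)
		return region{buf: buf}
	}
	return region{path: path}
}
```

One nice property of content-addressed filenames is that a retried write of the same bytes lands on the same file, so stale files left over from a previous run can safely be ignored or overwritten.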

@rfratto requested a review from a team as a code owner on August 8, 2025 17:17
@rfratto force-pushed the dataobj-spill-encoded-sections-to-disk branch from bc07f2a to 600f868 on August 8, 2025 17:25
Comment on lines -32 to -49
// TODO(rfratto): the memory footprint of [Encoder] can very slowly grow
// as [bufpool] fills with buffers of increasing capacity:
// each encoding pass has a different number of elements, shuffling which
// elements of the hierarchy get which pooled buffers.
//
// This means that elements that require more bytes will grow the capacity of
// the buffer and put the buffer back into the pool. Even if further encoding
// passes don't need that many bytes, the buffer is kept alive with its larger
// footprint. Given enough time, all buffers in the pool will have a large
// capacity.
//
// The bufpool package provides bucketed pools as a solution to this, but
// using them requires knowing how many bytes are needed.
//
// Encoder can eventually be moved to the bucketed pools by calculating a
// rolling maximum of encoding size used per element across usages of an
// Encoder instance. This would then allow larger buffers to be eventually
// reclaimed regardless of how often encoding is done.
@rfratto (Member, Author) commented Aug 8, 2025:
no longer relevant with this change (even if using memoryScratchStore, since there's one buffer per section) 🎉

@rfratto force-pushed the dataobj-spill-encoded-sections-to-disk branch from 600f868 to 732c859 on August 8, 2025 17:31
rfratto added 5 commits August 8, 2025 14:31
sectionScratchStore is an interface for representing how completed
sections are buffered while the rest of the data object is still being
created.

sectionScratchStore has two implementations:

- memoryScratchStore, which accumulates completed sections in memory
- diskScratchStore, which accumulates completed sections on disk

diskScratchStore is intended to be given an ephemeral storage location.
Any existing files in the scratch filepath are ignored.

To handle situations where a disk runs out of space, diskScratchStore
falls back to in-memory storage if a write fails.
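
As a rough sketch, the interface described in this commit might look like the following. Only the three type names come from the commit message; the method set shown here is an assumption.

```go
package dataobj

import "io"

// sectionScratchStore buffers completed sections until the overall data
// object flushes. The method set below is illustrative, not from the PR.
type sectionScratchStore interface {
	// Put stores the data and metadata regions of a completed section,
	// returning a key for reading them back on demand.
	Put(data, metadata []byte) (key string, err error)
	// Read returns readers for a stored section's regions.
	Read(key string) (data, metadata io.ReadCloser, err error)
	// Remove deletes a stored section once the flushed object is closed.
	Remove(key string) error
}
```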
A `snapshot` represents a complete data object constructed by the
encoder, without fully loading that object into memory at once.
Snapshots refer to a sectionScratchStore to read section data and
metadata on demand.

Change the encoder to return a snapshot instead of writing data to a
provided buffer. This allows the object returned by Builder.Flush to
internally read from the sectionScratchStore.
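
A hedged sketch of how such a snapshot might stream an object, reusing the sectionScratchStore interface sketched above. Every name other than snapshot and sectionScratchStore is an assumption, and the real object layout (metadata regions, tailer) is elided.

```go
// snapshot holds keys into the scratch store rather than section bytes,
// so the complete object never has to sit in memory at once.
type snapshot struct {
	store sectionScratchStore
	keys  []string // completed sections, in object order
}

// WriteTo streams stored sections to w one at a time, letting callers
// upload to object storage without buffering the whole object.
func (s *snapshot) WriteTo(w io.Writer) (int64, error) {
	var written int64
	for _, key := range s.keys {
		data, metadata, err := s.store.Read(key)
		if err != nil {
			return written, err
		}
		metadata.Close() // handling of metadata regions is elided here

		n, err := io.Copy(w, data)
		written += n
		data.Close()
		if err != nil {
			return written, err
		}
	}
	return written, nil
}
```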
NewBuilder now accepts an argument for where to store scratch data. When
this argument is specified, a diskScratchStore is used for accumulating
completed sections while the overall object is still in progress.

Since creating a diskScratchStore can fail when given an invalid path,
NewBuilder now returns an error.
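
A sketch of what the constructor change could look like, building on the interface sketched earlier. The commit message confirms only the new argument and the error return; the signature, validation, and helper names shown here are assumptions.

```go
// NewBuilder uses in-memory scratch storage unless scratchPath is set.
// BuilderConfig, Builder, and the store constructors are stand-ins.
func NewBuilder(cfg BuilderConfig, scratchPath string) (*Builder, error) {
	var store sectionScratchStore = newMemoryScratchStore()
	if scratchPath != "" {
		// The scratch directory must already exist; an invalid path is
		// the failure case that makes NewBuilder return an error.
		info, err := os.Stat(scratchPath)
		if err != nil {
			return nil, fmt.Errorf("invalid scratch path: %w", err)
		}
		if !info.IsDir() {
			return nil, fmt.Errorf("scratch path %q is not a directory", scratchPath)
		}
		store = newDiskScratchStore(scratchPath)
	}
	return &Builder{cfg: cfg, scratch: store}, nil
}
```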
section_scratch_path can now be used to tell data object builders that
completed sections pending an object flush should be stored on disk to
reduce peak memory usage of the builder.
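
For illustration, the option might be set like this. The key name comes from the commit message, but its position in Loki's configuration tree is an assumption.

```yaml
# Hypothetical placement; only the key name is from this PR.
dataobj_builder:
  section_scratch_path: /var/loki/scratch
```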
@rfratto force-pushed the dataobj-spill-encoded-sections-to-disk branch from 732c859 to 287d6f2 on August 8, 2025 18:31
@rfratto (Member, Author) commented Aug 8, 2025

I ran this for about three hours in a dev environment and things look stable. It cuts memory usage of consumers by about 50%. While I expected the memory improvement to be more significant, it has the bonus effect of making memory usage far more stable over time.

I'm going to add some metrics for monitoring the scratch store, but I think this is good for review and merging.

Add metrics for tracking (a sketch of possible definitions follows this list):

* The current number of sections in the scratch store
* The total number of bytes stored in the scratch store
* The time taken to do operations in the scratch store
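
A hedged sketch of how these metrics might be defined with prometheus/client_golang, which Loki already uses. The metric and type names are assumptions; only the three tracked quantities come from the commit message.

```go
package scratch

import "github.com/prometheus/client_golang/prometheus"

// storeMetrics is a sketch; metric and type names are illustrative.
type storeMetrics struct {
	sections prometheus.Gauge         // current sections in the store
	bytes    prometheus.Gauge         // total bytes currently stored
	duration *prometheus.HistogramVec // time taken per store operation
}

func newStoreMetrics(reg prometheus.Registerer) *storeMetrics {
	m := &storeMetrics{
		sections: prometheus.NewGauge(prometheus.GaugeOpts{
			Name: "dataobj_scratch_store_sections",
			Help: "Current number of sections in the scratch store.",
		}),
		bytes: prometheus.NewGauge(prometheus.GaugeOpts{
			Name: "dataobj_scratch_store_bytes",
			Help: "Total number of bytes stored in the scratch store.",
		}),
		duration: prometheus.NewHistogramVec(prometheus.HistogramOpts{
			Name: "dataobj_scratch_store_operation_duration_seconds",
			Help: "Time taken by scratch store operations.",
		}, []string{"operation"}),
	}
	reg.MustRegister(m.sections, m.bytes, m.duration)
	return m
}
```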