chore(dataobj): add ability to buffer pending sections to disk #18780

Open: wants to merge 6 commits into main from dataobj-spill-encoded-sections-to-disk

Conversation

@rfratto (Member) commented Aug 8, 2025

While a data object is being constructed, completed sections accumulate in memory until the overall object flushes. When building large objects, a flush can take a long time to happen, leading to sustained memory usage in the builder.

Since completed sections aren't read again until that flush, accumulating them in memory is wasteful.

This PR adds an abstraction that lets users optionally provide a path on disk for storing these pending sections. When specified, the path must already exist. The path is intended to be ephemeral storage: existing files in it are ignored across restarts of Loki.

Completed sections are placed on disk using their SHA-224 hash as the filename (with a .data or .metadata extension, depending on the region being stored). If a section fails to write to disk for some reason (such as the disk being full), in-memory buffers are used as a fallback.

When flushing a builder, the returned object reads from these section files on demand, allowing callers to upload objects to object storage without buffering the entire object in memory. The section files are deleted once the io.Closer returned from dataobj.Builder.Flush is closed.
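
As a hedged illustration of the spill-with-fallback behaviour described above, the write path might look roughly like this in Go. All names here are illustrative; only the SHA-224 naming scheme, the .data/.metadata extensions, and the in-memory fallback come from the PR description.

```go
package scratch

import (
	"crypto/sha256"
	"encoding/hex"
	"os"
	"path/filepath"
)

// region points at one stored region of a section: a file on disk, or an
// in-memory buffer if the disk write failed.
type region struct {
	path string // set when the region was written to disk
	buf  []byte // fallback copy kept when the write failed (e.g. disk full)
}

// writeRegion stores one region of a completed section in the scratch
// directory, named by the SHA-224 of its contents plus ext (".data" or
// ".metadata"). On a write error it falls back to an in-memory buffer so
// the builder can keep making progress.
func writeRegion(dir string, data []byte, ext string) region {
	sum := sha256.Sum224(data)
	name := hex.EncodeToString(sum[:]) + ext
	path := filepath.Join(dir, name)

	if err := os.WriteFile(path, data, 0o644); err != nil {
		buf := make([]byte, len(data))
		copy(buf, data)
		return region{buf: buf}
	}
	return region{path: path}
}
```

One nice property of content-addressed filenames is that a retried write of the same bytes lands on the same file, so stale files left over from a previous run can safely be ignored or overwritten.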

@rfratto requested a review from a team as a code owner on August 8, 2025 17:17
@rfratto force-pushed the dataobj-spill-encoded-sections-to-disk branch from bc07f2a to 600f868 on August 8, 2025 17:25
Comment on lines -32 to -49
// TODO(rfratto): the memory footprint of [Encoder] can very slowly grow
// as [bufpool] fills with buffers of increasing capacity:
// each encoding pass has a different number of elements, shuffling which
// elements of the hierarchy get which pooled buffers.
//
// This means that elements that require more bytes will grow the capacity of
// the buffer and put the buffer back into the pool. Even if further encoding
// passes don't need that many bytes, the buffer is kept alive with its larger
// footprint. Given enough time, all buffers in the pool will have a large
// capacity.
//
// The bufpool package provides bucketed pools as a solution to this, but
// using them requires knowing how many bytes are needed.
//
// Encoder can eventually be moved to the bucketed pools by calculating a
// rolling maximum of encoding size used per element across usages of an
// Encoder instance. This would then allow larger buffers to be eventually
// reclaimed regardless of how often encoding is done.
@rfratto (Member, Author) commented Aug 8, 2025:
no longer relevant with this change (even if using memoryScratchStore, since there's one buffer per section) 🎉

@rfratto force-pushed the dataobj-spill-encoded-sections-to-disk branch from 600f868 to 732c859 on August 8, 2025 17:31
rfratto added 5 commits August 8, 2025 14:31
sectionScratchStore is an interface for representing how completed
sections are buffered while the rest of the data object is still being
created.

sectionScratchStore has two implementations:

- memoryScratchStore, which accumulates completed sections in memory
- diskScratchStore, which accumulates completed sections on disk

diskScratchStore is intended to be given an ephemeral storage location.
Any existing files in the scratch filepath are ignored.

To handle situations where a disk runs out of space, diskScratchStore
falls back to in-memory storage if a write fails.
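
As a rough sketch, the interface described in this commit might look like the following. Only the three type names come from the commit message; the method set shown here is an assumption.

```go
package dataobj

import "io"

// sectionScratchStore buffers completed sections until the overall data
// object flushes. The method set below is illustrative, not from the PR.
type sectionScratchStore interface {
	// Put stores the data and metadata regions of a completed section,
	// returning a key for reading them back on demand.
	Put(data, metadata []byte) (key string, err error)
	// Read returns readers for a stored section's regions.
	Read(key string) (data, metadata io.ReadCloser, err error)
	// Remove deletes a stored section once the flushed object is closed.
	Remove(key string) error
}
```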
A `snapshot` represents a complete data object constructed by the
encoder, without fully loading that object into memory at once.
Snapshots refer to a sectionScratchStore to read section data and
metadata on demand.

Change the encoder to return a snapshot instead of writing data to a
provided buffer. This allows the object returned by Builder.Flush to
internally read from the sectionScratchStore.
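
A hedged sketch of how such a snapshot might stream an object, reusing the sectionScratchStore interface sketched above. Every name other than snapshot and sectionScratchStore is an assumption, and the real object layout (metadata regions, tailer) is elided.

```go
// snapshot holds keys into the scratch store rather than section bytes,
// so the complete object never has to sit in memory at once.
type snapshot struct {
	store sectionScratchStore
	keys  []string // completed sections, in object order
}

// WriteTo streams stored sections to w one at a time, letting callers
// upload to object storage without buffering the whole object.
func (s *snapshot) WriteTo(w io.Writer) (int64, error) {
	var written int64
	for _, key := range s.keys {
		data, metadata, err := s.store.Read(key)
		if err != nil {
			return written, err
		}
		metadata.Close() // handling of metadata regions is elided here

		n, err := io.Copy(w, data)
		written += n
		data.Close()
		if err != nil {
			return written, err
		}
	}
	return written, nil
}
```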
NewBuilder now accepts an argument for where to store scratch data. When
this argument is specified, a diskScratchStore is used for accumulating
completed sections while the overall object is still in progress.

Since creating a diskScratchStore can fail when given an invalid path,
NewBuilder now returns an error.
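
A sketch of what the constructor change could look like, building on the interface sketched earlier. The commit message confirms only the new argument and the error return; the signature, validation, and helper names shown here are assumptions.

```go
// NewBuilder uses in-memory scratch storage unless scratchPath is set.
// BuilderConfig, Builder, and the store constructors are stand-ins.
func NewBuilder(cfg BuilderConfig, scratchPath string) (*Builder, error) {
	var store sectionScratchStore = newMemoryScratchStore()
	if scratchPath != "" {
		// The scratch directory must already exist; an invalid path is
		// the failure case that makes NewBuilder return an error.
		info, err := os.Stat(scratchPath)
		if err != nil {
			return nil, fmt.Errorf("invalid scratch path: %w", err)
		}
		if !info.IsDir() {
			return nil, fmt.Errorf("scratch path %q is not a directory", scratchPath)
		}
		store = newDiskScratchStore(scratchPath)
	}
	return &Builder{cfg: cfg, scratch: store}, nil
}
```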
section_scratch_path can now be used to tell data object builders that
completed sections pending an object flush should be stored on disk to
reduce peak memory usage of the builder.
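
For illustration, the option might be set like this. The key name comes from the commit message, but its position in Loki's configuration tree is an assumption.

```yaml
# Hypothetical placement; only the key name is from this PR.
dataobj_builder:
  section_scratch_path: /var/loki/scratch
```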
@rfratto force-pushed the dataobj-spill-encoded-sections-to-disk branch from 732c859 to 287d6f2 on August 8, 2025 18:31
@rfratto (Member, Author) commented Aug 8, 2025

I ran this for about three hours in a dev environment and things look stable. It cuts memory usage of consumers by about 50%. While I expected the memory improvement to be more significant, it has the bonus effect of making memory usage far more stable over time.

I'm going to add some metrics for monitoring the scratch store, but I think this is good for review and merging.

Add metrics for tracking (a sketch of possible definitions follows this list):

* The current number of sections in the scratch store
* The total number of bytes stored in the scratch store
* The time taken to do operations in the scratch store
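
A hedged sketch of how these metrics might be defined with prometheus/client_golang, which Loki already uses. The metric and type names are assumptions; only the three tracked quantities come from the commit message.

```go
package scratch

import "github.com/prometheus/client_golang/prometheus"

// storeMetrics is a sketch; metric and type names are illustrative.
type storeMetrics struct {
	sections prometheus.Gauge         // current sections in the store
	bytes    prometheus.Gauge         // total bytes currently stored
	duration *prometheus.HistogramVec // time taken per store operation
}

func newStoreMetrics(reg prometheus.Registerer) *storeMetrics {
	m := &storeMetrics{
		sections: prometheus.NewGauge(prometheus.GaugeOpts{
			Name: "dataobj_scratch_store_sections",
			Help: "Current number of sections in the scratch store.",
		}),
		bytes: prometheus.NewGauge(prometheus.GaugeOpts{
			Name: "dataobj_scratch_store_bytes",
			Help: "Total number of bytes stored in the scratch store.",
		}),
		duration: prometheus.NewHistogramVec(prometheus.HistogramOpts{
			Name: "dataobj_scratch_store_operation_duration_seconds",
			Help: "Time taken by scratch store operations.",
		}, []string{"operation"}),
	}
	reg.MustRegister(m.sections, m.bytes, m.duration)
	return m
}
```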