
Large table batch backfilling #6

@rauanmayemir

Description


(Discussions is probably the proper place to ask this, but here goes.)

I'm evaluating pg_incremental as a tool for continuous migration of a very large table (over 1 billion rows, about 300 GB) to a new partitioned one. I can't take the table offline, and partitioning it online in one pass would likely take too long and be a hairy endeavor.

So here's what I'm considering:

  1. Create a pg_incremental pipeline that selects data from the old table and inserts it into the new table in time-range batches (say, 5 minutes each), starting from the very beginning; see the sketch after this list. A 5-minute range is about 6k rows; the range could be larger, but then each insert would be slower.
  2. Wait until the new table has most of the data and is as close to the old one as possible.
  3. Start switching the backend over to the new partitioned table.
  4. Keep the pipeline syncing until the old table stops receiving new data.
  5. Stop the pipeline.
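
For concreteness, here's a rough sketch of what I have in mind for step 1, based on the time-interval pipeline example in the docs. The table and column names (`old_events`, `new_events`, `created_at`) are made up for illustration:

```sql
-- Hypothetical schema: old_events is the existing 1B-row table,
-- new_events is the partitioned target; both have a created_at timestamptz.
SELECT incremental.create_time_interval_pipeline(
  'events-backfill',
  time_interval := '5 minutes',   -- each batch covers a 5-minute range (~6k rows)
  batched       := true,          -- pass the whole range to one command as $1/$2
  start_from    := '2015-01-01',  -- earliest created_at in the old table (made up)
  command       := $$
    INSERT INTO new_events
    SELECT * FROM old_events
    WHERE created_at >= $1 AND created_at < $2
  $$
);
```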

Since I'm using UUIDs for the PK, there's no sequence to worry about, and the lag between steps 3 and 4, during which the new table won't yet have some of the latest data, is tolerable.

However, I'm not sure whether pg_incremental can run batches back-to-back, as fast as possible, until the backfill is done, and whether it can do that in the background. From what I gathered in the docs, pg_incremental doesn't seem to be meant for running many, many batches before settling into a regular schedule (unless that initial catch-up happens in the same session).
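
If I'm reading the docs correctly, the catch-up over historical ranges happens in the session that creates the pipeline, and a pipeline can also be executed manually. Assuming `incremental.execute_pipeline` works the way I think it does (my reading of the docs, not something I've verified), driving the backfill by hand and then tearing it down at step 5 would look roughly like this:

```sql
-- Assumption: execute_pipeline is a procedure that processes all
-- pending ranges for the named pipeline in the current session.
CALL incremental.execute_pipeline('events-backfill');

-- Once the old table stops receiving writes (step 5), drop the pipeline.
SELECT incremental.drop_pipeline('events-backfill');
```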
