
Large table batch backfilling #6

@rauanmayemir

Description


(Discussions is probably the proper place to ask this, but here goes.)

I'm evaluating pg_incremental as a tool for continuous migration of a very large table (over 1 billion rows, about 300 GB) to a new partitioned one. I can't take the table offline, and partitioning it online in one pass would likely take too long and be a hairy endeavor.

So here's what I'm considering:

  1. Create a pg_incremental pipeline that selects data from the old table and inserts it into the new table in time-range batches (say, 5 minutes each), starting from the very beginning; see the sketch after this list. A 5-minute range is about 6k rows; the range could be larger, but then each insert would be slower.
  2. Wait until the new table has most of the data and is as close to the old one as possible.
  3. Start switching the backend over to the new partitioned table.
  4. Keep the pipeline syncing until the old table stops receiving new data.
  5. Stop the pipeline.
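
For concreteness, here's a rough sketch of what I have in mind for step 1, based on the time-interval pipeline example in the docs. The table and column names (`old_events`, `new_events`, `created_at`) are made up for illustration:

```sql
-- Hypothetical schema: old_events is the existing 1B-row table,
-- new_events is the partitioned target; both have a created_at timestamptz.
SELECT incremental.create_time_interval_pipeline(
  'events-backfill',
  time_interval := '5 minutes',   -- each batch covers a 5-minute range (~6k rows)
  batched       := true,          -- pass the whole range to one command as $1/$2
  start_from    := '2015-01-01',  -- earliest created_at in the old table (made up)
  command       := $$
    INSERT INTO new_events
    SELECT * FROM old_events
    WHERE created_at >= $1 AND created_at < $2
  $$
);
```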

Since I'm using UUIDs for the PK, there's no sequence to worry about, and the lag between steps 3 and 4, during which the new table won't yet have some of the latest data, is tolerable.

However, I'm not sure whether pg_incremental can run batches back-to-back, as fast as possible, until the backfill is done, and whether it can do that in the background. From what I gathered in the docs, pg_incremental doesn't seem to be meant for running many, many batches before settling into a regular schedule (unless that initial catch-up happens in the same session).
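
If I'm reading the docs correctly, the catch-up over historical ranges happens in the session that creates the pipeline, and a pipeline can also be executed manually. Assuming `incremental.execute_pipeline` works the way I think it does (my reading of the docs, not something I've verified), driving the backfill by hand and then tearing it down at step 5 would look roughly like this:

```sql
-- Assumption: execute_pipeline is a procedure that processes all
-- pending ranges for the named pipeline in the current session.
CALL incremental.execute_pipeline('events-backfill');

-- Once the old table stops receiving writes (step 5), drop the pipeline.
SELECT incremental.drop_pipeline('events-backfill');
```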
