(Discussions is probably the proper place to ask this, but here goes.)
I'm evaluating pg_incremental as a tool for continuous migration of a very large table, > 1 billion rows (300 GB), into a new partitioned table. I can't take the table offline, and partitioning it online would likely be a long and hairy endeavor.
So here's what I'm considering:
1. Create a pg_incremental pipeline that selects data from the old table and inserts it into the new table in time-range batches, say 5 minutes each, starting from the very beginning (see the sketch after this list). A 5-minute range is about 6k rows; the range could be larger, but then each insert would be slower.
2. Wait until the new table has most of the data and is as close as possible to the old one.
3. Start switching the backend to use the new partitioned table.
4. Keep the pipeline syncing until the old table stops receiving new data.
5. Stop the pipeline.
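
For reference, here's roughly what I have in mind, based on my reading of the time-interval pipeline API in the docs. The table and column names (`events_old`, `events_new`, `created_at`) are placeholders for my actual schema, and I may be misreading some parameter names:

```sql
-- Hypothetical sketch: a batched time-interval pipeline that copies
-- 5-minute ranges from the old table into the new partitioned table.
-- Table/column names are placeholders; parameter names reflect my
-- reading of the pg_incremental docs and may not be exact.
SELECT incremental.create_time_interval_pipeline(
  'migrate-events',
  time_interval := '5 minutes',
  batched := true,
  start_time := '2020-01-01',  -- earliest created_at in the old table
  command := $$
    INSERT INTO events_new
    SELECT * FROM events_old
    WHERE created_at >= $1 AND created_at < $2
  $$
);
```

For step 5, once the old table stops receiving writes, I'd run something like `SELECT incremental.drop_pipeline('migrate-events');` to stop it.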
Since I'm using UUIDs for the PK, there's no sequence to worry about, and the lag between steps 3 and 4, where the new table is missing some of the latest data, is tolerable.
However, I'm not sure whether pg_incremental can run batches back to back in the background until the backfill is done. From what I gathered in the docs, it seems pg_incremental is not meant to run many, many batches before settling into a regular schedule (unless it's all done in the same session).
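
If scheduled execution turns out to be too slow for the initial backfill, my fallback idea is a plain server-side loop that advances a watermark in 5-minute steps, outside pg_incremental entirely, and only hands the tail over to the pipeline afterwards. A rough sketch; the tables, columns, and watermark bookkeeping are all made up for illustration:

```sql
-- Hypothetical fallback: backfill in 5-minute batches with a plain loop,
-- committing after each batch so progress survives interruptions.
-- events_old/events_new/created_at and migration_watermark are placeholders.
CREATE TABLE IF NOT EXISTS migration_watermark (last_ts timestamptz);

DO $$
DECLARE
  cur timestamptz;
  stop_at timestamptz := now() - interval '1 hour';  -- leave the tail for the pipeline
BEGIN
  SELECT last_ts INTO cur FROM migration_watermark;
  IF cur IS NULL THEN
    SELECT min(created_at) INTO cur FROM events_old;
    INSERT INTO migration_watermark VALUES (cur);
  END IF;

  WHILE cur < stop_at LOOP
    INSERT INTO events_new
    SELECT * FROM events_old
    WHERE created_at >= cur AND created_at < cur + interval '5 minutes';

    cur := cur + interval '5 minutes';
    UPDATE migration_watermark SET last_ts = cur;
    COMMIT;  -- transaction control works in a DO block (PG 11+) run outside an explicit transaction
  END LOOP;
END $$;
```

I'd rather not maintain that by hand if pg_incremental can do the backfill itself, hence the question.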