Open
Labels
enhancement (New feature or request)
Description
Search before asking
- I searched in the issues and found nothing similar.
Motivation
When CDC database synchronization runs against a database containing a large number of tables (hundreds of thousands), the "catalog.listTables(database)" call in "SyncDatabaseActionBase" can take a long time to complete, which blocks the synchronization job at startup. This significantly lengthens the time it takes to start a CDC synchronization task.
Solution
Current Behavior
The current implementation calls catalog.listTables(database) during initialization and maintains a createdTables set to track table creation status. This approach:
- Blocks the entire sync process while listing all tables
- Consumes unnecessary memory to maintain the createdTables set
- Performs redundant operations when tables are created lazily
Expected Behavior
The sync process should:
- Avoid blocking on the listTables operation during initialization
- Create tables lazily when needed without maintaining a global createdTables set
- Improve overall performance for databases with large numbers of tables
Solution
Optimize the table creation logic by:
- Removing the upfront listTables call in SyncDatabaseActionBase
- Eliminating the createdTables set from RichCdcMultiplexRecordEventParser
- Implementing lazy table creation in CdcDynamicTableParsingProcessFunction with existence checks
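
The lazy-creation idea above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Paimon code: SimpleCatalog, InMemoryCatalog, LazyTableCreator, and ensureTable are hypothetical names made up for the example, and the real Catalog interface and CdcDynamicTableParsingProcessFunction have different signatures. The point is that no table list is fetched up front and no createdTables set is kept; the catalog is asked about a table only when a record for it arrives.

```java
import java.util.HashSet;
import java.util.Set;

class LazyTableCreationDemo {

    // Hypothetical stand-in for a catalog; NOT Paimon's actual Catalog interface.
    interface SimpleCatalog {
        boolean tableExists(String database, String table);

        void createTable(String database, String table);
    }

    // In-memory catalog that counts calls, used only to make the sketch runnable.
    static class InMemoryCatalog implements SimpleCatalog {
        private final Set<String> tables = new HashSet<>();
        int existsCalls = 0;
        int createCalls = 0;

        @Override
        public boolean tableExists(String database, String table) {
            existsCalls++;
            return tables.contains(database + "." + table);
        }

        @Override
        public void createTable(String database, String table) {
            createCalls++;
            tables.add(database + "." + table);
        }
    }

    // Creates a table the first time a record for it is seen.
    // There is no upfront listing and no locally maintained set of created tables.
    static class LazyTableCreator {
        private final SimpleCatalog catalog;
        private final String database;

        LazyTableCreator(SimpleCatalog catalog, String database) {
            this.catalog = catalog;
            this.database = database;
        }

        // Returns true if this call created the table, false if it already existed.
        boolean ensureTable(String table) {
            if (catalog.tableExists(database, table)) {
                return false;
            }
            catalog.createTable(database, table);
            return true;
        }
    }

    public static void main(String[] args) {
        InMemoryCatalog catalog = new InMemoryCatalog();
        LazyTableCreator creator = new LazyTableCreator(catalog, "db");

        // First record for "orders": the table is missing, so it is created.
        System.out.println(creator.ensureTable("orders"));
        // Later record for "orders": the existence check finds it, nothing is created.
        System.out.println(creator.ensureTable("orders"));
        System.out.println("createTable calls: " + catalog.createCalls);
    }
}
```

In this sketch every call performs one existence check against the catalog, so startup cost no longer depends on the total number of tables in the database. Whether the per-record check needs further optimization is left open here.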
Anything else?
No response
Are you willing to submit a PR?
- I'm willing to submit a PR!