[WIP] Seatunnel deltalake #9223
base: dev
Signed-off-by: HaoXuAI <[email protected]>
Pull Request Overview
This pull request adds a new DeltaLake connector to SeaTunnel including source implementation, split enumerators, catalog integration, configuration, and data deserialization.
- Introduces new classes for DeltaLake source reading and catalog management
- Implements split enumeration, source reader, and configuration options for DeltaLake integration
- Adds support for Kerberos authentication in the catalog loader
Reviewed Changes
Copilot reviewed 31 out of 32 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
DeltaLakeEnumerationResult.java | New DTO class for encapsulating file scan task splits and enumeration positions |
DeltaLakeBatchSplitEnumerator.java | Implements batch split enumeration by loading splits from tables and assigning them to readers |
AbstractSplitEnumerator.java | Base implementation for split enumeration including pending tables and splits management |
DeltaLakeSource.java | Adds DeltaLake source implementation with configuration for bounded/unbounded mode |
StaticPathResolver.java & MetastoreResolver.java | Provide static path resolution for table metadata |
DeltalakeConnectorException.java & DeltalakeConnectorErrorCode.java | Define custom exception and error codes for DeltaLake connector errors |
Deserializer.java, DefaultDeserializer.java, DeltalakeTypeMapper.java | Introduce data deserialization and type mapping logic for DeltaLake formats |
SourceTableConfig.java, DeltaLakeSourceOptions.java, DeltaLakeSourceConfig.java, DeltaLakeCommonOptions.java, DeltaLakeCommonConfig.java | Configure options and encapsulate configuration details for DeltaLake integration |
DeltaLakeCatalog.java & DeltaLakeCatalogLoader.java | Implement catalog integration with support for Kerberos authentication and DeltaLake table loading |
Files not reviewed (1)
- seatunnel-connectors-v2/connector-deltalake/pom.xml: Language not supported
Comments suppressed due to low confidence (1)
seatunnel-connectors-v2/connector-deltalake/src/main/java/org/apache/seatunnel/connectors/seatunnel/deltalake/config/SourceTableConfig.java:69
- The use of table.lastIndexOf("\.") is likely incorrect since lastIndexOf does not treat the argument as a regex; it should probably be lastIndexOf('.') to locate the dot character correctly.
String namespace = table.substring(0, table.lastIndexOf("\."));
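The point of this comment can be checked in isolation. The sketch below assumes the flagged line passes an escaped-dot string literal (written `"\\."` in Java source) to `lastIndexOf`; the class name, method names, and sample table name are hypothetical and not taken from the PR. Because `String.lastIndexOf(String)` matches its argument literally, the escaped form looks for a backslash followed by a dot, finds nothing, and returns -1, which would make the subsequent `substring(0, -1)` throw.

```java
// Hypothetical demo class; names and the sample table are illustrative only.
class LastIndexOfDemo {

    // Mirrors the flagged pattern (assumed form): the argument is matched
    // literally as the two characters backslash and dot, not as a regex.
    static int indexWithEscapedDot(String table) {
        return table.lastIndexOf("\\.");
    }

    // Searches for the dot character itself.
    static int indexWithDotChar(String table) {
        return table.lastIndexOf('.');
    }

    // Returns everything before the last dot, failing loudly when there is none
    // instead of letting substring(0, -1) throw an index exception.
    static String namespaceOf(String table) {
        int dot = table.lastIndexOf('.');
        if (dot < 0) {
            throw new IllegalArgumentException(
                    "Expected <namespace>.<table>, got: " + table);
        }
        return table.substring(0, dot);
    }

    public static void main(String[] args) {
        String table = "db.schema.orders";
        System.out.println(indexWithEscapedDot(table)); // -1
        System.out.println(indexWithDotChar(table));    // 9
        System.out.println(namespaceOf(table));         // db.schema
    }
}
```

The explicit check for a missing dot is a design choice of this sketch, not something the review asked for; whether an unqualified table name should be rejected or given a default namespace is up to the connector.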
// TODO: Waiting for old version migration to complete
// before remove
if (split.getTablePath() == null) {
    new DeltaLakeFileScanTaskSplit(
Inside the lambda, a new DeltaLakeFileScanTaskSplit is constructed when split.getTablePath() is null, but the new instance is never returned, resulting in a null value. Consider returning the newly created split instead of always returning null.
- new DeltaLakeFileScanTaskSplit(
+ return new DeltaLakeFileScanTaskSplit(
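A minimal, self-contained illustration of the bug class described here, under stated assumptions: `DemoSplit`, `SplitMigrationDemo`, and `DEFAULT_PATH` are hypothetical stand-ins, and the shape of the lambda (a stream `map` whose null-path branch falls through to `return null`) is a guess based on the review text, not the connector's actual code. The first method builds the replacement object and discards it; the second returns it.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for a split; not the connector's DeltaLakeFileScanTaskSplit.
class DemoSplit {
    final String tablePath;
    final String file;

    DemoSplit(String tablePath, String file) {
        this.tablePath = tablePath;
        this.file = file;
    }
}

class SplitMigrationDemo {
    // Hypothetical fallback path used for splits restored without a table path.
    static final String DEFAULT_PATH = "default.table";

    // Buggy shape: the replacement is constructed but never returned,
    // so the lambda yields null for every split that needed migration.
    static List<DemoSplit> migrateBuggy(List<DemoSplit> splits) {
        return splits.stream()
                .map(split -> {
                    if (split.tablePath == null) {
                        new DemoSplit(DEFAULT_PATH, split.file); // result discarded
                        return null;
                    }
                    return split;
                })
                .collect(Collectors.toList());
    }

    // Fixed shape: the newly built split is returned from the lambda.
    static List<DemoSplit> migrateFixed(List<DemoSplit> splits) {
        return splits.stream()
                .map(split -> {
                    if (split.tablePath == null) {
                        return new DemoSplit(DEFAULT_PATH, split.file);
                    }
                    return split;
                })
                .collect(Collectors.toList());
    }
}
```

The compiler accepts the buggy version because an object-creation expression is a valid statement in Java, so a mistake of this kind usually surfaces only later, as a `NullPointerException` in whatever consumes the list.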
Signed-off-by: HaoXuAI <[email protected]>
fix: #9027
Purpose of this pull request
Does this PR introduce any user-facing change?
How was this patch tested?
Check list