
[WIP] Seatunnel deltalake #9223


Open
wants to merge 3 commits into base: dev

@HaoXuAI HaoXuAI (Contributor) commented Apr 24, 2025

fix: #9027

Purpose of this pull request

Does this PR introduce any user-facing change?

How was this patch tested?

Check list

@Copilot Copilot AI (Contributor) left a comment

Pull Request Overview

This pull request adds a new DeltaLake connector to SeaTunnel, including a source implementation, split enumerators, catalog integration, configuration, and data deserialization.

  • Introduces new classes for DeltaLake source reading and catalog management
  • Implements split enumeration, source reader, and configuration options for DeltaLake integration
  • Adds support for Kerberos authentication in the catalog loader

Reviewed Changes

Copilot reviewed 31 out of 32 changed files in this pull request and generated 1 comment.

File | Description
DeltaLakeEnumerationResult.java | New DTO class for encapsulating file scan task splits and enumeration positions
DeltaLakeBatchSplitEnumerator.java | Implements batch split enumeration by loading splits from tables and assigning them to readers
AbstractSplitEnumerator.java | Base implementation for split enumeration including pending tables and splits management
DeltaLakeSource.java | Adds DeltaLake source implementation with configuration for bounded/unbounded mode
StaticPathResolver.java & MetastoreResolver.java | Provide static path resolution for table metadata
DeltalakeConnectorException.java & DeltalakeConnectorErrorCode.java | Define custom exception and error codes for DeltaLake connector errors
Deserializer.java, DefaultDeserializer.java, DeltalakeTypeMapper.java | Introduce data deserialization and type mapping logic for DeltaLake formats
SourceTableConfig.java, DeltaLakeSourceOptions.java, DeltaLakeSourceConfig.java, DeltaLakeCommonOptions.java, DeltaLakeCommonConfig.java | Configure options and encapsulate configuration details for DeltaLake integration
DeltaLakeCatalog.java & DeltaLakeCatalogLoader.java | Implement catalog integration with support for Kerberos authentication and DeltaLake table loading
Files not reviewed (1)
  • seatunnel-connectors-v2/connector-deltalake/pom.xml: Language not supported
Comments suppressed due to low confidence (1)

seatunnel-connectors-v2/connector-deltalake/src/main/java/org/apache/seatunnel/connectors/seatunnel/deltalake/config/SourceTableConfig.java:69

  • The use of table.lastIndexOf("\.") is likely incorrect since lastIndexOf does not treat the argument as a regex; it should probably be lastIndexOf('.') to locate the dot character correctly.
String namespace = table.substring(0, table.lastIndexOf("\."));
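
To make the suppressed comment above concrete, here is a minimal sketch of how the two `lastIndexOf` overloads behave. The class and method names (`NamespaceSplit`, `namespaceOf`) and the sample table name are hypothetical and not taken from the PR. The sketch also assumes the source literal was `"\\."` (a backslash followed by a dot), since `"\."` on its own is not a valid Java string escape.

```java
public class NamespaceSplit {

    // Hypothetical helper: returns everything before the last '.' in a
    // qualified table name, or the empty string when there is no dot.
    public static String namespaceOf(String table) {
        int lastDot = table.lastIndexOf('.'); // plain character search, no regex involved
        return lastDot < 0 ? "" : table.substring(0, lastDot);
    }

    public static void main(String[] args) {
        String table = "db.schema.orders";

        // String overload: looks for the literal two-character text "\.",
        // which does not occur in the name, so this prints -1.
        System.out.println(table.lastIndexOf("\\."));

        // Character overload: finds the final dot, so this prints 9.
        System.out.println(table.lastIndexOf('.'));

        // Prints "db.schema".
        System.out.println(namespaceOf(table));
    }
}
```

Because the string overload returns -1 here, passing that result straight to `substring(0, -1)` would throw a `StringIndexOutOfBoundsException`, which is why the sketch guards against a negative index.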

// TODO: Waiting for old version migration to complete
// before remove
if (split.getTablePath() == null) {
new DeltaLakeFileScanTaskSplit(

Copilot AI Apr 24, 2025


Inside the lambda, a new DeltaLakeFileScanTaskSplit is constructed when split.getTablePath() is null, but the new instance is never returned, resulting in a null value. Consider returning the newly created split instead of always returning null.

Suggested change
- new DeltaLakeFileScanTaskSplit(
+ return new DeltaLakeFileScanTaskSplit(
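
The pattern behind this suggestion can be sketched in isolation. In a block-bodied Java lambda, `new Foo(...);` is a legal statement, so a missing `return` compiles but silently discards the object. The `Split` class, its fields, and `withDefaultPath` below are hypothetical stand-ins, since the real `DeltaLakeFileScanTaskSplit` constructor is not shown in the excerpt. Passing unaffected splits through unchanged is also an assumption about the intended behavior.

```java
import java.util.List;
import java.util.stream.Collectors;

public class SplitMigration {

    // Hypothetical stand-in for DeltaLakeFileScanTaskSplit.
    public static final class Split {
        private final String tablePath;
        private final String file;

        public Split(String tablePath, String file) {
            this.tablePath = tablePath;
            this.file = file;
        }

        public String getTablePath() { return tablePath; }

        public String getFile() { return file; }
    }

    // Rebuilds splits that lack a table path, using the given default.
    public static List<Split> withDefaultPath(List<Split> splits, String defaultPath) {
        return splits.stream()
                .map(split -> {
                    if (split.getTablePath() == null) {
                        // Without `return`, this object would be created and dropped.
                        return new Split(defaultPath, split.getFile());
                    }
                    return split; // assumed: splits with a path pass through as-is
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Split> fixed = withDefaultPath(
                List.of(new Split(null, "part-0.parquet"), new Split("/tables/t1", "part-1.parquet")),
                "/tables/default");
        for (Split s : fixed) {
            System.out.println(s.getTablePath() + " " + s.getFile());
        }
    }
}
```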


Signed-off-by: HaoXuAI <[email protected]>
Development

Successfully merging this pull request may close these issues.

[Feature][connector-v2] Support Deltalake