[Feature][transform-v2] Add a 'RegexParseTransform' plugin in Apache SeaTunnel to parse irregular logs into structured logs (#9308) #9335

Open: 1 commit to merge into base branch dev

@ShiDSheng ShiDSheng commented May 19, 2025

Purpose of this pull request

Add a 'RegexParseTransform' plugin in Apache SeaTunnel to parse irregular logs into structured logs. Closes #9308.

Does this PR introduce any user-facing change?

How was this patch tested?

Check list

…SeaTunnel to parse irregular logs into structured logs (apache#9308)
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR introduces a new plugin, RegexParseTransform, to parse irregular logs into structured logs.

  • Implements the RegexParseTransform plugin class that applies regex-based parsing on log fields.
  • Adds configuration options for the regex, the field to be parsed, and mappings for capture groups.
  • Registers the plugin through its factory and integrates with the SeaTunnel table transform APIs.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| RegexParseTransformFactory.java | Registers the new RegexParseTransform with options configuration. |
| RegexParseTransformConfig.java | Declares configuration options for regex parsing, including the field, regex pattern, and group mapping. |
| RegexParseTransform.java | Implements the transformation logic using regex matching and capture group extraction. |
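
To make the overview concrete, here is a minimal, stdlib-only sketch of the technique it describes: match a regex against one log field and map capture groups to named output fields. The class name `RegexParseSketch`, the `parse` method, the sample pattern, and the field names are all hypothetical. This is not the PR's code, and whether the plugin requires a full match (`matches()`) or a partial one (`find()`) is an assumption here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not the PR's RegexParseTransform class. */
final class RegexParseSketch {

    /**
     * Matches the whole line against the pattern and returns a map of
     * output field name to captured text, or empty if the line does not match.
     * groupMap maps an output field name to a capture group index (assumed layout).
     */
    static Optional<Map<String, String>> parse(
            Pattern pattern, Map<String, Integer> groupMap, String line) {
        Matcher matcher = pattern.matcher(line);
        if (!matcher.matches()) {
            return Optional.empty();
        }
        // LinkedHashMap keeps the output fields in configuration order.
        Map<String, String> structured = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : groupMap.entrySet()) {
            structured.put(entry.getKey(), matcher.group(entry.getValue()));
        }
        return Optional.of(structured);
    }

    public static void main(String[] args) {
        // Sample pattern: timestamp, bracketed level, free-text message.
        Pattern pattern = Pattern.compile("(\\S+) \\[(\\w+)\\] (.*)");
        Map<String, Integer> groupMap = new LinkedHashMap<>();
        groupMap.put("ts", 1);
        groupMap.put("level", 2);
        groupMap.put("message", 3);
        System.out.println(
                parse(pattern, groupMap, "2025-05-19T10:00:00 [ERROR] disk full"));
    }
}
```

Returning an empty `Optional` for non-matching lines is one possible design; a real transform might instead drop the row, pass it through unchanged, or fail, depending on what the maintainers prefer.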

}
}
if (fieldIndex == -1) {
throw new RuntimeException("regex_parse_field not Contained");

Copilot AI May 20, 2025

Consider making the error message more descriptive and grammatical (e.g., "regex_parse_field not found in the table schema") so users can tell what went wrong.

Suggested change
- throw new RuntimeException("regex_parse_field not Contained");
+ throw new RuntimeException("regex_parse_field not found in the table schema");
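
As a possible extension of this suggestion, the message could also name the missing field and list the fields that are available. The sketch below is stdlib-only; `FieldLookupSketch` and `indexOfField` are hypothetical names, and using `IllegalArgumentException` instead of `RuntimeException` is an assumption rather than the project's convention.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; not part of the PR. */
final class FieldLookupSketch {

    /** Returns the position of target in fieldNames, or fails with a descriptive message. */
    static int indexOfField(List<String> fieldNames, String target) {
        int index = fieldNames.indexOf(target);
        if (index == -1) {
            throw new IllegalArgumentException(
                    "regex_parse_field '" + target
                            + "' not found in the table schema; available fields: "
                            + fieldNames);
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("id", "raw_log");
        System.out.println(indexOfField(schema, "raw_log"));
    }
}
```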

}
Object[] extracted =
groupMap.values().stream()
.map(index -> matcher.group(NumberUtils.toInt(index)))

Copilot AI May 20, 2025

NumberUtils.toInt(index) returns 0 when the string cannot be converted, so an invalid group index silently becomes matcher.group(0), which is the entire match. Validate the group index so that configuration errors surface instead.

Suggested change
- .map(index -> matcher.group(NumberUtils.toInt(index)))
+ .map(index -> {
+     if (!StringUtils.isNumeric(index)) {
+         throw new RuntimeException("Invalid group index: " + index + ". Group index must be a numeric string.");
+     }
+     return matcher.group(Integer.parseInt(index));
+ })
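
A stdlib-only variant of the same idea is sketched below. It validates each configured index once, up front, and also checks it against the number of capture groups the pattern declares. `GroupIndexValidation` and `parseGroupIndex` are hypothetical names, and validating at configuration time rather than per row is an assumption about where such a check would fit.

```java
import java.util.regex.Pattern;

/** Hypothetical helper; not part of the PR. */
final class GroupIndexValidation {

    /**
     * Parses a configured group index and checks that the pattern actually
     * declares that group. Index 0 (the whole match) is accepted.
     */
    static int parseGroupIndex(String raw, Pattern pattern) {
        if (raw == null) {
            throw new IllegalArgumentException("Group index must not be null");
        }
        int index;
        try {
            index = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                    "Invalid group index '" + raw + "': must be an integer", e);
        }
        // groupCount() reports the capturing groups declared by the pattern,
        // independent of the input being matched.
        int groupCount = pattern.matcher("").groupCount();
        if (index < 0 || index > groupCount) {
            throw new IllegalArgumentException(
                    "Group index " + index + " is out of range; the pattern declares "
                            + groupCount + " capture group(s)");
        }
        return index;
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("(\\S+) \\[(\\w+)\\] (.*)");
        System.out.println(parseGroupIndex("2", pattern));
    }
}
```

Catching `NumberFormatException` also covers inputs that `StringUtils.isNumeric` would accept but `Integer.parseInt` would reject, such as digit strings too long to fit in an `int`.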

Member

@Hisoka-X Hisoka-X left a comment

Please add test cases, docs, and e2e tests.

@nielifeng
Contributor

@ShiDSheng

Development

Successfully merging this pull request may close these issues.

[Feature][transform-v2] Add a 'RegexParseTransform' plugin in Apache SeaTunnel to parse irregular logs into structured logs
3 participants