[Feature][transform-v2] Add a 'RegexParseTransform' plugin in Apache SeaTunnel to parse irregular logs into structured logs (#9308) #9335
base: dev
…SeaTunnel to parse irregular logs into structured logs (apache#9308)
Pull Request Overview
This PR introduces a new plugin, RegexParseTransform, to parse irregular logs into structured logs.
- Implements the RegexParseTransform plugin class that applies regex-based parsing on log fields.
- Adds configuration options for the regex, the field to be parsed, and mappings for capture groups.
- Registers the plugin through its factory and integrates with the SeaTunnel table transform APIs.
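The overview above describes regex-based parsing driven by a mapping from output fields to capture groups. A minimal, self-contained sketch of that idea follows. It is not the PR's implementation: the class name `RegexParseSketch`, the `parse` helper, the sample log format, and the use of `Integer` group indexes are all assumptions made for illustration, and it uses only `java.util.regex` rather than the SeaTunnel transform APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the RegexParseTransform class from the PR.
class RegexParseSketch {

    /**
     * Applies the pattern to one raw log line and returns
     * output field name -> captured text, in groupMap order.
     * groupMap maps each output field name to a capture group index (assumed shape).
     */
    static Map<String, String> parse(
            String rawLine, Pattern pattern, Map<String, Integer> groupMap) {
        Matcher matcher = pattern.matcher(rawLine);
        if (!matcher.find()) {
            throw new IllegalArgumentException("Line does not match pattern: " + rawLine);
        }
        Map<String, String> structured = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : groupMap.entrySet()) {
            // matcher.group(n) throws IndexOutOfBoundsException if n exceeds the group count.
            structured.put(entry.getKey(), matcher.group(entry.getValue()));
        }
        return structured;
    }

    public static void main(String[] args) {
        // Assumed log layout: "<timestamp> [<level>] <message>"
        Pattern pattern = Pattern.compile("^(\\S+) \\[(\\w+)\\] (.*)$");
        Map<String, Integer> groupMap = new LinkedHashMap<>();
        groupMap.put("timestamp", 1);
        groupMap.put("level", 2);
        groupMap.put("message", 3);

        System.out.println(parse("2024-05-01T12:00:00 [ERROR] disk full", pattern, groupMap));
        // prints {timestamp=2024-05-01T12:00:00, level=ERROR, message=disk full}
    }
}
```

Compiling the pattern once and reusing it for every row avoids recompiling the regex per record, which matters for a transform applied to a stream of log lines.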
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
RegexParseTransformFactory.java | Registers the new RegexParseTransform with options configuration. |
RegexParseTransformConfig.java | Declares configuration options for regex parsing including the field, regex pattern, and group mapping. |
RegexParseTransform.java | Implements the transformation logic using regex matching and capture group extraction. |
        }
    }
    if (fieldIndex == -1) {
        throw new RuntimeException("regex_parse_field not Contained");
Consider a more descriptive, grammatically correct error message (e.g., "regex_parse_field not found in the table schema") so users can tell what went wrong.
- throw new RuntimeException("regex_parse_field not Contained");
+ throw new RuntimeException("regex_parse_field not found in the table schema");
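One way to make this error more actionable is to include the missing field name and the fields that are available. The sketch below is a hypothetical illustration, not the PR's code: `FieldLookupSketch`, `indexOfField`, and the plain `String[]` of field names are assumptions standing in for however the transform reads its input schema.

```java
import java.util.Arrays;

// Hypothetical illustration only; the real transform reads field names from its table schema.
class FieldLookupSketch {

    /** Returns the position of target in fieldNames, or throws with a descriptive message. */
    static int indexOfField(String[] fieldNames, String target) {
        for (int i = 0; i < fieldNames.length; i++) {
            if (fieldNames[i].equals(target)) {
                return i;
            }
        }
        throw new IllegalArgumentException(
                "regex_parse_field '" + target + "' not found in the table schema; "
                        + "available fields: " + Arrays.toString(fieldNames));
    }

    public static void main(String[] args) {
        String[] fieldNames = {"id", "raw_log"};
        System.out.println(indexOfField(fieldNames, "raw_log")); // prints 1
        try {
            indexOfField(fieldNames, "message");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```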
    }
    Object[] extracted =
            groupMap.values().stream()
                    .map(index -> matcher.group(NumberUtils.toInt(index)))
NumberUtils.toInt(index) returns 0 for any string it cannot parse, so an invalid group index would silently select group 0 (the entire match) instead of failing. Validate the group index value so that a misconfiguration is reported rather than hidden.
- .map(index -> matcher.group(NumberUtils.toInt(index)))
+ .map(index -> {
+     if (!StringUtils.isNumeric(index)) {
+         throw new RuntimeException("Invalid group index: " + index + ". Group index must be a numeric string.");
+     }
+     return matcher.group(Integer.parseInt(index));
+ })
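As a variation on the suggestion above, the index can be validated once, against the pattern's group count, before any row is processed. The sketch below uses only the JDK and is an assumption-laden illustration: `GroupIndexSketch` and `parseGroupIndex` are hypothetical names, and the choice of `IllegalArgumentException` is not taken from the PR. Catching `NumberFormatException` also covers digit strings too large for an `int`, which a digits-only check would let through to `Integer.parseInt`.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; names and exception type are assumptions.
class GroupIndexSketch {

    /**
     * Parses a configured group index and checks it against the number of
     * capture groups in the pattern. Index 0 (the whole match) is allowed.
     */
    static int parseGroupIndex(String raw, int groupCount) {
        if (raw == null) {
            throw new IllegalArgumentException("Group index must not be null");
        }
        final int index;
        try {
            index = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                    "Invalid group index '" + raw + "': must be an integer", e);
        }
        if (index < 0 || index > groupCount) {
            throw new IllegalArgumentException(
                    "Group index " + index + " is out of range; the pattern has "
                            + groupCount + " capture group(s)");
        }
        return index;
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("(\\w+)=(\\d+)");
        // Matcher.groupCount() reports the number of capture groups in the pattern.
        int groupCount = pattern.matcher("").groupCount();

        System.out.println(parseGroupIndex("2", groupCount)); // prints 2
        try {
            parseGroupIndex("abc", groupCount);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Doing this check when the configuration is loaded, rather than inside the per-row stream, means a bad mapping fails at startup instead of on the first record.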
Please add test cases, documentation, and e2e tests.
Purpose of this pull request
Add a 'RegexParseTransform' plugin in Apache SeaTunnel to parse irregular logs into structured logs. Closes #9308.
Does this PR introduce any user-facing change?
How was this patch tested?
Check list
New License Guide