[WIP][SPARK-52449][CONNECT] Make datatypes for Expression.Literal.Map/Array optional #51473
What changes were proposed in this pull request?
- Protobuf Definition Changes: Modified expressions.proto to mark element_type in Array and key_type/value_type in Map as optional, with comments explaining that they only need to be set when collections are empty, since type inference from elements is now supported.
- Type Inference Implementation: Enhanced LiteralValueProtoConverter with:
  - toCatalystArray method to return both the array and its inferred type
  - toCatalystMap method to return both the map and its inferred type
  - getBasicType method to infer basic types from literal values
  - LiteralValueWithDataType case class to encapsulate values with inferred types
  - getConverter method to support type inference when the inferDataType flag is enabled
- Integration Updates: Updated LiteralExpressionProtoConverter to work with the new API that returns both value and type from the converter methods.
Why are the changes needed?
Currently, Spark Connect requires explicit type specification for array and map literals even when the types can be trivially inferred from the elements. This creates unnecessary complexity for clients and increases message size.
The changes enable:
Does this PR introduce any user-facing change?
No. This is a backwards-compatible change that makes type specification optional. Existing code that explicitly specifies types will continue to work unchanged, while new code can optionally omit types when they can be inferred.
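The precedence described above (an explicit type is honored when present, otherwise the type is inferred from the elements, and only an empty collection still requires an explicit type) can be sketched in standalone Scala. This is an illustration only, not the PR's code: DType, ArrayT, MapT, LiteralInference, and resolve are hypothetical stand-ins that do not exist in Spark, and inferring from the first element alone is a simplifying assumption.

```scala
// Hypothetical, simplified stand-ins for Spark's DataType hierarchy.
sealed trait DType
case object IntT extends DType
case object LongT extends DType
case object DoubleT extends DType
case object StringT extends DType
case object BoolT extends DType
final case class ArrayT(element: DType) extends DType
final case class MapT(key: DType, value: DType) extends DType

object LiteralInference {
  // Infer a type from a plain Scala value; nested collections recurse.
  def infer(value: Any): Either[String, DType] = value match {
    case _: Int       => Right(IntT)
    case _: Long      => Right(LongT)
    case _: Double    => Right(DoubleT)
    case _: String    => Right(StringT)
    case _: Boolean   => Right(BoolT)
    case s: Seq[_]    => arrayType(s)
    case m: Map[_, _] => mapType(m.toSeq)
    case other        => Left(s"unsupported literal: $other")
  }

  // A declared type wins; otherwise infer from a sample element (assumption:
  // the first element is representative); otherwise report which field is missing.
  private def resolve(
      declared: Option[DType],
      sample: Option[Any],
      field: String): Either[String, DType] =
    declared match {
      case Some(t) => Right(t)
      case None =>
        sample match {
          case Some(v) => infer(v)
          case None    => Left(s"empty literal needs an explicit $field")
        }
    }

  def arrayType(
      elems: Seq[Any],
      declared: Option[DType] = None): Either[String, DType] =
    resolve(declared, elems.headOption, "element_type").map(t => ArrayT(t))

  def mapType(
      entries: Seq[(Any, Any)],
      declaredKey: Option[DType] = None,
      declaredValue: Option[DType] = None): Either[String, DType] =
    for {
      k <- resolve(declaredKey, entries.headOption.map(_._1), "key_type")
      v <- resolve(declaredValue, entries.headOption.map(_._2), "value_type")
    } yield MapT(k, v)
}
```

In this sketch, LiteralInference.arrayType(Seq(1, 2, 3)) resolves without a declared type, while LiteralInference.arrayType(Seq.empty) returns a Left unless an element type is passed, which mirrors why the proto comments say the type fields only need to be set for empty collections.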
How was this patch tested?
build/sbt "connect/testOnly *LiteralExpressionProtoConverterSuite"
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor 1.2.4