Skip to content

[WIP][SPARK-52449][CONNECT] Make datatypes for Expression.Literal.Map/Array optional #51473

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

heyihong
Copy link
Contributor

@heyihong heyihong commented Jul 14, 2025

What changes were proposed in this pull request?

  1. Protobuf Definition Changes: Modified expressions.proto to mark element_type in Array and key_type/value_type in Map as optional, with comments explaining they only need to be set when collections are empty since type inference from elements is now supported.

  2. Type Inference Implementation: Enhanced LiteralValueProtoConverter with:

    • Updated toCatalystArray method to return both the array and its inferred type
    • Updated toCatalystMap method to return both the map and its inferred type
    • Added getBasicType method to infer basic types from literal values
    • Added LiteralValueWithDataType case class to encapsulate values with inferred types
    • Modified getConverter method to support type inference when inferDataType flag is enabled
  3. Integration Updates: Updated LiteralExpressionProtoConverter to work with the new API that returns both value and type from the converter methods.

Why are the changes needed?

Currently, Spark Connect requires explicit type specification for array and map literals even when the types can be trivially inferred from the elements. This creates unnecessary complexity for clients and increases message size.

The changes enable:

  • Simplified Client Code: Clients no longer need to explicitly specify types when they can be inferred
  • Reduced Message Size: Eliminates redundant type information in protobuf messages

Does this PR introduce any user-facing change?

No. This is a backwards-compatible change that makes type specification optional. Existing code that explicitly specifies types will continue to work unchanged, while new code can optionally omit types when they can be inferred.

How was this patch tested?

build/sbt "connect/testOnly *LiteralExpressionProtoConverterSuite"

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor 1.2.4

@heyihong heyihong changed the title [SPARK-52449][CONNECT] Make datatypes for Expression.Literal.Map/Expression.Literal.Array optional [WIP][SPARK-52449][CONNECT] Make datatypes for Expression.Literal.Map/Array optional Jul 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant