[SPARK-52492][SQL] Make InMemoryRelation.convertToColumnarIfPossible customizable #51189

Closed

Conversation

@zhztheplayer zhztheplayer commented Jun 16, 2025

https://issues.apache.org/jira/browse/SPARK-52492

What changes were proposed in this pull request?

This PR moves InMemoryRelation.convertToColumnarIfPossible into CachedBatchSerializer as a public API.
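
In rough terms, the change turns the conversion into an overridable hook on CachedBatchSerializer. A minimal sketch of the intended shape is below; the method name, doc comment, and placement are paraphrased from the PR's intent rather than copied from the patch, and all pre-existing members are elided:

```scala
import org.apache.spark.sql.execution.SparkPlan

// Elided view of org.apache.spark.sql.columnar.CachedBatchSerializer, showing
// only the new hook (the exact name and signature here are illustrative).
trait CachedBatchSerializer extends Serializable {

  /**
   * Given the physical plan that is about to be cached, return a columnar plan
   * this serializer can consume, or the input plan unchanged if no viable
   * conversion can be done. The built-in serializer would keep vanilla Spark's
   * ColumnarToRowExec-unwrapping behavior; plugins supply their own logic.
   */
  def convertToColumnarPlanIfPossible(plan: SparkPlan): SparkPlan

  // ... pre-existing members (supportsColumnarInput, convertColumnarBatchToCachedBatch,
  // buildFilter, vectorTypes, ...) omitted ...
}
```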

Why are the changes needed?

TL;DR: so that plugins like Gluten can customize the conversion logic for their own cache serializers.

Currently, InMemoryRelation.convertToColumnarIfPossible is tightly coupled with vanilla Spark's columnar processing logic. It unwraps the input columnar plan by removing the topmost ColumnarToRowExec, then assumes that the resulting columnar RDD can be recognized by the user-customized cache serializer.
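
For reference, the unwrapping currently behaves roughly like the sketch below (condensed from InMemoryRelation and wrapped in a standalone object here; treat it as illustrative rather than a verbatim copy): it peels off the topmost ColumnarToRowExec, possibly through a WholeStageCodegenExec/InputAdapter wrapper, and otherwise returns the plan unchanged.

```scala
import org.apache.spark.sql.execution.{ColumnarToRowExec, InputAdapter, SparkPlan, WholeStageCodegenExec}

object ConvertToColumnarSketch {
  // Strip the topmost ColumnarToRowExec and hand whatever columnar plan sits
  // underneath to the cache serializer, regardless of which batch type it emits.
  def convertToColumnarIfPossible(plan: SparkPlan): SparkPlan = plan match {
    case gen: WholeStageCodegenExec =>
      gen.child match {
        case c2r: ColumnarToRowExec =>
          c2r.child match {
            case ia: InputAdapter => ia.child
            case _ => plan
          }
        case _ => plan
      }
    case c2r: ColumnarToRowExec => c2r.child // whole-stage codegen disabled
    case _ => plan
  }
}
```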

But this assumption does not always hold. In the Apache Gluten project, for example, we still need to distinguish among plans that all have supportsColumnar=true but produce different columnar batch types. So even after the topmost ColumnarToRowExec is removed, we don't know whether the unwrapped columnar RDD can be accepted by Gluten's cache serializer (which may handle only one particular type of columnar batch).

So in Gluten we added a rule to work around the logic in InMemoryRelation.convertToColumnarIfPossible: https://github.com/apache/incubator-gluten/blob/c6461b4e0c7d3022a31fa832aeab588b1a3200e6/gluten-substrait/src/main/scala/org/apache/gluten/extension/columnar/MiscColumnarRules.scala#L192-L217. This was the best approach we could come up with, but it is still not elegant; in particular, the rule is caller-sensitive, since it needs to determine whether it is being invoked as part of the cache planning process.
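
With the conversion exposed on CachedBatchSerializer, a plugin can make this decision itself. A minimal sketch of what a Gluten-like serializer could do, assuming the hook shape sketched above and a hypothetical batch-type check (neither is taken from the actual Gluten code):

```scala
import org.apache.spark.sql.columnar.CachedBatchSerializer
import org.apache.spark.sql.execution.{ColumnarToRowExec, SparkPlan}

// Illustrative plugin-side override; kept abstract so the unrelated
// serialization members of CachedBatchSerializer can be left out here.
abstract class GlutenLikeCachedBatchSerializer extends CachedBatchSerializer {

  // Hypothetical helper: does this plan emit the one columnar batch type
  // that this serializer knows how to cache?
  protected def producesSupportedBatch(plan: SparkPlan): Boolean

  // Only unwrap ColumnarToRowExec when the child emits a supported batch
  // type; otherwise return the plan unchanged so caching falls back to the
  // row-based path instead of feeding the serializer an unknown batch type.
  override def convertToColumnarPlanIfPossible(plan: SparkPlan): SparkPlan = plan match {
    case c2r: ColumnarToRowExec if producesSupportedBatch(c2r.child) => c2r.child
    case _ => plan
  }
}
```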

Does this PR introduce any user-facing change?

No.

How was this patch tested?

With the added unit tests.

@github-actions github-actions bot added the SQL label Jun 16, 2025
* @return The output plan. Could either be a columnar plan if the input plan is convertible, or
* the input plan unchanged if no viable conversion can be done.
*/
@Since("4.0.1")
Member

According to the Apache Spark backporting policy, this should be 4.1.0 because only bug-fixes are allowed for branch-4.0.

Member Author

Thanks. Addressed, and also updated the Target Version/s field in the JIRA ticket.

@zhztheplayer zhztheplayer force-pushed the wip-cache-customize-unwrap branch from 32a0fc7 to 63971eb on June 17, 2025 04:14
@zhztheplayer zhztheplayer force-pushed the wip-cache-customize-unwrap branch from 63971eb to a353773 on June 17, 2025 08:36
@yaooqinn yaooqinn left a comment

LGTM

zhztheplayer commented Jun 18, 2025

@yaooqinn Thanks for amending the annotation!

@yaooqinn yaooqinn closed this in 44d9fce Jun 19, 2025
@yaooqinn

Merged to master, thank you @zhztheplayer @dongjoon-hyun
