Skip to content

[SPARK-52461] [SQL] Collapse inner Cast from DecimalType to DecimalType #51169

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

mihailoale-db
Copy link
Contributor

What changes were proposed in this pull request?

When coercing an expression from DecimalType to DecimalType, remove the inner Cast if it is not user-specified one and if the outer DecimalType is wider than inner (precision and scale are greater) - or in other words, if it is redundant. Do that only in case SQLConf.COLLAPSE_INNER_CAST_TO_DECIMAL value is true.

Why are the changes needed?

To make single-pass and fixed-point implementations compatible.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests (regenerated golden files) + added ones.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Jun 12, 2025
b.withNewChildren(Seq(newE1, newE2))
}

/**
* In case `SQLConf.COLLAPSE_INNER_CAST_TO_DECIMAL` is true and inner cast is redundant (if the
* inner one is not a `USER_SPECIFIED_CAST` has lower precision and scale than the outer one),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has -> and has

@mihailoale-db
Copy link
Contributor Author

Actually we should revisit this approach, it's correct for most of the cases but there is this interesting one.
these queries:

create table t(col1 decimal(20,2));
select round(sum(col1)) as col1 from t having col1>=11;

return following plan

col1: decimal(29,0)
Filter (cast(col1#12977 as decimal(31,2)) >= cast(cast(cast(11 as decimal(2,0)) as decimal(20,2)) as decimal(31,2)))
+- Aggregate [round(sum(col1#12996), 0) AS col1#12977]
   +- SubqueryAlias t
      +- Relation t[col1#12996] parquet

and for these ones:

create table t(col1 decimal(20,2));
select round(sum(col1)) = 1;

this is the analyzed plan:

(round(sum(col1), 0) = 1): boolean
Aggregate [(round(sum(col1#12084), 0) = cast(cast(1 as decimal(1,0)) as decimal(29,0))) AS (round(sum(col1), 0) = 1)#12086]
+- SubqueryAlias t
   +- Relation t[col1#12084] parquet

In single-pass, for HAVING case, Cast is also decimal(29,0) as this is the actual type of round(sum(col1)) but as you can see in fixed-point its decimal(31,2) (which we get because of tempresolvedcolumn logic - it takes the precission of the expression used for resolution - sum(col1) which datatype is decimal(31,2) and thus both sides are casted to that wider type).

*/
private def tryCollapseInnerCast(expression: Expression, decimalType: DecimalType): Cast = {
expression match {
case innerCast @ Cast(_, innerDecimalType: DecimalType, _, _)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inner cast is not redundant if it can cause truncation or overflow. Strictly speaking, we can only remove the inner cast if the final query result remains the same.

I think we should also check Cast.canNullSafeCastToDecimal(child.dataType, innerDecimalType)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants