Commit 3185b9e

mihailoale-db authored and cloud-fan committed
[SPARK-52488][SQL] Strip alias before wrapping outer references under HAVING
### What changes were proposed in this pull request?

For the following query:

```
SELECT col1 AS alias
FROM values(named_struct('a', 1))
GROUP BY col1
HAVING (
  SELECT col1.a = 1
);
```

this is the resulting analyzed plan:

```
Filter cast(scalar-subquery#8847 [alias#8846] as boolean)
:  +- Project [(outer(alias#8846).a = 1) AS (outer(col1).a AS a = 1)#8867]
:     +- OneRowRelation
+- Aggregate [col1#8865], [col1#8865 AS alias#8846]
   +- LocalRelation [col1#8865]
```

As can be seen, the generated Alias name for `col1.a = 1` contains `outer(col1).a AS a`. The inner `AS a` is redundant and should be removed. It doesn't affect the output schema, so changing the Alias name here is safe. After the change, the plan looks like:

```
Filter cast(scalar-subquery#x [alias#x] as boolean)
:  +- Project [(outer(alias#x).a = 1) AS (outer(col1).a = 1)#x]
:     +- OneRowRelation
+- Aggregate [col1#x], [col1#x AS alias#x]
   +- LocalRelation [col1#x]
```

### Why are the changes needed?

To keep compatibility between the fixed-point and single-pass implementations.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Tests added in this PR.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #51186 from mihailoale-db/stripaliasbeforewrapouterreference.

Authored-by: mihailoale-db <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
1 parent 3b75442 commit 3185b9e

4 files changed, +146 -1 lines changed

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala

Lines changed: 5 additions & 1 deletion
@@ -210,7 +210,11 @@ trait ColumnResolutionHelper extends Logging with DataTypeErrorsBase {
           case u @ UnresolvedHaving(_, agg: Aggregate) =>
             agg.resolveChildren(nameParts, conf.resolver)
               .orElse(u.resolveChildren(nameParts, conf.resolver))
-              .map(wrapOuterReference)
+              .map {
+                case alias: Alias =>
+                  wrapOuterReference(alias.child)
+                case other => wrapOuterReference(other)
+              }
           case other =>
             other.resolveChildren(nameParts, conf.resolver).map(wrapOuterReference)
         }
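The effect of the new pattern match can be illustrated with a small, self-contained sketch. It uses toy expression classes (`Attr`, `GetField`, `Alias`, `OuterRef`), not the real Catalyst types, and `wrapOuter` is a hypothetical stand-in for `wrapOuterReference`. It also assumes that resolving `col1.a` yields the field access wrapped in an alias named after the field, which is what the `AS a` in the original plan suggests.

```scala
// Toy model only: these classes are illustrative stand-ins, NOT Spark's Catalyst API.
object AliasStripSketch {
  sealed trait Expr { def sql: String }

  final case class Attr(name: String) extends Expr {
    def sql: String = name
  }
  final case class GetField(child: Expr, field: String) extends Expr {
    def sql: String = s"${child.sql}.$field"
  }
  final case class Alias(child: Expr, name: String) extends Expr {
    def sql: String = s"${child.sql} AS $name"
  }
  final case class OuterRef(attr: Attr) extends Expr {
    def sql: String = s"outer(${attr.sql})"
  }

  // Hypothetical stand-in for wrapOuterReference: marks every attribute as an outer reference.
  def wrapOuter(e: Expr): Expr = e match {
    case a: Attr        => OuterRef(a)
    case GetField(c, f) => GetField(wrapOuter(c), f)
    case Alias(c, n)    => Alias(wrapOuter(c), n)
    case o: OuterRef    => o
  }

  // Behavior before the change: wrap whatever name resolution returned.
  def wrapBefore(resolved: Expr): Expr = wrapOuter(resolved)

  // Behavior after the change: drop a top-level alias, then wrap.
  def wrapAfter(resolved: Expr): Expr = resolved match {
    case Alias(child, _) => wrapOuter(child)
    case other           => wrapOuter(other)
  }
}
```

With `Alias(GetField(Attr("col1"), "a"), "a")` as the resolved expression, `wrapBefore(...).sql` keeps the inner `AS a` while `wrapAfter(...).sql` does not, mirroring the difference between the two plans in the commit message.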

sql/core/src/test/resources/sql-tests/analyzer-results/having.sql.out

Lines changed: 60 additions & 0 deletions
@@ -426,3 +426,63 @@ Project [((sum(v) + 1) + min(v))#xL]
 +- Project [k#x, v#x]
    +- SubqueryAlias hav
       +- LocalRelation [k#x, v#x]
+
+
+-- !query
+SELECT col1 AS alias
+FROM values(1)
+GROUP BY col1
+HAVING (
+  SELECT col1 = 1
+)
+-- !query analysis
+Filter cast(scalar-subquery#x [alias#x] as boolean)
+:  +- Project [(outer(alias#x) = 1) AS (outer(col1) = 1)#x]
+:     +- OneRowRelation
++- Aggregate [col1#x], [col1#x AS alias#x]
+   +- LocalRelation [col1#x]
+
+
+-- !query
+SELECT col1 AS alias
+FROM values(named_struct('a', 1))
+GROUP BY col1
+HAVING (
+  SELECT col1.a = 1
+)
+-- !query analysis
+Filter cast(scalar-subquery#x [alias#x] as boolean)
+:  +- Project [(outer(alias#x).a = 1) AS (outer(col1).a = 1)#x]
+:     +- OneRowRelation
++- Aggregate [col1#x], [col1#x AS alias#x]
+   +- LocalRelation [col1#x]
+
+
+-- !query
+SELECT col1 AS alias
+FROM values(array(1))
+GROUP BY col1
+HAVING (
+  SELECT col1[0] = 1
+)
+-- !query analysis
+Filter cast(scalar-subquery#x [alias#x] as boolean)
+:  +- Project [(outer(alias#x)[0] = 1) AS (outer(col1)[0] = 1)#x]
+:     +- OneRowRelation
++- Aggregate [col1#x], [col1#x AS alias#x]
+   +- LocalRelation [col1#x]
+
+
+-- !query
+SELECT col1 AS alias
+FROM values(map('a', 1))
+GROUP BY col1
+HAVING (
+  SELECT col1[0] = 1
+)
+-- !query analysis
+Filter cast(scalar-subquery#x [alias#x] as boolean)
+:  +- Project [(outer(alias#x)[cast(0 as string)] = 1) AS (outer(col1)[0] = 1)#x]
+:     +- OneRowRelation
++- Aggregate [col1#x], [col1#x AS alias#x]
+   +- LocalRelation [col1#x]

sql/core/src/test/resources/sql-tests/inputs/having.sql

Lines changed: 29 additions & 0 deletions
@@ -62,3 +62,32 @@ SELECT 1 + SUM(v) FROM hav HAVING SUM(v) + 1;
 SELECT SUM(v) + 1 FROM hav HAVING 1 + SUM(v);
 SELECT MAX(v) + SUM(v) FROM hav HAVING SUM(v) + MAX(v);
 SELECT SUM(v) + 1 + MIN(v) FROM hav HAVING 1 + 1 + 1 + MIN(v) + 1 + SUM(v);
+
+-- HAVING with outer reference to alias in outer project list
+SELECT col1 AS alias
+FROM values(1)
+GROUP BY col1
+HAVING (
+  SELECT col1 = 1
+);
+
+SELECT col1 AS alias
+FROM values(named_struct('a', 1))
+GROUP BY col1
+HAVING (
+  SELECT col1.a = 1
+);
+
+SELECT col1 AS alias
+FROM values(array(1))
+GROUP BY col1
+HAVING (
+  SELECT col1[0] = 1
+);
+
+SELECT col1 AS alias
+FROM values(map('a', 1))
+GROUP BY col1
+HAVING (
+  SELECT col1[0] = 1
+);

sql/core/src/test/resources/sql-tests/results/having.sql.out

Lines changed: 52 additions & 0 deletions
@@ -291,3 +291,55 @@ SELECT SUM(v) + 1 + MIN(v) FROM hav HAVING 1 + 1 + 1 + MIN(v) + 1 + SUM(v)
 struct<((sum(v) + 1) + min(v)):bigint>
 -- !query output
 13
+
+
+-- !query
+SELECT col1 AS alias
+FROM values(1)
+GROUP BY col1
+HAVING (
+  SELECT col1 = 1
+)
+-- !query schema
+struct<alias:int>
+-- !query output
+1
+
+
+-- !query
+SELECT col1 AS alias
+FROM values(named_struct('a', 1))
+GROUP BY col1
+HAVING (
+  SELECT col1.a = 1
+)
+-- !query schema
+struct<alias:struct<a:int>>
+-- !query output
+{"a":1}
+
+
+-- !query
+SELECT col1 AS alias
+FROM values(array(1))
+GROUP BY col1
+HAVING (
+  SELECT col1[0] = 1
+)
+-- !query schema
+struct<alias:array<int>>
+-- !query output
+[1]
+
+
+-- !query
+SELECT col1 AS alias
+FROM values(map('a', 1))
+GROUP BY col1
+HAVING (
+  SELECT col1[0] = 1
+)
+-- !query schema
+struct<alias:map<string,int>>
+-- !query output
+
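One plausible reading of the empty output for the `map('a', 1)` query: the analyzed plan casts the key `0` to a string, the map has no such key, the lookup yields NULL, `NULL = 1` is NULL, and HAVING keeps only rows whose predicate is true. The sketch below models that chain with `Option`, where `None` plays the role of SQL NULL. The helper names (`lookup`, `equalsSql`, `having`) are hypothetical, and the null-on-missing-key behavior is an assumption inferred from the golden file, not something the commit states.

```scala
// Toy model of SQL three-valued logic; None stands in for SQL NULL.
object HavingNullSketch {
  // Assumed behavior: the Int key is cast to a string, and a missing key yields NULL.
  def lookup(m: Map[String, Int], key: Int): Option[Int] =
    m.get(key.toString)

  // NULL = anything evaluates to NULL.
  def equalsSql(left: Option[Int], right: Int): Option[Boolean] =
    left.map(_ == right)

  // HAVING keeps a row only when the predicate is definitely true.
  def having[A](rows: Seq[A])(pred: A => Option[Boolean]): Seq[A] =
    rows.filter(row => pred(row).contains(true))
}
```

Under these assumptions, filtering the single row `Map("a" -> 1)` with the predicate `lookup(row, 0) = 1` leaves no rows, which matches the blank line after `-- !query output`.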
