[WIP][SPARK-52394][PS] Fix autocorr divide-by-zero error under ANSI mode #51192

Open · wants to merge 1 commit into base: master
Conversation

@xinrong-meng (Member) commented Jun 16, 2025

What changes were proposed in this pull request?

Fix the divide-by-zero error in `Series.autocorr` under ANSI mode.

Why are the changes needed?

Ensure the pandas API on Spark works correctly with ANSI mode enabled.
Part of https://issues.apache.org/jira/browse/SPARK-52169.
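For context on the failure mode: Pearson correlation divides the covariance by the product of the two standard deviations, and that denominator is zero whenever either side is constant or has too few non-null pairs, which `autocorr` hits at large lags. Under ANSI mode Spark raises a `DIVIDE_BY_ZERO` error on such a division instead of returning null. The sketch below is a plain-Python illustration of the guarded division the fix needs, not the actual Spark code; `safe_corr_from_moments` is a hypothetical helper name.

```python
import math

def safe_corr_from_moments(cov, var_x, var_y):
    # Hypothetical helper, not Spark's implementation. Pearson correlation
    # is cov / (sqrt(var_x) * sqrt(var_y)). When either variance is zero
    # the denominator is zero; under ANSI mode an unguarded division would
    # raise DIVIDE_BY_ZERO, so return NaN explicitly instead (matching the
    # null result that non-ANSI mode produces).
    denom = math.sqrt(var_x) * math.sqrt(var_y)
    if denom == 0.0:
        return float("nan")
    return cov / denom
```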

Does this PR introduce any user-facing change?

How was this patch tested?

>>> import pandas as pd
>>> import numpy as np
>>> import pyspark.pandas as ps
>>> 
>>> ps.set_option("compute.fail_on_ansi_mode", False)
>>> ps.set_option("compute.ansi_mode_support", True)
>>> 
>>> s = ps.Series([.2, .0, .6, .2, np.nan, .5, .6])
>>> s.autocorr()
-0.14231876063832774
>>> s.autocorr(0)
1.0
>>> s.autocorr(2)
0.09234860641727351
>>> s.autocorr(-3)
0.1701242227446561
>>> s.autocorr(5)
-0.14085904245475267
>>> s.autocorr(6)
nan
>>> quit()
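The structural cases in the transcript above can be cross-checked with a rough pure-Python re-implementation of the autocorrelation semantics (pair each value with the value `lag` rows back, drop NaN pairs, then take Pearson correlation). This is an illustrative sketch assuming positive lags only, not the Spark implementation; exact intermediate values depend on null handling, but `lag=0` must give 1.0 and `lag=6` leaves a single pair, zero variance, hence NaN — the very case that divides by zero under ANSI mode without the fix.

```python
import math

def autocorr(xs, lag=1):
    # Sketch for non-negative lags: correlate the series with itself
    # shifted by `lag` rows, dropping any pair that contains NaN.
    pairs = [
        (a, b)
        for a, b in zip(xs[lag:], xs)
        if not (math.isnan(a) or math.isnan(b))
    ]
    if len(pairs) < 2:
        # Fewer than two pairs means zero variance: exactly the case
        # where an unguarded division fails under ANSI mode.
        return float("nan")
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    cov = sum((a - mx) * (b - my) for a, b in pairs)
    denom = math.sqrt(sum((a - mx) ** 2 for a, _ in pairs)) * math.sqrt(
        sum((b - my) ** 2 for _, b in pairs)
    )
    return cov / denom if denom else float("nan")

series = [0.2, 0.0, 0.6, 0.2, float("nan"), 0.5, 0.6]
```

For example, `autocorr(series, 0)` is 1.0 (a series is perfectly correlated with itself) while `autocorr(series, 6)` is NaN (only one valid pair remains).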

Was this patch authored or co-authored using generative AI tooling?

else:
    lag_scol = F.lag(scol, lag).over(Window.orderBy(NATURAL_ORDER_COLUMN_NAME))
    lag_col_name = verify_temp_column_name(sdf, "__autocorr_lag_tmp_col__")
    corr = (
        sdf.withColumn(lag_col_name, lag_scol)
        .select(F.corr(scol, F.col(lag_col_name)))
A Contributor commented on this snippet:

How is `corr` affected by ANSI mode?

2 participants