[WIP][PS] Enable divide-by-zero for boolean floordiv with ANSI enabled #51079

Open
wants to merge 2 commits into
base: master

xinrong-meng
Member

@xinrong-meng xinrong-meng commented Jun 3, 2025

What changes were proposed in this pull request?

Enable divide-by-zero for boolean floordiv with ANSI enabled

Why are the changes needed?

Ensure pandas on Spark works well with ANSI mode on.
Part of https://issues.apache.org/jira/browse/SPARK-52169.

Does this PR introduce any user-facing change?

Yes

FROM

>>> ps.Series([True, False]) // 0
org.apache.spark.SparkArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error. SQLSTATE: 22012
...

TO

>>> ps.Series([True, False]) // 0
0   NaN
1   NaN
dtype: float64
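
For readers without a Spark session, the FROM/TO contrast above can be modeled in plain Python. This is only a sketch of the described outputs, not the code in this patch: `strict_floordiv` and `tolerant_floordiv` are hypothetical names, and Python's `ZeroDivisionError` stands in for Spark's `DIVIDE_BY_ZERO` error.

```python
import math


def strict_floordiv(values, divisor):
    """Model of the FROM behavior: a zero divisor raises an error."""
    # int(v) // 0 raises ZeroDivisionError, standing in for DIVIDE_BY_ZERO.
    return [int(v) // divisor for v in values]


def tolerant_floordiv(values, divisor):
    """Model of the TO behavior: a zero divisor yields NaN for every row."""
    if divisor == 0:
        return [math.nan for _ in values]
    # Results are floats so that NaN and regular values share one dtype.
    return [float(int(v) // divisor) for v in values]


try:
    strict_floordiv([True, False], 0)
except ZeroDivisionError as exc:
    print("strict:", exc)

print("tolerant:", tolerant_floordiv([True, False], 0))
```

The tolerant variant returns floats throughout, which matches the `float64` dtype shown in the TO output.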

How was this patch tested?

Unit tests.

(dev3.10) spark (booldiv) % SPARK_ANSI_SQL_MODE=false  ./python/run-tests --python-executables=python3.10 --testnames "pyspark.pandas.tests.data_type_ops.test_boolean_ops BooleanOpsTests.test_floordiv"
...
Finished test(python3.10): pyspark.pandas.tests.data_type_ops.test_boolean_ops BooleanOpsTests.test_floordiv (5s)
Tests passed in 5 seconds

(dev3.10) spark (booldiv) % SPARK_ANSI_SQL_MODE=true  ./python/run-tests --python-executables=python3.10 --testnames "pyspark.pandas.tests.data_type_ops.test_boolean_ops BooleanOpsTests.test_floordiv"
...
Finished test(python3.10): pyspark.pandas.tests.data_type_ops.test_boolean_ops BooleanOpsTests.test_floordiv (5s)
Tests passed in 5 seconds

Was this patch authored or co-authored using generative AI tooling?

No

@xinrong-meng
Member Author

The failed test is unrelated to this change:

Run if [ -f ./dev/structured_logging_style.py ]; then
/__w/_temp/b006d885-b94f-4a19-80f6-62ff434a2696.sh: 2: python3.9: not found
Error: Process completed with exit code 127.

@xinrong-meng xinrong-meng requested a review from ueshin June 4, 2025 00:16
@@ -132,7 +128,7 @@ def test_floordiv(self):
         self.assertRaises(TypeError, lambda: b_psser // b_psser)
         self.assertRaises(TypeError, lambda: b_psser // True)

-        self.assert_eq(b_pser // pdf["float"], b_psser // psdf["float"])
+        self.assert_eq((b_pser // pdf["float"]).astype("int"), b_psser // psdf["float"])
Member Author

According to https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.floordiv.html, integers should always be returned.

@xinrong-meng
Member Author

@ueshin would you please review?

if isinstance(right, numbers.Number):
    left = transform_boolean_operand_to_numeric(left, spark_type=as_spark_type(type(right)))
    return left // right
Member

This will be a numeric floor division. I'm wondering whether the numeric division is OK or not.
I guess we should look at that first?

Member

>>> ps.Series([0,1,2]) // 0
...
pyspark.errors.exceptions.captured.ArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error. SQLSTATE: 22012
== DataFrame ==
"__div__" was called from
<stdin>:1

I guess fixing this will fix the boolean case as well.
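
To illustrate why a fix on the numeric side would carry over, here is a minimal plain-Python sketch of the delegation pattern in the snippet under review: the boolean operand is cast to the type of the right operand and then handed to the numeric floordiv. The names `numeric_floordiv` and `boolean_floordiv` are hypothetical, and returning NaN for a zero divisor is an assumption taken from the TO output in the description, not the actual Spark implementation.

```python
import math
import numbers


def numeric_floordiv(left, right):
    """Hypothetical numeric floordiv that tolerates a zero divisor."""
    if right == 0:
        # Assumed behavior: NaN instead of raising, as in the TO output.
        return math.nan
    return left // right


def boolean_floordiv(left, right):
    """Floor-divide a bool by a number by delegating to numeric_floordiv."""
    # bool is a numbers.Number in Python, so it must be rejected explicitly;
    # the test in this PR expects TypeError for a boolean right operand.
    if isinstance(right, bool) or not isinstance(right, numbers.Number):
        raise TypeError("floordiv is not supported for this right operand")
    # Cast the boolean to the type of the right operand, then delegate.
    return numeric_floordiv(type(right)(left), right)


print(boolean_floordiv(True, 2))
print(boolean_floordiv(True, 0.5))
print(boolean_floordiv(True, 0))
```

Because `boolean_floordiv` contains no zero handling of its own, any change to how `numeric_floordiv` treats a zero divisor applies to the boolean case automatically.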

Member Author

Good catch! Filed #51209

@xinrong-meng xinrong-meng changed the title [SPARK-52356][PS] Enable divide-by-zero for boolean floordiv with ANSI enabled [SPARK-52519][PS] Enable divide-by-zero for boolean floordiv with ANSI enabled Jun 17, 2025
@xinrong-meng xinrong-meng changed the title [SPARK-52519][PS] Enable divide-by-zero for boolean floordiv with ANSI enabled [SPARK-52356][PS] Enable divide-by-zero for boolean floordiv with ANSI enabled Jun 17, 2025
@xinrong-meng xinrong-meng changed the title [SPARK-52356][PS] Enable divide-by-zero for boolean floordiv with ANSI enabled [WIP][PS] Enable divide-by-zero for boolean floordiv with ANSI enabled Jun 17, 2025
4 participants