
Commit b18b956

[SPARK-52212][PYTHON][INFRA] Upgrade linter image to python 3.11
### What changes were proposed in this pull request?

Upgrade the linter image to Python 3.11.

### Why are the changes needed?

Python 3.9 reaches its end of life this year, so all of master's workflows need to move to newer Python versions.

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

CI.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #50931 from zhengruifeng/infra_linter_311.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
1 parent 64982a4 commit b18b956

File tree: 9 files changed (+73, -55 lines)


.github/workflows/build_and_test.yml

Lines changed: 18 additions & 5 deletions
@@ -787,8 +787,6 @@ jobs:
       LC_ALL: C.UTF-8
       LANG: C.UTF-8
       NOLINT_ON_COMPILE: false
-      PYSPARK_DRIVER_PYTHON: python3.9
-      PYSPARK_PYTHON: python3.9
       GITHUB_PREV_SHA: ${{ github.event.before }}
     container:
       image: ${{ needs.precondition.outputs.image_lint_url_link }}
@@ -849,11 +847,18 @@
       run: ./dev/mima
     - name: Scala linter
       run: ./dev/lint-scala
-    - name: Scala structured logging check
+    - name: Scala structured logging check for branch-3.5 and branch-4.0
+      if: inputs.branch == 'branch-3.5' || inputs.branch == 'branch-4.0'
       run: |
         if [ -f ./dev/structured_logging_style.py ]; then
           python3.9 ./dev/structured_logging_style.py
         fi
+    - name: Scala structured logging check
+      if: inputs.branch != 'branch-3.5' && inputs.branch != 'branch-4.0'
+      run: |
+        if [ -f ./dev/structured_logging_style.py ]; then
+          python3.11 ./dev/structured_logging_style.py
+        fi
     - name: Java linter
       run: ./dev/lint-java
     - name: Spark connect jvm client mima check
@@ -865,10 +870,18 @@
         # Should delete this section after SPARK 3.5 EOL.
         python3.9 -m pip install 'flake8==3.9.0' pydata_sphinx_theme 'mypy==0.982' 'pytest==7.1.3' 'pytest-mypy-plugins==1.9.3' numpydoc 'jinja2<3.0.0' 'black==22.6.0'
         python3.9 -m pip install 'pandas-stubs==1.2.0.53' ipython 'grpcio==1.56.0' 'grpc-stubs==1.24.11' 'googleapis-common-protos-stubs==2.2.0'
-    - name: List Python packages
+    - name: List Python packages for branch-3.5 and branch-4.0
+      if: inputs.branch == 'branch-3.5' || inputs.branch == 'branch-4.0'
       run: python3.9 -m pip list
-    - name: Python linter
+    - name: List Python packages
+      if: inputs.branch != 'branch-3.5' && inputs.branch != 'branch-4.0'
+      run: python3.11 -m pip list
+    - name: Python linter for branch-3.5 and branch-4.0
+      if: inputs.branch == 'branch-3.5' || inputs.branch == 'branch-4.0'
       run: PYTHON_EXECUTABLE=python3.9 ./dev/lint-python
+    - name: Python linter
+      if: inputs.branch != 'branch-3.5' && inputs.branch != 'branch-4.0'
+      run: PYTHON_EXECUTABLE=python3.11 ./dev/lint-python
     # Should delete this section after SPARK 3.5 EOL.
     - name: Install dependencies for Python code generation check for branch-3.5
       if: inputs.branch == 'branch-3.5'
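
The gating above duplicates each step: one copy pinned to python3.9 for branch-3.5 and branch-4.0, and one on python3.11 for everything else. A minimal Python sketch of that selection logic (the helper name is hypothetical, for illustration only):

# Hypothetical helper mirroring the workflow's `if:` conditions; not part of the PR.
def python_executable(branch: str) -> str:
    """Pick the interpreter the lint steps should use for a given branch."""
    if branch in ("branch-3.5", "branch-4.0"):
        return "python3.9"   # older branches keep the 3.9 toolchain
    return "python3.11"      # master and newer branches move to 3.11


assert python_executable("branch-3.5") == "python3.9"
assert python_executable("master") == "python3.11"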

dev/spark-test-image/lint/Dockerfile

Lines changed: 13 additions & 8 deletions
@@ -24,7 +24,7 @@ LABEL org.opencontainers.image.ref.name="Apache Spark Infra Image for Linter"
 # Overwrite this label to avoid exposing the underlying Ubuntu OS version label
 LABEL org.opencontainers.image.version=""
 
-ENV FULL_REFRESH_DATE=20250312
+ENV FULL_REFRESH_DATE=20250519
 
 ENV DEBIAN_FRONTEND=noninteractive
 ENV DEBCONF_NONINTERACTIVE_SEEN=true
@@ -51,6 +51,7 @@ RUN apt-get update && apt-get install -y \
     npm \
     pkg-config \
     qpdf \
+    tzdata \
     r-base \
     software-properties-common \
     wget \
@@ -65,13 +66,17 @@ RUN Rscript -e "install.packages(c('devtools', 'knitr', 'markdown', 'rmarkdown',
 # See more in SPARK-39735
 ENV R_LIBS_SITE="/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library"
 
-# Install Python 3.9
+# Install Python 3.11
 RUN add-apt-repository ppa:deadsnakes/ppa
-RUN apt-get update && apt-get install -y python3.9 python3.9-distutils \
+RUN apt-get update && apt-get install -y \
+    python3.11 \
+    && apt-get autoremove --purge -y \
+    && apt-get clean \
     && rm -rf /var/lib/apt/lists/*
-RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.9
 
-RUN python3.9 -m pip install \
+
+RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.11
+RUN python3.11 -m pip install \
     'black==23.12.1' \
     'flake8==3.9.0' \
     'googleapis-common-protos-stubs==2.2.0' \
@@ -91,6 +96,6 @@ RUN python3.9 -m pip install \
     'pyarrow>=19.0.0' \
     'pytest-mypy-plugins==1.9.3' \
     'pytest==7.1.3' \
-    && python3.9 -m pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu \
-    && python3.9 -m pip install torcheval \
-    && python3.9 -m pip cache purge
+    && python3.11 -m pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu \
+    && python3.11 -m pip install torcheval \
+    && python3.11 -m pip cache purge
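
An image built from this Dockerfile can be sanity-checked from inside the container. A minimal sketch (the script name and the spot-checked package subset are my own choices, not part of the PR):

# Run inside the lint container, e.g. `docker run --rm <image> python3.11 check.py`
# (image tag and file name are placeholders).
import sys
from importlib.metadata import version

# The linter toolchain should now be on CPython 3.11.
assert sys.version_info[:2] == (3, 11), sys.version

# Spot-check a few of the tools the Dockerfile pins.
for pkg in ("black", "flake8", "pytest"):
    print(pkg, version(pkg))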

python/pyspark/ml/tests/typing/test_classification.yml

Lines changed: 2 additions & 2 deletions
@@ -23,8 +23,8 @@
 
   # Should support
   OneVsRest(classifier=LogisticRegression())
-  OneVsRest(classifier=LogisticRegressionModel.load("/foo"))  # E: Argument "classifier" to "OneVsRest" has incompatible type "LogisticRegressionModel"; expected "Optional[Classifier[Never]]" [arg-type]
-  OneVsRest(classifier="foo")  # E: Argument "classifier" to "OneVsRest" has incompatible type "str"; expected "Optional[Classifier[Never]]" [arg-type]
+  OneVsRest(classifier=LogisticRegressionModel.load("/foo"))  # E: Argument "classifier" to "OneVsRest" has incompatible type "LogisticRegressionModel"; expected "Classifier[Never] | None" [arg-type]
+  OneVsRest(classifier="foo")  # E: Argument "classifier" to "OneVsRest" has incompatible type "str"; expected "Classifier[Never] | None" [arg-type]
 
 
 - case: fitFMClassifier
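
Only the spelling of the union changes here: the newer mypy toolchain prints PEP 604 unions (`X | None`) where older releases printed `Optional[X]`. A minimal sketch of the same effect with illustrative names (not the PySpark API):

from typing import Optional


def fit(classifier: Optional[str] = None) -> None:
    """Toy stand-in for OneVsRest(classifier=...)."""


fit(classifier=123)
# Older mypy:  incompatible type "int"; expected "Optional[str]"
# Newer mypy:  incompatible type "int"; expected "str | None"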

python/pyspark/ml/tests/typing/test_feature.yml

Lines changed: 4 additions & 4 deletions
@@ -47,9 +47,9 @@
   out: |
     main:14: error: No overload variant of "StringIndexer" matches argument types "str", "list[str]" [call-overload]
     main:14: note: Possible overload variants:
-    main:14: note: def StringIndexer(self, *, inputCol: Optional[str] = ..., outputCol: Optional[str] = ..., handleInvalid: str = ..., stringOrderType: str = ...) -> StringIndexer
-    main:14: note: def StringIndexer(self, *, inputCols: Optional[list[str]] = ..., outputCols: Optional[list[str]] = ..., handleInvalid: str = ..., stringOrderType: str = ...) -> StringIndexer
+    main:14: note: def StringIndexer(self, *, inputCol: str | None = ..., outputCol: str | None = ..., handleInvalid: str = ..., stringOrderType: str = ...) -> StringIndexer
+    main:14: note: def StringIndexer(self, *, inputCols: list[str] | None = ..., outputCols: list[str] | None = ..., handleInvalid: str = ..., stringOrderType: str = ...) -> StringIndexer
     main:15: error: No overload variant of "StringIndexer" matches argument types "list[str]", "str" [call-overload]
     main:15: note: Possible overload variants:
-    main:15: note: def StringIndexer(self, *, inputCol: Optional[str] = ..., outputCol: Optional[str] = ..., handleInvalid: str = ..., stringOrderType: str = ...) -> StringIndexer
-    main:15: note: def StringIndexer(self, *, inputCols: Optional[list[str]] = ..., outputCols: Optional[list[str]] = ..., handleInvalid: str = ..., stringOrderType: str = ...) -> StringIndexer
+    main:15: note: def StringIndexer(self, *, inputCol: str | None = ..., outputCol: str | None = ..., handleInvalid: str = ..., stringOrderType: str = ...) -> StringIndexer
+    main:15: note: def StringIndexer(self, *, inputCols: list[str] | None = ..., outputCols: list[str] | None = ..., handleInvalid: str = ..., stringOrderType: str = ...) -> StringIndexer
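
These overloads are keyword-only (note the bare `*` in each signature), which is why positional arguments yield "No overload variant matches" rather than a plain type error. A self-contained sketch of the pattern, with illustrative names:

from typing import overload


@overload
def indexer(*, input_col: str | None = ...) -> str: ...
@overload
def indexer(*, input_cols: list[str] | None = ...) -> str: ...
def indexer(**kwargs: object) -> str:
    return "StringIndexer"


indexer(input_col="name")       # ok: matches the single-column overload
indexer(input_cols=["a", "b"])  # ok: matches the multi-column overload
# indexer("name", ["out"])      # mypy: No overload variant of "indexer" matches argument types "str", "list[str]"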

python/pyspark/sql/connect/shell/progress.py

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@
     from IPython.utils.terminal import get_terminal_size
 except ImportError:
 
-    def get_terminal_size(defaultx: Any = None, defaulty: Any = None) -> Any:
+    def get_terminal_size(defaultx: Any = None, defaulty: Any = None) -> Any:  # type: ignore[misc]
         return (80, 25)
 
 
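
The new suppression is needed because mypy flags redefining an imported name with an incompatible signature under the `misc` error code. The same fallback idiom in isolation, as a sketch:

from typing import Any

try:
    from IPython.utils.terminal import get_terminal_size
except ImportError:
    # The fallback's signature differs from IPython's, so mypy reports an
    # incompatible redefinition under [misc]; the ignore silences exactly that.
    def get_terminal_size(defaultx: Any = None, defaulty: Any = None) -> Any:  # type: ignore[misc]
        return (80, 25)


cols, rows = get_terminal_size()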

python/pyspark/sql/tests/typing/test_dataframe.yml

Lines changed: 7 additions & 7 deletions
@@ -37,8 +37,8 @@
   out: |
     main:16: error: No overload variant of "sample" of "DataFrame" matches argument type "bool" [call-overload]
     main:16: note: Possible overload variants:
-    main:16: note: def sample(self, fraction: float, seed: Optional[int] = ...) -> DataFrame
-    main:16: note: def sample(self, withReplacement: Optional[bool], fraction: float, seed: Optional[int] = ...) -> DataFrame
+    main:16: note: def sample(self, fraction: float, seed: int | None = ...) -> DataFrame
+    main:16: note: def sample(self, withReplacement: bool | None, fraction: float, seed: int | None = ...) -> DataFrame
 
 
 - case: selectColumns
@@ -54,7 +54,7 @@
     df.select(["name", "age"])
     df.select([col("name"), col("age")])
 
-    df.select(["name", col("age")])  # E: Argument 1 to "select" of "DataFrame" has incompatible type "list[object]"; expected "Union[list[Column], list[str]]" [arg-type]
+    df.select(["name", col("age")])  # E: Argument 1 to "select" of "DataFrame" has incompatible type "list[object]"; expected "list[Column] | list[str]" [arg-type]
 
 
 - case: groupBy
@@ -71,7 +71,7 @@
     df.groupby(["name", "age"])
     df.groupBy([col("name"), col("age")])
     df.groupby([col("name"), col("age")])
-    df.groupBy(["name", col("age")])  # E: Argument 1 to "groupBy" of "DataFrame" has incompatible type "list[object]"; expected "Union[list[Column], list[str], list[int]]" [arg-type]
+    df.groupBy(["name", col("age")])  # E: Argument 1 to "groupBy" of "DataFrame" has incompatible type "list[object]"; expected "list[Column] | list[str] | list[int]" [arg-type]
 
 
 - case: rollup
@@ -88,7 +88,7 @@
     df.rollup([col("name"), col("age")])
 
 
-    df.rollup(["name", col("age")])  # E: Argument 1 to "rollup" of "DataFrame" has incompatible type "list[object]"; expected "Union[list[Column], list[str]]" [arg-type]
+    df.rollup(["name", col("age")])  # E: Argument 1 to "rollup" of "DataFrame" has incompatible type "list[object]"; expected "list[Column] | list[str]" [arg-type]
 
 
 - case: cube
@@ -105,7 +105,7 @@
     df.cube([col("name"), col("age")])
 
 
-    df.cube(["name", col("age")])  # E: Argument 1 to "cube" of "DataFrame" has incompatible type "list[object]"; expected "Union[list[Column], list[str]]" [arg-type]
+    df.cube(["name", col("age")])  # E: Argument 1 to "cube" of "DataFrame" has incompatible type "list[object]"; expected "list[Column] | list[str]" [arg-type]
 
 
 - case: dropColumns
@@ -124,7 +124,7 @@
   out: |
     main:10: error: No overload variant of "drop" of "DataFrame" matches argument types "Column", "Column" [call-overload]
     main:10: note: Possible overload variants:
-    main:10: note: def drop(self, cols: Union[Column, str]) -> DataFrame
+    main:10: note: def drop(self, cols: Column | str) -> DataFrame
     main:10: note: def drop(self, *cols: str) -> DataFrame
 
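
Several of these cases hinge on the same inference detail: a list mixing `str` and `Column` is joined to `list[object]`, which matches neither arm of `list[Column] | list[str]`. A self-contained sketch (the `Column` and `select` names mimic, but are not, the PySpark API):

class Column:
    """Toy stand-in for pyspark.sql.Column."""


def select(cols: list[Column] | list[str]) -> None:
    """Toy stand-in for the list form of DataFrame.select."""


select(["name", "age"])       # ok: list[str]
select([Column(), Column()])  # ok: list[Column]
# select(["name", Column()])  # mypy: "list[object]"; expected "list[Column] | list[str]"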

python/pyspark/sql/tests/typing/test_functions.yml

Lines changed: 16 additions & 16 deletions
@@ -69,33 +69,33 @@
   out: |
     main:29: error: No overload variant of "array" matches argument types "list[Column]", "list[Column]" [call-overload]
     main:29: note: Possible overload variants:
-    main:29: note: def array(*cols: Union[Column, str]) -> Column
-    main:29: note: def array(Union[Sequence[Union[Column, str]], tuple[Union[Column, str], ...]], /) -> Column
+    main:29: note: def array(*cols: Column | str) -> Column
+    main:29: note: def array(Sequence[Column | str] | tuple[Column | str, ...], /) -> Column
     main:30: error: No overload variant of "create_map" matches argument types "list[Column]", "list[Column]" [call-overload]
     main:30: note: Possible overload variants:
-    main:30: note: def create_map(*cols: Union[Column, str]) -> Column
-    main:30: note: def create_map(Union[Sequence[Union[Column, str]], tuple[Union[Column, str], ...]], /) -> Column
+    main:30: note: def create_map(*cols: Column | str) -> Column
+    main:30: note: def create_map(Sequence[Column | str] | tuple[Column | str, ...], /) -> Column
     main:31: error: No overload variant of "map_concat" matches argument types "list[Column]", "list[Column]" [call-overload]
     main:31: note: Possible overload variants:
-    main:31: note: def map_concat(*cols: Union[Column, str]) -> Column
-    main:31: note: def map_concat(Union[Sequence[Union[Column, str]], tuple[Union[Column, str], ...]], /) -> Column
+    main:31: note: def map_concat(*cols: Column | str) -> Column
+    main:31: note: def map_concat(Sequence[Column | str] | tuple[Column | str, ...], /) -> Column
     main:32: error: No overload variant of "struct" matches argument types "list[str]", "list[str]" [call-overload]
     main:32: note: Possible overload variants:
-    main:32: note: def struct(*cols: Union[Column, str]) -> Column
-    main:32: note: def struct(Union[Sequence[Union[Column, str]], tuple[Union[Column, str], ...]], /) -> Column
+    main:32: note: def struct(*cols: Column | str) -> Column
+    main:32: note: def struct(Sequence[Column | str] | tuple[Column | str, ...], /) -> Column
     main:33: error: No overload variant of "array" matches argument types "list[str]", "list[str]" [call-overload]
     main:33: note: Possible overload variants:
-    main:33: note: def array(*cols: Union[Column, str]) -> Column
-    main:33: note: def array(Union[Sequence[Union[Column, str]], tuple[Union[Column, str], ...]], /) -> Column
+    main:33: note: def array(*cols: Column | str) -> Column
+    main:33: note: def array(Sequence[Column | str] | tuple[Column | str, ...], /) -> Column
     main:34: error: No overload variant of "create_map" matches argument types "list[str]", "list[str]" [call-overload]
     main:34: note: Possible overload variants:
-    main:34: note: def create_map(*cols: Union[Column, str]) -> Column
-    main:34: note: def create_map(Union[Sequence[Union[Column, str]], tuple[Union[Column, str], ...]], /) -> Column
+    main:34: note: def create_map(*cols: Column | str) -> Column
+    main:34: note: def create_map(Sequence[Column | str] | tuple[Column | str, ...], /) -> Column
     main:35: error: No overload variant of "map_concat" matches argument types "list[str]", "list[str]" [call-overload]
     main:35: note: Possible overload variants:
-    main:35: note: def map_concat(*cols: Union[Column, str]) -> Column
-    main:35: note: def map_concat(Union[Sequence[Union[Column, str]], tuple[Union[Column, str], ...]], /) -> Column
+    main:35: note: def map_concat(*cols: Column | str) -> Column
+    main:35: note: def map_concat(Sequence[Column | str] | tuple[Column | str, ...], /) -> Column
     main:36: error: No overload variant of "struct" matches argument types "list[str]", "list[str]" [call-overload]
     main:36: note: Possible overload variants:
-    main:36: note: def struct(*cols: Union[Column, str]) -> Column
-    main:36: note: def struct(Union[Sequence[Union[Column, str]], tuple[Union[Column, str], ...]], /) -> Column
+    main:36: note: def struct(*cols: Column | str) -> Column
+    main:36: note: def struct(Sequence[Column | str] | tuple[Column | str, ...], /) -> Column
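
The `/` in these signatures marks the sequence form as positional-only: `array` and friends accept either varargs of columns or one sequence, never two sequences. Roughly, as a sketch with simplified types:

from collections.abc import Sequence
from typing import overload


@overload
def array(*cols: str) -> str: ...
@overload
def array(cols: Sequence[str], /) -> str: ...
def array(*cols: object) -> str:
    return "array_column"


array("a", "b")        # ok: varargs form
array(["a", "b"])      # ok: one positional-only sequence
# array(["a"], ["b"])  # mypy: No overload variant of "array" matches argument types "list[str]", "list[str]"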

python/pyspark/sql/tests/typing/test_readwriter.yml

Lines changed: 2 additions & 2 deletions
@@ -26,8 +26,8 @@
 
     spark.read.load(foo=True)
 
-    spark.read.load(foo=["a"])  # E: Argument "foo" to "load" of "DataFrameReader" has incompatible type "list[str]"; expected "Union[bool, float, int, str, None]" [arg-type]
-    spark.read.option("foo", (1, ))  # E: Argument 2 to "option" of "DataFrameReader" has incompatible type "tuple[int]"; expected "Union[bool, float, int, str, None]" [arg-type]
+    spark.read.load(foo=["a"])  # E: Argument "foo" to "load" of "DataFrameReader" has incompatible type "list[str]"; expected "bool | float | int | str | None" [arg-type]
+    spark.read.option("foo", (1, ))  # E: Argument 2 to "option" of "DataFrameReader" has incompatible type "tuple[int]"; expected "bool | float | int | str | None" [arg-type]
 
 
 - case: readStreamOptions
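
Reader options are restricted to scalar primitives; containers such as `list[str]` or `tuple[int]` are rejected. A sketch of that constraint (the alias name `OptionValue` is made up here):

OptionValue = bool | float | int | str | None


def option(key: str, value: OptionValue) -> None:
    """Toy stand-in for DataFrameReader.option."""


option("header", True)  # ok
option("sep", ",")      # ok
# option("foo", (1,))   # mypy: incompatible type "tuple[int]"; expected "bool | float | int | str | None"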

python/pyspark/sql/tests/typing/test_session.yml

Lines changed: 10 additions & 10 deletions
@@ -76,16 +76,16 @@
     main:14: error: Value of type variable "AtomicValue" of "createDataFrame" of "SparkSession" cannot be "tuple[str, int]" [type-var]
     main:18: error: No overload variant of "createDataFrame" of "SparkSession" matches argument types "list[tuple[str, int]]", "StructType", "float" [call-overload]
     main:18: note: Possible overload variants:
-    main:18: note: def [RowLike in (list[Any], tuple[Any, ...], Row)] createDataFrame(self, data: Iterable[RowLike], schema: Union[list[str], tuple[str, ...]] = ..., samplingRatio: Optional[float] = ...) -> DataFrame
-    main:18: note: def [RowLike in (list[Any], tuple[Any, ...], Row)] createDataFrame(self, data: RDD[RowLike], schema: Union[list[str], tuple[str, ...]] = ..., samplingRatio: Optional[float] = ...) -> DataFrame
-    main:18: note: def [RowLike in (list[Any], tuple[Any, ...], Row)] createDataFrame(self, data: Iterable[RowLike], schema: Union[StructType, str], *, verifySchema: bool = ...) -> DataFrame
-    main:18: note: def [RowLike in (list[Any], tuple[Any, ...], Row)] createDataFrame(self, data: RDD[RowLike], schema: Union[StructType, str], *, verifySchema: bool = ...) -> DataFrame
-    main:18: note: def [AtomicValue in (datetime, date, Decimal, bool, str, int, float)] createDataFrame(self, data: RDD[AtomicValue], schema: Union[AtomicType, str], verifySchema: bool = ...) -> DataFrame
-    main:18: note: def [AtomicValue in (datetime, date, Decimal, bool, str, int, float)] createDataFrame(self, data: Iterable[AtomicValue], schema: Union[AtomicType, str], verifySchema: bool = ...) -> DataFrame
-    main:18: note: def createDataFrame(self, data: DataFrame, samplingRatio: Optional[float] = ...) -> DataFrame
-    main:18: note: def createDataFrame(self, data: Any, samplingRatio: Optional[float] = ...) -> DataFrame
-    main:18: note: def createDataFrame(self, data: DataFrame, schema: Union[StructType, str], verifySchema: bool = ...) -> DataFrame
-    main:18: note: def createDataFrame(self, data: Any, schema: Union[StructType, str], verifySchema: bool = ...) -> DataFrame
+    main:18: note: def [RowLike in (list[Any], tuple[Any, ...], Row)] createDataFrame(self, data: Iterable[RowLike], schema: list[str] | tuple[str, ...] = ..., samplingRatio: float | None = ...) -> DataFrame
+    main:18: note: def [RowLike in (list[Any], tuple[Any, ...], Row)] createDataFrame(self, data: RDD[RowLike], schema: list[str] | tuple[str, ...] = ..., samplingRatio: float | None = ...) -> DataFrame
+    main:18: note: def [RowLike in (list[Any], tuple[Any, ...], Row)] createDataFrame(self, data: Iterable[RowLike], schema: StructType | str, *, verifySchema: bool = ...) -> DataFrame
+    main:18: note: def [RowLike in (list[Any], tuple[Any, ...], Row)] createDataFrame(self, data: RDD[RowLike], schema: StructType | str, *, verifySchema: bool = ...) -> DataFrame
+    main:18: note: def [AtomicValue in (datetime, date, Decimal, bool, str, int, float)] createDataFrame(self, data: RDD[AtomicValue], schema: AtomicType | str, verifySchema: bool = ...) -> DataFrame
+    main:18: note: def [AtomicValue in (datetime, date, Decimal, bool, str, int, float)] createDataFrame(self, data: Iterable[AtomicValue], schema: AtomicType | str, verifySchema: bool = ...) -> DataFrame
+    main:18: note: def createDataFrame(self, data: DataFrame, samplingRatio: float | None = ...) -> DataFrame
+    main:18: note: def createDataFrame(self, data: Any, samplingRatio: float | None = ...) -> DataFrame
+    main:18: note: def createDataFrame(self, data: DataFrame, schema: StructType | str, verifySchema: bool = ...) -> DataFrame
+    main:18: note: def createDataFrame(self, data: Any, schema: StructType | str, verifySchema: bool = ...) -> DataFrame
 
 - case: createDataFrameFromEmptyRdd
   main: |
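
The first error comes from `AtomicValue` being a value-constrained TypeVar: it may bind only to one of the listed atomic types, so `tuple[str, int]` is rejected before overload matching even starts. In isolation, with illustrative names and a subset of the constraints:

from typing import TypeVar

# A value-constrained TypeVar, mirroring a subset of createDataFrame's AtomicValue.
AtomicValue = TypeVar("AtomicValue", bool, str, int, float)


def first_value(data: list[AtomicValue]) -> AtomicValue:
    return data[0]


first_value([1, 2, 3])     # ok: AtomicValue is solved as int
# first_value([("a", 1)])  # mypy: Value of type variable "AtomicValue" of "first_value" cannot be "tuple[str, int]"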
