[SPARK-52759][SDP][SQL] Throw exception if pipeline has no tables or persisted views #51445
Conversation
It's close! Just small comments, and let's also undo any formatting changes to existing code so that:
- The diff is smaller, making it easier for PR reviewers to understand what exactly is changing
- We don't pollute the git blame
If you feel like we should reformat some of the code, let's open a separate PR to do that!
```
@@ -50,16 +50,21 @@ class GraphRegistrationContext(
  }

  def toDataflowGraph: DataflowGraph = {
    // throw exception if pipeline does not have table or persisted view
    if (tables.isEmpty && views.collect { case v: PersistedView =>
```
In theory it's possible for a user to define a standalone flow in their source code, but no table. Should we throw an exception in that case, or is a good exception already thrown elsewhere for that case?
Would be nice to write a test for this.
Added tests here. I think as long as there's no table defined, an exception will be thrown.
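The guard being discussed can be modeled in a few lines. The following is an illustrative, self-contained sketch of the idea, not the actual Spark classes or error-throwing helpers:

```scala
// Illustrative model of the empty-pipeline guard (not the actual Spark code).
sealed trait View
case class TemporaryView(name: String) extends View
case class PersistedView(name: String) extends View

class GraphRegistrationContext(tables: Seq[String], views: Seq[View]) {
  def validateNonEmpty(): Unit = {
    // A pipeline is "empty" if it defines no tables and no persisted views;
    // temporary views alone do not count.
    val persistedViews = views.collect { case v: PersistedView => v }
    if (tables.isEmpty && persistedViews.isEmpty) {
      throw new IllegalStateException(
        "Pipeline must define at least one table or persisted view")
    }
  }
}
```

Note that a standalone flow with no backing table would also trip this check, which matches the discussion above: as long as no table is defined, the exception fires.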
```
@@ -18,7 +18,7 @@
 package org.apache.spark.sql.pipelines.utils

 import org.apache.spark.sql.catalyst.TableIdentifier
-import org.apache.spark.sql.catalyst.analysis.{LocalTempView, UnresolvedRelation, ViewType}
+import org.apache.spark.sql.catalyst.analysis.{LocalTempView, PersistedView => PersistedViewType, UnresolvedRelation, ViewType}
```
nit: Let's just import it as `PersistedView`. I'd generally recommend using alias imports only when you need to import the same-named entity from multiple packages; otherwise it adds another layer of abstraction when reading the code.
Or, if we think the name is bad, we should just rename this class.
Actually, this is because we are importing two `PersistedView` classes from two different packages. The second import is here. I renamed it to `PersistedViewType` because I think this specific import points to the types file.
Ah, didn't see that! Makes sense
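For context on the alias-import pattern discussed above, here is a generic Scala illustration using standard-library classes (deliberately unrelated to the Spark imports in question):

```scala
// Two same-named classes from different packages; aliasing keeps both usable.
import scala.collection.immutable.{Map => ImmutableMap}
import scala.collection.mutable.Map

object AliasImportDemo extends App {
  val mutableCounts: Map[String, Int] = Map("a" -> 1)        // mutable.Map
  val frozenCounts: ImmutableMap[String, Int] = ImmutableMap("b" -> 2)
  mutableCounts("a") = 3                                     // allowed on mutable
  println(mutableCounts("a"))                                // prints 3
}
```

This is exactly the case the reviewer describes: the alias earns its keep only when two same-named entities must coexist in one file.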
```
@@ -4519,6 +4519,15 @@
   ],
   "sqlState" : "42S22"
 },
 "NO_TABLES_IN_PIPELINE" : {
```
How about we rename this to `NO_DATASET_IN_PIPELINE`, since persisted views are technically not tables but we're allowing a pipeline of just persisted views.
Makes sense.
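For readers unfamiliar with Spark's error-conditions file, an entry for such an error class might look roughly like this. Per the PR description the class ultimately merged was named `RUN_EMPTY_PIPELINE`; the message text and `sqlState` below are illustrative guesses, not the actual values:

```json
"RUN_EMPTY_PIPELINE" : {
  "message" : [
    "Cannot run a pipeline that defines no tables or persisted views."
  ],
  "sqlState" : "42000"
}
```

Each entry pairs an error class name with a message template and a SQLSTATE, which is what lets the exception message stay self-documenting.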
LGTM! Tagging @sryza for a second pass
```
@@ -50,6 +50,14 @@ class GraphRegistrationContext(
  }

  def toDataflowGraph: DataflowGraph = {
    // throw exception if pipeline does not have table or persisted view
```
super small nit: Let's just omit the comment; the exception message should be clear enough to be self-documenting. This comment is prone to becoming stale anyway, for example when we add sinks.
Just one comment on the error code – otherwise looks good! After that's addressed, I will approve and merge.
Co-authored-by: Sandy Ryza <[email protected]>
Nice
Merged to master
What changes were proposed in this pull request?
When a user runs a pipeline, throw a `RUN_EMPTY_PIPELINE` exception if the pipeline source directory does not contain any tables or persisted views. Modified `GraphRegistrationContext.toDataflowGraph` to throw the exception if the pipeline does not have any tables or persisted views.
Why are the changes needed?
In Spark Declarative Pipelines, users use the CLI tool to run a pipeline defined from the configured pipeline root directory. This directory contains information such as the pipeline spec and the source code files (Python, SQL) that define the pipeline's tables/flows/views.
It’s possible the user tries to run a pipeline defined from a pipeline directory whose source files don’t actually define any tables or views.
An exception should be thrown if the pipeline does not have any tables or views, to inform the user that they should double-check they are running the pipeline in the correct directory. The previous behavior was that the pipeline simply ran to completion without emitting any info.
Does this PR introduce any user-facing change?
Yes, this is an additive non-breaking behavior change. However, SDP has not been released, so no user should be impacted by this change.
How was this patch tested?
Created additional test case to verify that the exception is indeed thrown.
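A test for this behavior could be sketched as follows. This is a hand-rolled, self-contained illustration of the shape of such a check, not the actual suite code from the PR:

```scala
// Illustrative test-style check (not the actual Spark test suite).
object EmptyPipelineCheckDemo extends App {
  // Stand-in for GraphRegistrationContext.toDataflowGraph's new validation.
  def toDataflowGraph(tables: Seq[String], persistedViews: Seq[String]): Unit = {
    if (tables.isEmpty && persistedViews.isEmpty) {
      throw new IllegalStateException("RUN_EMPTY_PIPELINE")
    }
  }

  // An empty pipeline must throw RUN_EMPTY_PIPELINE.
  val caught =
    try { toDataflowGraph(Nil, Nil); false }
    catch { case e: IllegalStateException => e.getMessage == "RUN_EMPTY_PIPELINE" }
  assert(caught, "expected RUN_EMPTY_PIPELINE for an empty pipeline")

  // A pipeline with at least one table validates fine.
  toDataflowGraph(Seq("events"), Nil)
}
```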
Was this patch authored or co-authored using generative AI tooling?
No