-
Notifications
You must be signed in to change notification settings - Fork 28.7k
[SPARK-52770][CONNECT] Support TIME
type in connect proto
#51464
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[SPARK-52770][CONNECT] Support TIME
type in connect proto
#51464
Conversation
634b4ce
to
68a1563
Compare
@dengziming Could you review this PR since you are working on similar one: #51462 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @peter-toth for this PR, I think we should add a test case for this change, and I don't think your test case makes sense, I duplicated it locally and it run smoothly even without this change, take a look at the image below.

@@ -127,6 +129,11 @@ message DataType { | |||
uint32 type_variation_reference = 1; | |||
} | |||
|
|||
message Time { | |||
int32 precision = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make precision
optional analogous to Decimal
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, @peter-toth .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, the PR description looks wrong to me.
# How was this patch tested?
Manually:
scala> sql("SELECT TIME '12:13:14'").show()
+---------------+
|TIME '12:13:14'|
+---------------+
| 12:13:14|
+---------------+
As reported on the JIRA issue, show
is handled by differently and works already. collect
was the problem.
$ bin/spark-connect-shell --remote sc://localhost:15002
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 4.1.0-preview1
/_/
Type in expressions to have them evaluated.
Spark connect server version 4.1.0-preview1.
Spark session available as 'spark'.
scala> sql("SELECT TIME '12:13:14'").show()
+---------------+
|TIME '12:13:14'|
+---------------+
| 12:13:14|
+---------------+
scala> sql("SELECT TIME '12:13:14'").collect()
org.apache.spark.SparkException: org.apache.spark.sql.connect.common.InvalidPlanInput: [INTERNAL_ERROR] Does not support convert time(6) to connect proto types. SQLSTATE: XX000
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a test case like the above comment.
The difference is that we need to return the result schema which is represented by
|
@dongjoon-hyun, ahh indeed, I ran the wrong test. The proper type support needs a bit more changes.
@dengziming, that's expected. |
68a1563
to
66361ef
Compare
66361ef
to
b16793b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for updating the PR, @peter-toth .
Sure. I will add some more tests and adjust the description as well, but I don't see yet what would be the best place for the |
@@ -69,6 +69,8 @@ message DataType { | |||
|
|||
// UnparsedDataType | |||
Unparsed unparsed = 24; | |||
|
|||
Time time = 28; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for my understanding. Do you know where is 25, 26, 27, @peter-toth ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately these are not in order. A few lines up:
Variant variant = 25;
and down:
// Reserved for geometry and geography types
reserved 26, 27;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM from my side. Thank you, @peter-toth .
cc @MaxGekk , @grundprinzip , @HyukjinKwon
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @MaxGekk @hvanhovell @grundprinzip FYI
LGTM
@@ -182,6 +182,8 @@ void initAccessor(ValueVector vector) { | |||
accessor = new TimestampAccessor(timeStampMicroTZVector); | |||
} else if (vector instanceof TimeStampMicroVector timeStampMicroVector) { | |||
accessor = new TimestampNTZAccessor(timeStampMicroVector); | |||
} else if (vector instanceof TimeNanoVector timeNanoVector) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a stupid question, when is these method used? why should we change arrow related method when adding connect dataType?
What changes were proposed in this pull request?
This PR adds
TIME
type support to Spark Connect.Why are the changes needed?
TIME
type is new in 4.1 and Spark Connect should support it.Does this PR introduce any user-facing change?
Yes, it adds basic
TIME
type support.How was this patch tested?
TODO.
Was this patch authored or co-authored using generative AI tooling?
No.