[SPARK-51415][SQL] Make timestamp from date and time #51179
base: master
Conversation
@@ -2746,7 +2746,11 @@ object TryMakeTimestampLTZExpressionBuilder extends ExpressionBuilder {

 // scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(year, month, day, hour, min, sec[, timezone]) - Create timestamp from year, month, day, hour, min, sec and timezone fields. The result data type is consistent with the value of configuration `spark.sql.timestampType`. If the configuration `spark.sql.ansi.enabled` is false, the function returns NULL on invalid inputs. Otherwise, it will throw an error instead.",
+  usage = """
+    _FUNC_(year, month, day, hour, min, sec[, timezone]) - Create timestamp from year, month, day, hour, min, sec and timezone fields. The result data type is consistent with the value of configuration `spark.sql.timestampType`. If the configuration `spark.sql.ansi.enabled` is false, the function returns NULL on invalid inputs. Otherwise, it will throw an error instead.
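To illustrate the documented behavior, a small sketch (assumed session settings, not taken from the PR itself): with `spark.sql.ansi.enabled` set to false, an invalid field yields NULL instead of an error.

SET spark.sql.ansi.enabled = false;
-- Month 13 is out of range, so the result is NULL rather than an error.
SELECT make_timestamp(2019, 13, 1, 10, 11, 12.345);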
nit: Should we move the output type and error out comments after the second declaration of the function?
SGTM
@srielau @uros-db @mihailom-db Please review the PR.
What is the exact type? Since both DATE and TIME are WITHOUT TIMEZONE, presumably the result should be without timezone.
-- !query
SELECT make_timestamp(DATE'0001-01-01', TIME'0:0:0')
-- !query analysis
[Analyzer test output redacted due to nondeterminism]
What exactly is non-deterministic here?
Not clear to me either. I have left a question in the PR: https://github.com/apache/spark/pull/40496/files#r2154235184
I would assume that we might call current_timestamp or similar functions that could produce different literals. But if that is the case, I would say this is a bug in the new system, where we do not filter on specific expressions but on literals in general. cc: @dtenedor on this PR, for context.
@@ -169,3 +169,8 @@ select timediff(SECOND, date'2022-02-15', timestamp'2022-02-14 23:59:59');

select timediff('MINUTE', timestamp'2023-02-14 01:02:03', timestamp'2023-02-14 02:00:03');
select timediff('YEAR', date'2020-02-15', date'2023-02-15');

-- Construct timestamp from date and time
How do we handle null values? Can we add tests for this?
I added tests. Thanks.
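For reference, a sketch of the kind of NULL-handling queries involved (illustrative, not necessarily the exact tests added); under the usual Spark NULL semantics, a NULL date or NULL time would presumably yield a NULL timestamp:

select make_timestamp(NULL, time'22:33:01');
select make_timestamp(date'2014-12-28', NULL);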
@srielau We do have make_timestamp_ntz(). This is the base function which takes the timezone from the config or from the input. When it comes to date/time not having a zone, I would say neither do the hour, minute, second, day, year, and month fields from which the original make_timestamp was built. Should we maybe just improve the docs around this?
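As a sketch of how the result type follows the session config (assumed behavior based on the function doc; `typeof` just reports the result type):

SET spark.sql.timestampType = TIMESTAMP_LTZ;
SELECT typeof(make_timestamp(DATE'2014-12-28', TIME'6:30:45'));  -- expected: timestamp
SET spark.sql.timestampType = TIMESTAMP_NTZ;
SELECT typeof(make_timestamp(DATE'2014-12-28', TIME'6:30:45'));  -- expected: timestamp_ntz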
LGTM, apart from a few comments from @uros-db.
It depends on the config, as the function doc says: "The result data type is consistent with the value of configuration `spark.sql.timestampType`."

New behaviour of two parameters
-- !query analysis
org.apache.spark.sql.AnalysisException
{
  "errorClass" : "FAILED_FUNCTION_CALL",
Seems like we don't properly handle NULLs in time expressions, at least in HoursOfTime. I will recheck this.
Thanks for addressing the comments, Max!
  // Delegates to DateTimeUtils.getSecondsOfTimeWithFraction, which returns the
  // seconds of the TIME value, including the fractional part, as DECIMAL(8, 6).
  StaticInvoke(
    classOf[DateTimeUtils.type],
    DecimalType(8, 6),
    "getSecondsOfTimeWithFraction",
    Seq(child, Literal(precision)),
    Seq(child.dataType, IntegerType))
}

private val precision: Int = child.dataType.asInstanceOf[TimeType].precision
I made the fix here in the PR because the bug is triggered when the expression is invoked from MakeTimestamp with NullType directly (without checking inputTypes).
@LuciferYang @yaooqinn @dongjoon-hyun Could you review this PR, please?
What changes were proposed in this pull request?
In the PR, I propose to extend the `make_timestamp` function to accept a date and a time field plus an optional time zone.

Syntax
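(The original syntax block did not survive extraction; based on the discussion and tests in this PR, the overloads presumably look like the following.)

make_timestamp(year, month, day, hour, min, sec[, timezone])
make_timestamp(date, time[, timezone])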
Arguments
Returns
A TIMESTAMP.
Examples
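(The original examples did not survive extraction; an illustrative sketch of the new form, assuming the default session time zone and `spark.sql.timestampType`:)

SELECT make_timestamp(DATE'2014-12-28', TIME'6:30:45.887');
-- 2014-12-28 06:30:45.887 (in the session time zone)
SELECT make_timestamp(DATE'2014-12-28', TIME'6:30:45.887', 'CET');
-- displayed value depends on the session time zone and spark.sql.timestampType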
Why are the changes needed?
Users will be able to create a timestamp by combining a time and a date.
Does this PR introduce any user-facing change?
No, it just extends the existing API.
How was this patch tested?
By running the affected test suites:
Was this patch authored or co-authored using generative AI tooling?
No.