-
Notifications
You must be signed in to change notification settings - Fork 1.9k
[WIP] add support for show partitions command #1051
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
[SC-24892] Add typesafe bintray repo for sbt-mima-plugin
update fork
catch up to master
update master
update with master
update fork branch
update fork
update fork
update fork
fork update
update fork
update fork
update fork
update fork
sync with master
add more unit tests
@zsxwing this PR is not finished yet (the partition spec is not supported yet). But I wanted to double-check the implementation with you since I also introduced some changes to the Describe Detail command to avoid duplicating code. Please let me know if there are changes that I need to do before tackling the partition spec work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took a precursory look, let me know what you think about the suggested refactoring.
; | ||
|
||
partitionVal | ||
: identifier (EQ constant)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is (EQ constant)
optional?
// Add keywords here so that people's queries don't break if they have a column name as one of | ||
// these tokens | ||
nonReserved | ||
: VACUUM | RETAIN | HOURS | DRY | RUN | ||
| CONVERT | TO | DELTA | PARTITIONED | BY | ||
| DESC | DESCRIBE | LIMIT | DETAIL | ||
| GENERATE | FOR | TABLE | CHECK | EXISTS | OPTIMIZE | ||
| RESTORE | AS | OF | ||
| RESTORE | AS | OF | SHOW | PARTITIONS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about PARTITION
?
* A command for users to list the partition names of a delta table. If partition spec is specified, | ||
* partitions that match the spec are returned. Otherwise an empty result set is returned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't all the partitions be returned if partition_spec
isn't provided?
@@ -75,6 +75,8 @@ statement | |||
: VACUUM (path=STRING | table=qualifiedName) | |||
(RETAIN number HOURS)? (DRY RUN)? #vacuumTable | |||
| (DESC | DESCRIBE) DETAIL (path=STRING | table=qualifiedName) #describeDeltaDetail | |||
| SHOW PARTITIONS (path=STRING | table=qualifiedName) | |||
partitionSpec? #showPartitions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think maybe making this more like (PARTITIONED BY '(' colTypeList ')')?
might be more readable? partitionSpec
is a bit convoluted
val path = Option(ctx.path).map(string) | ||
val tableIdentifier = Option(ctx.table).map(visitTableIdentifier) | ||
ShowDeltaPartitionsCommand(path, tableIdentifier) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do it without the variables like the other commands?
} | ||
} | ||
|
||
abstract class DeltaCommandWithoutViewsSupport( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So first, I noticed DeltaCommand
uses DeltaErrors.viewNotSupportedException(operationName)
, we should be able to use this error instead (and delete the command specific ones in DeltaErrors
.)
I also noticed a lot of this code overlaps with DeltaCommand.getDeltaLog
, which for some reason neither DescribeDeltaDetailsCommand
nor DescribeDeltaHistoryCommand
extends. In fact it seems to me DescribeDeltaHistoryCommand
basically copies exactly getDeltaLog
.
What I'm currently thinking needs to happen is...
DescribeDeltaHistoryCommand
should extendDeltaCommand
and useDeltaCommand.getDeltaLog
- We should see if we can combine this code here (as used by
DescribeDeltaDetailsCommand
) withDeltaCommand.getDeltaLog
to minimize overlap. I'm currently thinking something like ...
trait DeltaCommand {
...
def getPathAndTableMetadata(...) {
...
}
def getDeltaLog(...) {
path, metadata = getPathAndTableMetadata(...)
// check if delta table and throw error if so
....
}
- Do we need to support
ShowDeltaPartitionsCommand
for non-delta tables (@zsxwing I'm thinking no)? Right now we aren't checking if it's a Delta table before creating theDeltaLog
. Assuming we only need to support Delta tables, we should then be able to useDeltaCommand.getDeltaLog
here since both usages ofgetPathAndTableMetadata
ignorestableMetadata
and only usesbasePath
to create aDeltaLog
.
(1) and (2) should be done in a separate PR. I think in this PR we can just revert the changes to DescribeDeltaDetailsCommand
, and use DeltaCommand.getDeltaLog
.
We should also update DeltaCommand.getDeltaLog
to throw the more helpful error for temporary views, as we have done in DescribeDeltaDetailCommand
(good reason for reducing code duplication 😄 ).
What do you think @zsxwing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LMK if you'd be willing to tackle 1&2 in a separate PR.
I am not sure if we can not support this sql command using |
@YannByron We have to create our own parser as we don't plan to follow Spark's current |
@zsxwing thanks for you explanation. I see delta defines the new output format for |
@JassAbidi - any update? are you still interested in working on this? |
@JassAbidi are you still working on this? |
My team could certainly make use of this feature - is there anything I can do to help move it along ? |
@nate-kuhl This PR doesn't seem to be active, you're welcome to pick up this work instead. In my review above I outlined a refactoring I think should be done first before implementing this if you would like to work on it. |
@allisonport-db I can give it a shot ! I guess I should open a new PR ? |
@nate-kuhl since we need to do a refactoring (2 PRs) you should create a new PR. Feel free to open the PR early and ask on it if you have any questions |
Description
This PR adds support for SHOW PARTITIONS command to list the partition names of a delta table.
Resolves #996
How was this patch tested?
Tests to test this PR were added in
ShowDeltaPartitionsSuite
Does this PR introduce any user-facing changes?
No