Skip to content

[WIP] add support for show partitions command #1051

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 21 commits into
base: master
Choose a base branch
from

Conversation

JassAbidi
Copy link
Contributor

Description

This PR adds support for SHOW PARTITIONS command to list the partition names of a delta table.

Resolves #996

How was this patch tested?

Tests to test this PR were added in ShowDeltaPartitionsSuite

Does this PR introduce any user-facing changes?

No

@JassAbidi
Copy link
Contributor Author

JassAbidi commented Apr 2, 2022

@zsxwing this PR is not finished yet (the partition spec is not supported yet). But I wanted to double-check the implementation with you since I also introduced some changes to the Describe Detail command to avoid duplicating code. Please let me know if there are changes that I need to do before tackling the partition spec work.

@allisonport-db allisonport-db self-assigned this Apr 4, 2022
Copy link
Collaborator

@allisonport-db allisonport-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a precursory look, let me know what you think about the suggested refactoring.

;

partitionVal
: identifier (EQ constant)?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is (EQ constant) optional?

// Add keywords here so that people's queries don't break if they have a column name as one of
// these tokens
nonReserved
: VACUUM | RETAIN | HOURS | DRY | RUN
| CONVERT | TO | DELTA | PARTITIONED | BY
| DESC | DESCRIBE | LIMIT | DETAIL
| GENERATE | FOR | TABLE | CHECK | EXISTS | OPTIMIZE
| RESTORE | AS | OF
| RESTORE | AS | OF | SHOW | PARTITIONS
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about PARTITION?

Comment on lines +33 to +34
* A command for users to list the partition names of a delta table. If partition spec is specified,
* partitions that match the spec are returned. Otherwise an empty result set is returned.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't all the partitions be returned if partition_spec isn't provided?

@@ -75,6 +75,8 @@ statement
: VACUUM (path=STRING | table=qualifiedName)
(RETAIN number HOURS)? (DRY RUN)? #vacuumTable
| (DESC | DESCRIBE) DETAIL (path=STRING | table=qualifiedName) #describeDeltaDetail
| SHOW PARTITIONS (path=STRING | table=qualifiedName)
partitionSpec? #showPartitions
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe making this more like (PARTITIONED BY '(' colTypeList ')')? might be more readable? partitionSpec is a bit convoluted

Comment on lines +194 to +196
val path = Option(ctx.path).map(string)
val tableIdentifier = Option(ctx.table).map(visitTableIdentifier)
ShowDeltaPartitionsCommand(path, tableIdentifier)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do it without the variables like the other commands?

}
}

abstract class DeltaCommandWithoutViewsSupport(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So first, I noticed DeltaCommand uses DeltaErrors.viewNotSupportedException(operationName), we should be able to use this error instead (and delete the command specific ones in DeltaErrors.)

I also noticed a lot of this code overlaps with DeltaCommand.getDeltaLog, which for some reason neither DescribeDeltaDetailsCommand nor DescribeDeltaHistoryCommand extends. In fact it seems to me DescribeDeltaHistoryCommand basically copies exactly getDeltaLog.

What I'm currently thinking needs to happen is...

  1. DescribeDeltaHistoryCommand should extend DeltaCommand and use DeltaCommand.getDeltaLog
  2. We should see if we can combine this code here (as used by DescribeDeltaDetailsCommand) with DeltaCommand.getDeltaLog to minimize overlap. I'm currently thinking something like ...
trait DeltaCommand {
   ...
   def getPathAndTableMetadata(...) {
      ...
   }
   def getDeltaLog(...) {
       path, metadata = getPathAndTableMetadata(...)
       // check if delta table and throw error if so
       ....
   }
  1. Do we need to support ShowDeltaPartitionsCommand for non-delta tables (@zsxwing I'm thinking no)? Right now we aren't checking if it's a Delta table before creating the DeltaLog. Assuming we only need to support Delta tables, we should then be able to use DeltaCommand.getDeltaLog here since both usages of getPathAndTableMetadata ignores tableMetadata and only uses basePath to create a DeltaLog.

(1) and (2) should be done in a separate PR. I think in this PR we can just revert the changes to DescribeDeltaDetailsCommand, and use DeltaCommand.getDeltaLog.

We should also update DeltaCommand.getDeltaLog to throw the more helpful error for temporary views, as we have done in DescribeDeltaDetailCommand (good reason for reducing code duplication 😄 ).

What do you think @zsxwing?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LMK if you'd be willing to tackle 1&2 in a separate PR.

@YannByron
Copy link
Contributor

I am not sure if we can not support this sql command using path.
If so, we don't need to extend the antlr grammar and sql parser and can reuse the spark syntax directly. and make the DeltaTableV2 to extend the SupportsPartitionManagement interface.

@zsxwing
Copy link
Member

zsxwing commented Jul 5, 2022

@YannByron We have to create our own parser as we don't plan to follow Spark's current show partitions format. It's coming from legacy hive format and it's not user friendly. See #996 for the examples I put in the description.

@YannByron
Copy link
Contributor

@zsxwing thanks for you explanation. I see delta defines the new output format for show partitions and that's nice.

@scottsand-db
Copy link
Collaborator

scottsand-db commented Aug 26, 2022

@JassAbidi - any update? are you still interested in working on this?

@zsxwing
Copy link
Member

zsxwing commented Oct 11, 2022

@JassAbidi are you still working on this?

@nate-kuhl
Copy link

My team could certainly make use of this feature - is there anything I can do to help move it along ?

@allisonport-db
Copy link
Collaborator

@nate-kuhl This PR doesn't seem to be active, you're welcome to pick up this work instead. In my review above I outlined a refactoring I think should be done first before implementing this if you would like to work on it.

@nate-kuhl
Copy link

nate-kuhl commented Oct 28, 2022

@allisonport-db I can give it a shot ! I guess I should open a new PR ?

@allisonport-db
Copy link
Collaborator

allisonport-db commented Oct 28, 2022

@nate-kuhl since we need to do a refactoring (2 PRs) you should create a new PR. Feel free to open the PR early and ask on it if you have any questions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support "SHOW PARTITIONS"
6 participants