Skip to content

Issue #809: Fix Creating an external table options #820

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

nflucian
Copy link

@nflucian nflucian commented Nov 1, 2021

This PR fix metadata.configuration inconsistency between write into delta and create delta table. #809

@allisonport-db
Copy link
Collaborator

Hi Lucian, thank you for contributing and submitting this PR. This is definitely a bug we want to fix and we are reviewing it now.

@zsxwing
Copy link
Member

zsxwing commented Nov 17, 2021

Thanks for making the PR. metadata.configuration is supposed to store the table properties rather than options. Currently we forget to remove options from table properties here.

Could you just change the above place to split table properties and options properly? Similar to

val props = new util.HashMap[String, String]()
// Options passed in through the SQL API will show up both with an "option." prefix and
// without in Spark 3.1, so we need to remove those from the properties
val optionsThroughProperties = properties.asScala.collect {
case (k, _) if k.startsWith("option.") => k.stripPrefix("option.")
}.toSet
val sqlWriteOptions = new util.HashMap[String, String]()
properties.asScala.foreach { case (k, v) =>
if (!k.startsWith("option.") && !optionsThroughProperties.contains(k)) {
// Do not add to properties
props.put(k, v)
} else if (optionsThroughProperties.contains(k)) {
sqlWriteOptions.put(k, v)
}
}

@nflucian nflucian force-pushed the fix/create-table-options branch 4 times, most recently from d28b58b to 59b4293 Compare November 25, 2021 13:23
Comment on lines 359 to 366
val props = if (conf.getConf(DeltaSQLConf.DELTA_LEGACY_STORE_WRITER_OPTIONS_AS_PROPS)) {
// Legacy behavior
writeOptions.foreach { case (k, v) => props.put(k, v) }
catalogProp.properties ++ writeOptions
} else {
writeOptions.foreach { case (k, v) =>
// Continue putting in Delta prefixed options to avoid breaking workloads
if (k.toLowerCase(Locale.ROOT).startsWith("delta.")) {
props.put(k, v)
}
}
val deltaPrefix: Map[String, String] = writeOptions.filter { case (k, _) =>
k.toLowerCase(Locale.ROOT).startsWith("delta.") }

catalogProp.properties ++ deltaPrefix
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also be added to DeltaTableProperties. We would like to be consistent in all places.

@nflucian nflucian force-pushed the fix/create-table-options branch from 59b4293 to 76ed376 Compare December 7, 2021 11:32
@nflucian nflucian requested a review from zsxwing December 7, 2021 14:10
@zsxwing
Copy link
Member

zsxwing commented Dec 20, 2021

Sorry for being late. Although the change looks good, I will need to take some time to think about the backward compatibility.

@allisonport-db allisonport-db removed their assignment May 3, 2022
@zsxwing
Copy link
Member

zsxwing commented May 4, 2022

Sorry for the delay. I'm currently working on adding a new test suite to check the existing behaviors in all APIs that we can set options.

@scottsand-db scottsand-db self-requested a review May 4, 2022 19:20
@zsxwing
Copy link
Member

zsxwing commented May 6, 2022

Just to give a update. @scottsand-db is working on adding tests for the following complicated test matrix to check what are the behaviors in all combinations (Unfortunately, we have too many ways to create/write a Delta table today and they are not sharing the same code path):

Two types of options:

  • Options starting with delta.
  • Options not starting with delta.

Two ways to refer a Delta table

  • Path
  • Table name

APIs to write a Delta table

  • DataFrameWriter save/saveAsTable

  • DataStreamWriter (streaming) start/table

  • Dataset.writeStream

  • DataFrameWriterV2

  • DeltaTable.create

  • DeltaTable.replace

  • DeltaTable.createOrReplace

  • SQL

    • CREATE TABLE … USING DELTA TBLPROPERTIES(...)
    • CREATE TABLE … USING DELTA OPTIONS(...)
    • CREATE TABLE … USING DELTA OPTIONS(...) TBLPROPERTIES()
    • REPLACE
    • CREATE OR REPLACE
    • CREATE TABLE … AS SELECT * FROM XXX
  • Different table states:

    • Create a new table
    • Overwrite an existing table
    • Append to an existing table

We want to have these tests first so that we can make sure any new changes will be consistent in all of these cases.

And thanks a lot, @scottsand-db for offering the help to do this.

@tdas tdas removed the request for review from zsxwing May 6, 2022 19:16
@scottsand-db
Copy link
Collaborator

Here is the test suite which shows all the different ways to write configs to a delta table.
#1120

@scottsand-db
Copy link
Collaborator

@nflucian just FYI we are still working on merging #1120 first before this PR.

@scottsand-db
Copy link
Collaborator

Hi @nflucian, #1120 has been merged. It adds a test suite that shows the status of all the different combinations of writing configs into delta.

Can you update your branch with master? Then, could you please run `build/sbt 'core/testOnly *DeltaWriteConfigsSuite' and double check that your changes solve the problem? Can you then please update the relevant comment tables (e.g. here) with the latest result.

Let me know if you have any questions?

@scottsand-db scottsand-db added the enhancement New feature or request label May 20, 2022
Copy link
Collaborator

@scottsand-db scottsand-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments. There's also 3 other big changes this PR needs.

  1. We should only write options that are prefixed with delta. BUT we also need to be able to read old tables that do have options that aren't prefixed with delta. (Backwards compatibility)
  2. We need a feature flag, which when enabled rolls back to the previous (i.e. current) behaviour.
  3. A unit test that tests this backwards compatibility. See EvolvabilitySuite

new DeltaTableProperties(tableProperties, writeOptions)
}

private def deltaProperties(writeOptions: Map[String, String]): Map[String, String] = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scaladoc. describe the input and the output

new DeltaTableProperties(tableProperties, writeOptions)
}

private def deltaProperties(writeOptions: Map[String, String]): Map[String, String] = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scaladoc. describe the input and the output

}
}

private def split(properties: Map[String, String]): (Map[String, String], Map[String, String]) = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scaladoc: describe the output. what are the tuples?

}
}

private def split(properties: Map[String, String]): (Map[String, String], Map[String, String]) = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scaladoc: describe the output. what are the tuples?

withTempDir { tempDir =>
val path = tempDir.getCanonicalPath
val options = Map(
"header"-> "true",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After syncing with @zsxwing , we've confirmed that we do not want to write properties that are not prefixed with delta. Thus, to be clear, we only want to write to table metadata those properties which start with delta.

withTempDir { tempDir =>
val path = tempDir.getCanonicalPath
val options = Map(
"header"-> "true",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After syncing with @zsxwing , we've confirmed that we do not want to write properties that are not prefixed with delta. Thus, to be clear, we only want to write to table metadata those properties which start with delta.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants