Bug
Describe the problem
I'm currently registering Delta tables created by Databricks in a Hive metastore.
These tables carry table properties, including Databricks-specific ones such as delta.autoOptimize.autoCompact, which open-source Delta accepts (with a warning) when the spark.databricks.delta.allowArbitraryProperties.enabled configuration, available since Delta 2.0.0, is turned on. So far so good!
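For context, enabling that configuration is just a Spark conf setting; the session-building code below is a hedged sketch (app name and the standard Delta extension/catalog configs are mine, not the actual job code):

```scala
import org.apache.spark.sql.SparkSession

// Sketch: allow arbitrary delta.* properties so Databricks-specific keys
// such as delta.autoOptimize.autoCompact are accepted with a warning
// instead of an error (configuration available since Delta 2.0.0).
val spark = SparkSession.builder()
  .appName("register-delta-tables") // placeholder app name
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog",
          "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  .config("spark.databricks.delta.allowArbitraryProperties.enabled", "true")
  .getOrCreate()
```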
Unfortunately, when trying to create the table in the metastore, I get the following error:
Exception in thread "main" org.apache.spark.sql.AnalysisException: The specified properties do not match the existing properties at s3a://redacted/redacted/redacted/table/0000.
== Specified ==
delta.autooptimize.autocompact=true
redacted.public=false
== Existing ==
delta.autoOptimize.autoCompact=true
redacted.public=false
This happens because Delta treats the specified property keys as case-insensitive (lower-casing delta.autoOptimize.autoCompact to delta.autooptimize.autocompact), but reads the existing properties back as a plain Map and compares the two in a case-sensitive fashion, so the check fails.
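A small illustration of the mismatch, using simplified stand-ins for the two property maps (this is not the actual verifyTableMetadata code):

```scala
// Simplified stand-ins for the two maps involved; the real comparison lives in
// CreateDeltaTableCommand.verifyTableMetadata.
val specified = Map("delta.autooptimize.autocompact" -> "true") // key lower-cased by Delta
val existing  = Map("delta.autoOptimize.autoCompact" -> "true") // key as stored in the table

// Case-sensitive comparison (current behavior): the maps differ, so the command fails.
assert(specified != existing)

// Case-insensitive comparison on the keys (what this report argues should happen):
def normalizeKeys(m: Map[String, String]): Map[String, String] =
  m.map { case (k, v) => k.toLowerCase -> v }
assert(normalizeKeys(specified) == normalizeKeys(existing))
```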
Steps to reproduce
- Create a Delta table on Databricks with table properties whose keys contain upper-case letters (e.g. delta.autoOptimize.autoCompact)
- Try to register that table at its existing location with open-source Delta (see the sketch below)
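A hedged reproduction sketch using the DeltaTableBuilder API; the table name, path, and schema are placeholders standing in for the redacted ones:

```scala
import io.delta.tables.DeltaTable
import org.apache.spark.sql.types._

// Sketch only: point the builder at a location that already contains a
// Databricks-created Delta table whose property keys use upper-case letters.
DeltaTable.createIfNotExists(spark)
  .tableName("redacted_table")                              // placeholder name
  .location("s3a://redacted/redacted/redacted/table/0000")  // existing Databricks table
  .addColumn("id", LongType)                                // placeholder schema
  .property("delta.autoOptimize.autoCompact", "true")       // Databricks-specific property
  .execute()
// Fails with "The specified properties do not match the existing properties ..."
// because the specified key is lower-cased before the case-sensitive comparison.
```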
Observed results
Failing due to case sensitivity on the table properties.
Expected results
Not failing :) The properties match apart from key casing, so the table should be registered successfully.
Further details
Full traceback:
You are setting a property: delta.autooptimize.autocompact that is not recognized by this version of Delta
Exception in thread "main" org.apache.spark.sql.AnalysisException: The specified properties do not match the existing properties at s3a://redacted/redacted/redacted/table/0000.
== Specified ==
delta.autooptimize.autocompact=true
redacted.public=false
== Existing ==
delta.autoOptimize.autoCompact=true
redacted.public=false
at org.apache.spark.sql.delta.DeltaAnalysisException$.apply(DeltaSharedExceptions.scala:57)
at org.apache.spark.sql.delta.DeltaErrorsBase.createTableWithDifferentPropertiesException(DeltaErrors.scala:1161)
at org.apache.spark.sql.delta.DeltaErrorsBase.createTableWithDifferentPropertiesException$(DeltaErrors.scala:1157)
at org.apache.spark.sql.delta.DeltaErrors$.createTableWithDifferentPropertiesException(DeltaErrors.scala:2264)
at org.apache.spark.sql.delta.commands.CreateDeltaTableCommand.verifyTableMetadata(CreateDeltaTableCommand.scala:324)
at org.apache.spark.sql.delta.commands.CreateDeltaTableCommand.createTransactionLogOrVerify$1(CreateDeltaTableCommand.scala:186)
at org.apache.spark.sql.delta.commands.CreateDeltaTableCommand.$anonfun$run$2(CreateDeltaTableCommand.scala:193)
at org.apache.spark.sql.delta.metering.DeltaLogging.recordFrameProfile(DeltaLogging.scala:141)
at org.apache.spark.sql.delta.metering.DeltaLogging.recordFrameProfile$(DeltaLogging.scala:139)
at org.apache.spark.sql.delta.commands.CreateDeltaTableCommand.recordFrameProfile(CreateDeltaTableCommand.scala:49)
at org.apache.spark.sql.delta.metering.DeltaLogging.$anonfun$recordDeltaOperationInternal$1(DeltaLogging.scala:134)
at com.databricks.spark.util.DatabricksLogging.recordOperation(DatabricksLogging.scala:77)
at com.databricks.spark.util.DatabricksLogging.recordOperation$(DatabricksLogging.scala:67)
at org.apache.spark.sql.delta.commands.CreateDeltaTableCommand.recordOperation(CreateDeltaTableCommand.scala:49)
at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperationInternal(DeltaLogging.scala:133)
at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation(DeltaLogging.scala:123)
at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation$(DeltaLogging.scala:111)
at org.apache.spark.sql.delta.commands.CreateDeltaTableCommand.recordDeltaOperation(CreateDeltaTableCommand.scala:49)
at org.apache.spark.sql.delta.commands.CreateDeltaTableCommand.run(CreateDeltaTableCommand.scala:110)
at org.apache.spark.sql.delta.catalog.DeltaCatalog.org$apache$spark$sql$delta$catalog$DeltaCatalog$$createDeltaTable(DeltaCatalog.scala:163)
at org.apache.spark.sql.delta.catalog.DeltaCatalog.createTable(DeltaCatalog.scala:212)
at org.apache.spark.sql.execution.datasources.v2.CreateTableExec.run(CreateTableExec.scala:42)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457)
at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:106)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:93)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:91)
at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:128)
at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:133)
at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:130)
at org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:148)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executedPlan$1(QueryExecution.scala:166)
at org.apache.spark.sql.execution.QueryExecution.withCteMap(QueryExecution.scala:73)
at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:163)
at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:163)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:101)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at io.delta.tables.DeltaTableBuilder.execute(DeltaTableBuilder.scala:357)
at com.redacted.spark.SparkSessionHelpersKt.createDeltaTable(SparkSessionHelpers.kt:57)
at com.redacted.spark.jobs.schema.MainKt$main$$inlined$withConfiguration$1.invoke(Configuration.kt:80)
at com.redacted.spark.jobs.schema.MainKt$main$$inlined$withConfiguration$1.invoke(Configuration.kt:40)
at com.redacted.spark.jobs.configuration.CliktRun.run(Configuration.kt:49)
at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:198)
at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:18)
at com.github.ajalt.clikt.core.CliktCommand.parse(CliktCommand.kt:395)
at com.github.ajalt.clikt.core.CliktCommand.parse$default(CliktCommand.kt:392)
at com.github.ajalt.clikt.core.CliktCommand.main(CliktCommand.kt:410)
at com.github.ajalt.clikt.core.CliktCommand.main(CliktCommand.kt:435)
at com.redacted.spark.jobs.schema.MainKt.main(Main.kt:21)
Environment information
- Delta Lake version: 2.0.0
- Spark version: 3.2.1
- Scala version: 2.12
Willingness to contribute
The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?
- Yes. I can contribute a fix for this bug independently.
- Yes. I would be willing to contribute a fix for this bug with guidance from the Delta Lake community.
- No. I cannot contribute a bug fix at this time.
(PR coming!)