
[BUG] Delta table properties should be case-insensitive #1291

@Pluies

Description


Bug

Describe the problem

I'm currently populating a Hive metastore with Delta tables created by Databricks.

These tables have table properties, including Databricks-specific ones such as delta.autoOptimize.autoCompact, which open-source Delta can be told to ignore via the spark.databricks.delta.allowArbitraryProperties.enabled configuration, available since Delta 2.0.0. So far so good!
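
For context, this is roughly how the session is configured so that open-source Delta tolerates the Databricks-specific properties. This is a minimal sketch in Scala; the app name is illustrative, and only the allowArbitraryProperties flag is the point here:

import org.apache.spark.sql.SparkSession

// Minimal sketch: enable Delta and tell it not to reject table properties
// it does not recognize (e.g. Databricks-specific ones). The flag is
// available since Delta 2.0.0.
val spark = SparkSession.builder()
  .appName("hive-metastore-sync")  // illustrative name
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  .config("spark.databricks.delta.allowArbitraryProperties.enabled", "true")
  .getOrCreate()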

Unfortunately, when trying to create the table in the metastore, I get the following error:

Exception in thread "main" org.apache.spark.sql.AnalysisException: The specified properties do not match the existing properties at s3a://redacted/redacted/redacted/table/0000.

== Specified ==
delta.autooptimize.autocompact=true
redacted.public=false

== Existing ==
delta.autoOptimize.autoCompact=true
redacted.public=false

This is due to the Delta code treating the property keys as case-insensitive (the specified ones end up lower-cased), while grabbing the existing properties as a straightforward Map and comparing the two in a case-sensitive fashion.
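
A minimal sketch of the behaviour I would expect instead, assuming the comparison wrapped the existing properties in Spark's CaseInsensitiveMap. The propertiesMatch helper is hypothetical and only illustrates the idea; the actual comparison lives in CreateDeltaTableCommand.verifyTableMetadata:

import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap

// Hypothetical helper: keys differing only in case should compare as equal.
def propertiesMatch(specified: Map[String, String], existing: Map[String, String]): Boolean = {
  val existingCi = CaseInsensitiveMap(existing)
  specified.forall { case (key, value) => existingCi.get(key).contains(value) }
}

// propertiesMatch(Map("delta.autooptimize.autocompact" -> "true"),
//                 Map("delta.autoOptimize.autoCompact" -> "true"))  // => true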

Steps to reproduce

  1. Create a Delta table on Databricks with table properties whose keys contain upper-case letters (e.g. delta.autoOptimize.autoCompact)
  2. Try to register that table, with the same properties, in a metastore using open-source Delta (a minimal sketch follows below)
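
A rough reproduction of step 2, using the same DeltaTableBuilder API that the stack trace below goes through; the table name, location, and the `spark` session (configured as in the sketch above) are placeholders:

import io.delta.tables.DeltaTable

// Placeholder location of a Delta table that Databricks wrote with
// mixed-case property keys; registering it from open-source Delta fails
// with the property-mismatch error shown above.
DeltaTable.createIfNotExists(spark)
  .tableName("my_table")                       // placeholder
  .location("s3a://bucket/path/to/table")      // placeholder
  .property("delta.autoOptimize.autoCompact", "true")
  .execute()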

Observed results

Table creation fails due to case sensitivity of the table property keys.

Expected results

Not failing :)

Further details

Full stack trace:

You are setting a property: delta.autooptimize.autocompact that is not recognized by this version of Delta
Exception in thread "main" org.apache.spark.sql.AnalysisException: The specified properties do not match the existing properties at s3a://redacted/redacted/redacted/table/0000.

== Specified ==
delta.autooptimize.autocompact=true
redacted.public=false

== Existing ==
delta.autoOptimize.autoCompact=true
redacted.public=false

	at org.apache.spark.sql.delta.DeltaAnalysisException$.apply(DeltaSharedExceptions.scala:57)
	at org.apache.spark.sql.delta.DeltaErrorsBase.createTableWithDifferentPropertiesException(DeltaErrors.scala:1161)
	at org.apache.spark.sql.delta.DeltaErrorsBase.createTableWithDifferentPropertiesException$(DeltaErrors.scala:1157)
	at org.apache.spark.sql.delta.DeltaErrors$.createTableWithDifferentPropertiesException(DeltaErrors.scala:2264)
	at org.apache.spark.sql.delta.commands.CreateDeltaTableCommand.verifyTableMetadata(CreateDeltaTableCommand.scala:324)
	at org.apache.spark.sql.delta.commands.CreateDeltaTableCommand.createTransactionLogOrVerify$1(CreateDeltaTableCommand.scala:186)
	at org.apache.spark.sql.delta.commands.CreateDeltaTableCommand.$anonfun$run$2(CreateDeltaTableCommand.scala:193)
	at org.apache.spark.sql.delta.metering.DeltaLogging.recordFrameProfile(DeltaLogging.scala:141)
	at org.apache.spark.sql.delta.metering.DeltaLogging.recordFrameProfile$(DeltaLogging.scala:139)
	at org.apache.spark.sql.delta.commands.CreateDeltaTableCommand.recordFrameProfile(CreateDeltaTableCommand.scala:49)
	at org.apache.spark.sql.delta.metering.DeltaLogging.$anonfun$recordDeltaOperationInternal$1(DeltaLogging.scala:134)
	at com.databricks.spark.util.DatabricksLogging.recordOperation(DatabricksLogging.scala:77)
	at com.databricks.spark.util.DatabricksLogging.recordOperation$(DatabricksLogging.scala:67)
	at org.apache.spark.sql.delta.commands.CreateDeltaTableCommand.recordOperation(CreateDeltaTableCommand.scala:49)
	at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperationInternal(DeltaLogging.scala:133)
	at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation(DeltaLogging.scala:123)
	at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation$(DeltaLogging.scala:111)
	at org.apache.spark.sql.delta.commands.CreateDeltaTableCommand.recordDeltaOperation(CreateDeltaTableCommand.scala:49)
	at org.apache.spark.sql.delta.commands.CreateDeltaTableCommand.run(CreateDeltaTableCommand.scala:110)
	at org.apache.spark.sql.delta.catalog.DeltaCatalog.org$apache$spark$sql$delta$catalog$DeltaCatalog$$createDeltaTable(DeltaCatalog.scala:163)
	at org.apache.spark.sql.delta.catalog.DeltaCatalog.createTable(DeltaCatalog.scala:212)
	at org.apache.spark.sql.execution.datasources.v2.CreateTableExec.run(CreateTableExec.scala:42)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457)
	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:106)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:93)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:91)
	at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:128)
	at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:133)
	at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:130)
	at org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:148)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executedPlan$1(QueryExecution.scala:166)
	at org.apache.spark.sql.execution.QueryExecution.withCteMap(QueryExecution.scala:73)
	at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:163)
	at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:163)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:101)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
	at io.delta.tables.DeltaTableBuilder.execute(DeltaTableBuilder.scala:357)
	at com.redacted.spark.SparkSessionHelpersKt.createDeltaTable(SparkSessionHelpers.kt:57)
	at com.redacted.spark.jobs.schema.MainKt$main$$inlined$withConfiguration$1.invoke(Configuration.kt:80)
	at com.redacted.spark.jobs.schema.MainKt$main$$inlined$withConfiguration$1.invoke(Configuration.kt:40)
	at com.redacted.spark.jobs.configuration.CliktRun.run(Configuration.kt:49)
	at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:198)
	at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:18)
	at com.github.ajalt.clikt.core.CliktCommand.parse(CliktCommand.kt:395)
	at com.github.ajalt.clikt.core.CliktCommand.parse$default(CliktCommand.kt:392)
	at com.github.ajalt.clikt.core.CliktCommand.main(CliktCommand.kt:410)
	at com.github.ajalt.clikt.core.CliktCommand.main(CliktCommand.kt:435)
	at com.redacted.spark.jobs.schema.MainKt.main(Main.kt:21)

Environment information

  • Delta Lake version: 2.0.0
  • Spark version: 3.2.1
  • Scala version: 2.12

Willingness to contribute

The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?

  • Yes. I can contribute a fix for this bug independently.
  • Yes. I would be willing to contribute a fix for this bug with guidance from the Delta Lake community.
  • No. I cannot contribute a bug fix at this time.

(PR coming!)

Metadata

Labels

bug