Fix ConflictChecker for READ-APPEND after OPTIMIZE #1305

Open · wants to merge 5 commits into master

Conversation

@sezruby (Contributor) commented Aug 1, 2022

Description

Allow concurrent transactions to read files that were newly added with dataChange=false, and to overlap with files removed with dataChange=false. dataChange=false files are created by the OPTIMIZE operation, so their content is unchanged.

The following functions check the two cases:

  • checkForAddedFilesThatShouldHaveBeenReadByCurrentTxn
  • checkForDeletedFilesAgainstCurrentTxnReadFiles

We can address the issue by (see the sketch below):

  • fixing changedDataAddedFiles to check the dataChange flag of each AddFile
  • adding changedDataRemovedFiles for RemoveFile actions
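A minimal sketch of the idea (OPTIMIZE writes its AddFile/RemoveFile actions with dataChange=false, so they drop out of the conflict sets):

    // In WinningCommitSummary: only dataChange=true actions participate in
    // conflict detection; files merely rewritten by OPTIMIZE are ignored.
    val changedDataAddedFiles: Seq[AddFile] = if (isBlindAppendOption.getOrElse(false)) {
      Seq()
    } else {
      addedFiles.filter(_.dataChange)
    }
    val changedDataRemovedFiles: Seq[RemoveFile] = removedFiles.filter(_.dataChange)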

How was this patch tested?

Unit tests

Does this PR introduce any user-facing changes?

No. This is a bug fix for the case where an OPTIMIZE transaction commits while concurrent reads are in progress.

Signed-off-by: Eunjin Song <[email protected]>
@tdas (Contributor) commented Aug 9, 2022

Can you elaborate on the issue? Which operation throws ConcurrentAppendException, OPTIMIZE or the append? Can you give an example error and stack trace? Also, can you explain the semantics of why this change is correct and maintains serializability?

@sezruby (Contributor, Author) commented Sep 8, 2022

@tdas The append operation throws ConcurrentAppendException:

io.delta.exceptions.ConcurrentAppendException: Files were added to partition [colC=1] by a concurrent update. Please try the operation again.
Conflicting commit: {"timestamp":1662599111087,"operation":"OPTIMIZE", ...

/**
 * Check if the new files added by the already committed transactions should have been read by
 * the current transaction.
 */
protected def checkForAddedFilesThatShouldHaveBeenReadByCurrentTxn(): Unit = {
  recordTime("checked-appends") {
    // Fail if new files have been added that the txn should have read.
    val addedFilesToCheckForConflicts = isolationLevel match {
      case WriteSerializable if !currentTransactionInfo.metadataChanged =>
        winningCommitSummary.changedDataAddedFiles // don't conflict with blind appends
      case Serializable | WriteSerializable =>
        winningCommitSummary.changedDataAddedFiles ++ winningCommitSummary.blindAppendAddedFiles

This change makes files written by OPTIMIZE not conflict with the read portion of the append job. Strictly speaking, the conflict is not with the APPEND itself but with its scan.

This is the repro:

// session 1 - performing OPTIMIZE
(1 to 10).foreach { i =>
  println(spark.sql(s"OPTIMIZE delta.`$dataPath`").collect.toSeq)
}

// session 2 - performing APPEND
(1 to 10).foreach { i =>
  spark.read.format("delta").load(dataPath).limit(10)
    .write.mode("append").format("delta").partitionBy("colC").save(dataPath)
}

@sezruby sezruby changed the title Fix ConflictChecker for Append after Optimize conflict Fix ConflictChecker for Read after Optimize Sep 8, 2022
@tdas tdas self-requested a review September 8, 2022 18:17
@tdas (Contributor) commented Sep 12, 2022

@sezruby Thank you for the explanation and the repro. So it's a problem not with blind appends but with read-then-append.

I have to think about whether this change has unintended consequences in other kinds of workloads. Can you provide a logical argument for why this change will not produce unintended consequences in other combinations of operations, like DELETE/UPDATE/MERGE + OPTIMIZE? I know you have added some tests with DELETE, but tests can have coverage gaps, so it would be good to have a convincing logical argument that this change is safe no matter what the operation is.

@sezruby (Contributor, Author) commented Sep 13, 2022

@tdas I found that there can still be a conflict if a concurrent transaction reads a RemoveFile written by OPTIMIZE:

io.delta.exceptions.ConcurrentDeleteReadException: This transaction attempted to read one or more files that were deleted (for example colC=0/part-00000-cf1e16ab-27b2-4c9c-b5e1-bccfb0e79a58.c000.snappy.parquet in partition [colC=0]) by a concurrent update. Please try the operation again.
Conflicting commit: {"timestamp":1663052151972,"operation":"OPTIMIZE"

/**
 * Check if [[RemoveFile]] actions added by already committed transactions conflicts with files
 * read by the current transaction.
 */
protected def checkForDeletedFilesAgainstCurrentTxnReadFiles(): Unit = {
  recordTime("checked-deletes") {
    // Fail if files have been deleted that the txn read.
    val readFilePaths = currentTransactionInfo.readFiles.map(
      f => f.path -> f.partitionValues).toMap
    val deleteReadOverlap = winningCommitSummary.removedFiles
      .find(r => readFilePaths.contains(r.path))

I think we can add changedDataRemovedFiles, analogous to changedDataAddedFiles, and use it in this check function.
We can allow reads of the version preceding the OPTIMIZE without any serialization issue, as long as the transaction doesn't update existing data. With the fix, we could support concurrent OPTIMIZE while performing append/insert-only operations that read existing data to append new rows.
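Roughly, the check would then use the filtered set (a sketch; the exact change is in this PR's diff):

    // Only dataChange=true removes can conflict with the txn's read files;
    // RemoveFile actions written by OPTIMIZE (dataChange=false) are ignored.
    val deleteReadOverlap = winningCommitSummary.changedDataRemovedFiles // was: removedFiles
      .find(r => readFilePaths.contains(r.path))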

Other types of operations that may include RemoveFile actions (DELETE, UPDATE, MERGE, OPTIMIZE) will still fail via the following check:

/**
 * Check if [[RemoveFile]] actions added by already committed transactions conflicts with
 * [[RemoveFile]] actions this transaction is trying to add.
 */
protected def checkForDeletedFilesAgainstCurrentTxnDeletedFiles(): Unit = {
  recordTime("checked-2x-deletes") {
    // Fail if a file is deleted twice.
    val txnDeletes = currentTransactionInfo.actions
      .collect { case r: RemoveFile => r }
      .map(_.path).toSet
    val deleteOverlap = winningCommitSummary.removedFiles.map(_.path).toSet intersect txnDeletes
    if (deleteOverlap.nonEmpty) {
      throw DeltaErrors.concurrentDeleteDeleteException(
        winningCommitSummary.commitInfo, deleteOverlap.head)
    }
  }
}

Signed-off-by: Eunjin Song <[email protected]>
Signed-off-by: Eunjin Song <[email protected]>
@scottsand-db (Collaborator) commented:

Wouldn't PR #1262 help with this?

@sezruby (Contributor, Author) commented Sep 15, 2022

@scottsand-db That PR seems to allow switching the isolation level to WriteSerializable. I'm not sure what other changes will be delivered, but with the current code the issue still exists:

protected def checkForAddedFilesThatShouldHaveBeenReadByCurrentTxn(): Unit = {
  recordTime("checked-appends") {
    // Fail if new files have been added that the txn should have read.
    val addedFilesToCheckForConflicts = isolationLevel match {
      case WriteSerializable if !currentTransactionInfo.metadataChanged =>
        winningCommitSummary.changedDataAddedFiles // don't conflict with blind appends
      case Serializable | WriteSerializable =>
        winningCommitSummary.changedDataAddedFiles ++ winningCommitSummary.blindAppendAddedFiles

It still checks changedDataAddedFiles under WriteSerializable.

Also, with this change we could allow concurrent read-append and OPTIMIZE even at the Serializable level.

@sezruby sezruby changed the title Fix ConflictChecker for Read after Optimize Fix ConflictChecker for READ-APPEND after OPTIMIZE Sep 15, 2022
@sezruby (Contributor, Author) commented Sep 21, 2022

@tdas Ready for review.

@sezruby (Contributor, Author) commented Sep 30, 2022

It seems this PR also fixes #326.

@scottsand-db (Collaborator) left a comment:

Looks great! Thanks for this change! Tests look good.

Could you please update the scaladoc for checkForDeletedFilesAgainstCurrentTxnDeletedFiles to explain that we check for conflicting RemoveFile actions regardless of their dataChange status?

@@ -96,8 +96,9 @@ private[delta] class WinningCommitSummary(val actions: Seq[Action], val commitVe
   val changedDataAddedFiles: Seq[AddFile] = if (isBlindAppendOption.getOrElse(false)) {
     Seq()
   } else {
-    addedFiles
+    addedFiles.filter(_.dataChange)
Collaborator:

This is worth an inline comment explaining why we want this. Of course, with the variable name being changedDataAddedFiles, this seems obvious. But obviously it wasn't :) that's why you are fixing this now :)

So if you could add an inline comment explaining why this is necessary, perhaps even with an example, that would be great.

   }
+  val changedDataRemovedFiles: Seq[RemoveFile] = removedFiles.filter(_.dataChange)
Collaborator:

Same here: an inline comment, please.

@@ -231,16 +232,17 @@ private[delta] class ConflictChecker(
       // Fail if files have been deleted that the txn read.
       val readFilePaths = currentTransactionInfo.readFiles.map(
         f => f.path -> f.partitionValues).toMap
-      val deleteReadOverlap = winningCommitSummary.removedFiles
+      val deleteReadOverlap = winningCommitSummary.changedDataRemovedFiles
Collaborator:

Update the scaladoc for this method: we aren't just checking RemoveFile actions, we are checking RemoveFile actions that actually change data (e.g. ignoring OPTIMIZE).

@scottsand-db (Collaborator) commented:

@tdas I've taken a first pass. Can you also please take a look?

@@ -231,16 +232,17 @@ private[delta] class ConflictChecker(
       // Fail if files have been deleted that the txn read.
       val readFilePaths = currentTransactionInfo.readFiles.map(
         f => f.path -> f.partitionValues).toMap
-      val deleteReadOverlap = winningCommitSummary.removedFiles
+      val deleteReadOverlap = winningCommitSummary.changedDataRemovedFiles
@prakharjain09 (Collaborator):

Ignoring the dataChange=false files can cause serializability issues. A txn that is trying to commit relies on the compacted files, and so it indirectly relies on the new files created by OPTIMIZE.

Consider the following scenario:

Suppose the table has 2 files initially: f1, f2.

t0   --- TXN-1 starts:                    reads F1 and wants to write F11
t1   --- OPTIMIZE-1 starts and commits:   compacts F1, F2 into F3
t2   --- TXN-1 commits:                   tries to commit [AddFile-F11, RemoveFile-F1]
         (this would have failed on master with a ConcurrentDeleteReadException around
          file F1; with the new changes it commits successfully)

At time t2, when TXN-1 tries to commit, it depends on file F1, and F1 shouldn't change concurrently. In this case F1 was just rewritten into F3 by the concurrent OPTIMIZE, so we have a transitive dependency on F3 which we are missing here.

If we allow such a TXN-1 to commit here, we could have issues like this:

Suppose the table has 2 files initially: f1, f2.

t0   --- TXN-1 starts:                    reads F1 and wants to write F11
t1   --- OPTIMIZE-1 starts and commits:   compacts F1, F2 into F3
t1.5 --- TXN-2 starts and commits:        deletes F3, i.e. it commits RemoveFile-F3
t2   --- TXN-1 commits:                   tries to commit F11, and it works: it doesn't
         conflict with OPTIMIZE (with the new logic), and it doesn't conflict with TXN-2
         (TXN-2 just deleted F3, which TXN-1 was not reading directly)

Here at time t2 we will allow TXN-1 to commit successfully (after the proposed changes), although TXN-1 can't be serialized after TXN-2. So ideally TXN-1 should have failed.

Collaborator:

@prakharjain09 - this makes sense to me. Thanks for reviewing this PR and lending a hand.

@sezruby do you understand and agree with this counter-example?

@sezruby (Author):

Yes, I understand the case. But OPTIMIZE-APPEND is a common case for many users:
if there are frequent append jobs writing to a partition 24/7, we can never run OPTIMIZE on that partition.
Can we add an additional check for this case?

We have readPredicates, used in checkForAddedFilesThatShouldHaveBeenReadByCurrentTxn, and based on it we can detect whether there is a dataChange=true commit in each partition.
I think partition-level detection is feasible. (I was thinking of tracking optimized files using a Bloom filter, but checking the partition should be simple and sufficient.)

What do you think about this approach? Keep the current change + partition-level dataChange=true detection. (A sketch of what I mean is below.)
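A sketch of the partition-level idea, using only fields visible in the code quoted earlier (the exact shape is hypothetical, not this PR's actual code):

    // Hypothetical sketch: collect the partitions this txn read, then check
    // whether the winning commit removed data (dataChange=true) in any of them.
    val readPartitions: Set[Map[String, String]] =
      currentTransactionInfo.readFiles.map(_.partitionValues).toSet
    val dataChangedInReadPartitions = winningCommitSummary.changedDataRemovedFiles
      .exists(r => readPartitions.contains(r.partitionValues))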

@sezruby (Author):

@scottsand-db @prakharjain09 I updated the PR with an additional check and a test for this case. Could you have a look at it?
(I haven't addressed the comment-related changes yet.)

Collaborator:

@sezruby Could you give some example scenarios (like the one in this comment) that show the behavior before and after your changes?

I think the main issue you are trying to fix is: when OPTIMIZE commits, it fails a concurrent Append/Update/Delete/Merge query that is in progress and wants to commit, because of the checkForAddedFilesThatShouldHaveBeenReadByCurrentTxn check. Is my understanding correct?

@sezruby (Author) commented Oct 6, 2022:

I'd like to support the OPTIMIZE-APPEND scenario only.
Other operations should fail, since we cannot track modified/removed rows through optimized files, and that is acceptable.
Operations with RemoveFile actions will conflict with the OPTIMIZE commit in checkForDeletedFilesAgainstCurrentTxnDeletedFiles and fail.

Suppose the table has 2 files initially: f1, f2.

t0   --- TXN-1 starts:                    reads F1 and wants to write F11
t1   --- OPTIMIZE-1 starts and commits:   compacts F1, F2 into F3
t1.5 --- TXN-2 starts and commits:        deletes F3, i.e. it commits RemoveFile-F3
t2   --- TXN-1 commits:                   tries to commit F11, but it fails in
         checkForDeletedFilesAgainstCurrentTxnReadFiles, because
         winningCommitSummary.changedDataRemovedFiles filtered by the read predicates
         is not empty (it contains RemoveFile-F3)

In the current implementation:
checkForAddedFilesThatShouldHaveBeenReadByCurrentTxn checks newly added files using the read predicates;
checkForDeletedFilesAgainstCurrentTxnReadFiles checks newly removed files only against the readFiles of the current transaction.

I added a read-predicate check to checkForDeletedFilesAgainstCurrentTxnReadFiles so that we can detect removed files beyond readFiles (see the sketch below).
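A rough sketch of the extra check, assuming a hypothetical helper matchesReadPredicates for the partition-predicate evaluation (the exact code is in the PR):

    // Sketch only: flag dataChange=true RemoveFile actions from the winning commit
    // whose partition values match this txn's read predicates, even when the txn
    // never listed those files in readFiles.
    val removedMatchingReads = winningCommitSummary.changedDataRemovedFiles
      .filter(r => matchesReadPredicates(currentTransactionInfo.readPredicates, r.partitionValues))
    if (removedMatchingReads.nonEmpty) {
      throw DeltaErrors.concurrentDeleteReadException(
        winningCommitSummary.commitInfo, removedMatchingReads.head.path)
    }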

@sezruby (Author) commented Oct 6, 2022:

I think this sequence (APPEND starts / OPTIMIZE commits, then DELETE commits / APPEND commits) is the only case where we need to check RemoveFile actions beyond readFiles.
With a data filter, there can be files in the same partition that are not in the readFiles list, so the additional check can incur unnecessary conflicts.
(e.g. fileA only contains id=1 and fileB only contains id=2; with the data filter id=1, fileB is not in readFiles)

Do you have any other ideas for how we could support OPTIMIZE-APPEND?
Could we collect all optimized files matching the readPredicate as optimizedReadFiles?

Collaborator:

Hi @sezruby - things are getting a little complicated here :)

Can we move this to a higher-level discussion on the PR (not on this code), explaining the overall reasoning behind this PR?

Inside checkForDeletedFilesAgainstCurrentTxnReadFiles you seem to be checking the read predicates ... against a winning commit's deleted files ....

@prakharjain09 is this valid?

Signed-off-by: Eunjin Song <[email protected]>
@zmeir commented Aug 14, 2023

@sezruby Any news on this PR? We're running into this issue a lot: we have a streaming job that writes to the same partition 24/7, so we're having a hard time running OPTIMIZE on that partition.

Also, did you see #626?

@sezruby (Contributor, Author) commented Aug 14, 2023

> @sezruby Any news on this PR? We're running into this issue a lot, having a streaming job that writes to the same partition 24/7, and so we are having a hard time running OPTIMIZE on that partition.
>
> Also, did you see #626?

Hi, you can cherry-pick this PR to support OPTIMIZE-APPEND.
This PR contains two parts:

  • changedDataAddedFiles in the newly-added-files check
    • With this, the APPEND operation succeeds, since files newly added by OPTIMIZE have dataChange=false.
  • a removedFiles check using readPredicates
    • The master version only checks for newly removed files among the transaction's own readFiles,
    • so a file deleted after an OPTIMIZE cannot be detected when the APPEND commits (#1305 (comment)).
    • To prevent this, I added an additional check using readPredicates, to see if there are other newly removed files matching the read predicates.
    • This may cause unnecessary conflicts between DELETE and APPEND, but I think that is very rare:
      • t0 txn0 APPEND: reads file2 using a data filter
      • t1 txn1 APPEND: adds file1 - commit
      • t2 txn2 DELETE: deletes file1 - commit
      • t3 txn0 APPEND: commit => no conflict on master, but with this PR the removed file1 can cause a conflict

@zmeir commented Aug 14, 2023

Thanks @sezruby!

Yeah, I figured I could do that, but I was hoping to simplify my build by using an existing release artifact.

*rolls up sleeves* Oh well...
