DB::Open() failed when upgrading rocksdb from version 6.x to 7.10.2 #13624

Open
@gitccl

We encountered a backward compatibility issue when upgrading a production RocksDB instance from version 6.x to version 7.10.2. Specifically, RocksDB fails to open a database that previously ran with FIFO compaction under 6.x, and reports a corruption error during recovery:

DB::Open() failed: Corruption: VersionBuilder: Cannot delete table file #295307 from level 0 since it is not in the LSM tree in file /path/to/manifest

Upon inspecting the MANIFEST file mentioned in the error, we found that it contains multiple identical VersionEdit records, all attempting to delete the same file from the same level and column family:

VersionEdit {
  PrevLogNumber: 0
  NextFileNumber: 303821
  LastSeq: 26013262
  DeleteFile: 0 295307
  ColumnFamily: 1
}
VersionEdit {
  PrevLogNumber: 0
  NextFileNumber: 303821
  LastSeq: 26013262
  DeleteFile: 0 295307
  ColumnFamily: 1
}
...
(total 9 identical entries)
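
For anyone who wants to check their own MANIFEST for the same pattern, here is a minimal sketch that counts how often each (column family, level, file number) deletion appears in a text dump. It assumes the dump uses exactly the `VersionEdit { ... }` layout shown above; `CountDeletions` and `DeletionCounts` are hypothetical names for this example, not part of RocksDB.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Key: (column family id, level, file number). Value: number of deletions seen.
using DeletionCounts =
    std::map<std::tuple<int, int, std::uint64_t>, int>;

// Scans a text dump in the layout shown above (an assumption of this sketch)
// and counts every DeleteFile entry per column family.
DeletionCounts CountDeletions(const std::string& dump) {
  DeletionCounts counts;
  // ColumnFamily is printed after DeleteFile, so deletions are buffered
  // until the closing brace of the current VersionEdit.
  std::vector<std::pair<int, std::uint64_t>> pending;
  int column_family = 0;  // Assumed default when no ColumnFamily line appears.
  std::istringstream in(dump);
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string tag;
    fields >> tag;
    if (tag == "DeleteFile:") {
      int level = 0;
      std::uint64_t file_number = 0;
      if (fields >> level >> file_number) {
        pending.emplace_back(level, file_number);
      }
    } else if (tag == "ColumnFamily:") {
      fields >> column_family;
    } else if (tag == "}") {
      for (const auto& p : pending) {
        ++counts[std::make_tuple(column_family, p.first, p.second)];
      }
      pending.clear();
      column_family = 0;
    }
  }
  return counts;
}
```

Any key with a count greater than 1 is deleted more than once. Note that a count above 1 is only suspicious if the file was not re-added in between, which this sketch does not track.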

Since #6901, VersionBuilder performs strict consistency checks: deleting a file that is not present in the LSM tree is treated as corruption, and a duplicate deletion falls into exactly that case once the first deletion has been applied. A similar issue was reported in #12619.
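
To illustrate why the second identical record is fatal, here is a simplified model of the check. This is not RocksDB's actual VersionBuilder code; `ToyVersionBuilder` and its methods are hypothetical and only mimic the behavior implied by the error message above.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Hypothetical, simplified stand-in for the per-level file bookkeeping
// that happens while MANIFEST records are replayed.
class ToyVersionBuilder {
 public:
  void AddFile(int level, std::uint64_t file_number) {
    levels_[level].insert(file_number);
  }

  // Returns an empty string on success, or an error message if the file
  // is not currently present at the given level.
  std::string DeleteFile(int level, std::uint64_t file_number) {
    auto it = levels_.find(level);
    if (it == levels_.end() || it->second.erase(file_number) == 0) {
      return "Corruption: Cannot delete table file #" +
             std::to_string(file_number) + " from level " +
             std::to_string(level) + " since it is not in the LSM tree";
    }
    return "";
  }

 private:
  std::map<int, std::set<std::uint64_t>> levels_;  // level -> live files
};
```

In this model the first `DeleteFile(0, 295307)` succeeds and removes the file; every later identical call finds nothing to remove and reports corruption, which matches what we see on open.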

We further investigated the root cause of the invalid manifest and found that it was due to a bug in RocksDB v6's FIFO compaction logic. Under certain race conditions, multiple compaction threads may attempt to delete the same SST file concurrently. As a result, the same VersionEdit (deleting the same file) can be written multiple times into the MANIFEST. This bug has already been fixed in #5754, but that fix does not retroactively address existing MANIFEST files that were already corrupted before upgrading.

To restore backward compatibility and ensure such databases can still be opened, we propose adding a new configuration option in ColumnFamilyOptions to tolerate duplicate file deletions during DB recovery. We have verified through internal testing that this change resolves the open failures caused by inconsistent manifests with duplicated deletions.
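
As a rough sketch of the intended semantics (not the actual patch), the replay below accepts a repeated deletion only when the flag is set and the file was in fact deleted earlier in the same replay. `ReplayDeletions`, `ReplayResult`, `FileId`, and the flag name `tolerate_duplicate_deletions` are hypothetical; the real option name and its placement in ColumnFamilyOptions are still open.

```cpp
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

using FileId = std::pair<int, std::uint64_t>;  // (level, file number)

struct ReplayResult {
  bool ok;                 // false models the "Corruption" outcome
  int skipped_duplicates;  // duplicate deletions that were tolerated
};

// Applies deletions in order to the set of live files. With the
// (hypothetical) flag enabled, a deletion of a file that was already
// deleted earlier in this replay is skipped instead of failing.
ReplayResult ReplayDeletions(std::set<FileId> live_files,
                             const std::vector<FileId>& deletions,
                             bool tolerate_duplicate_deletions) {
  std::set<FileId> already_deleted;
  int skipped = 0;
  for (const FileId& file : deletions) {
    if (live_files.erase(file) == 1) {
      already_deleted.insert(file);
      continue;
    }
    if (tolerate_duplicate_deletions && already_deleted.count(file) == 1) {
      ++skipped;
      continue;
    }
    // Deleting a file that never existed remains an error in both modes.
    return {false, skipped};
  }
  return {true, skipped};
}
```

Restricting the tolerance to files that were deleted earlier is a design choice of this sketch: it keeps the strict check effective for deletions of files that were never in the LSM tree.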

We would like to contribute this fix upstream to improve RocksDB's robustness and backward compatibility guarantees.
