
Version vector related extensions to the recovery restart logic #12284


Open
sbodagala wants to merge 1 commit into main

Conversation

@sbodagala commented Jul 30, 2025

Version vector/Unicast specific: Restart recovery if the list of available tLogs changes in such a way that the current in-progress recovery could stall.

More details: Let "listA" be the list of known locked tLogs of the current in-progress recovery. Suppose the list of known locked tLogs changes, and let "listB" be the new list. If "listA" is not a subset of "listB", there is a chance that the current in-progress recovery would stall, so recovery is restarted (even if the recovery version hasn't changed).

We don't need to restart recovery in "main" in this case because the cursor logic tracks these changes (code:
q.push_back(c->onFailed());
) and can reset "bestServer" afterwards (code:
if (self->bestServer >= 0 && self->bestSet >= 0 &&
). This is what I think happens in "main"; I haven't verified it by going through a simulation test. We can't dynamically change "bestServer" when version vector/unicast is enabled, because the "returnEmptyIfStopped" flag is set based on whether "bestServer" is enabled at the time the cursor is built (code:
((SERVER_KNOBS->ENABLE_VERSION_VECTOR_TLOG_UNICAST && end != std::numeric_limits<Version>::max())
), and hence we need to restart recovery in this case.
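To make the condition concrete, here is a minimal, hypothetical sketch (not code from this PR; it models "listA"/"listB" as plain sets of tLog ids, whereas the lists in the actual change are keyed by locality):

// Hedged illustration only. listA: locked tLogs the in-progress recovery was built on.
// listB: locked tLogs known now. If listA is not a subset of listB, some tLog the
// recovery depends on has disappeared, so the recovery is restarted.
#include <algorithm>
#include <cstdint>
#include <set>

static bool shouldRestartRecovery(const std::set<uint16_t>& listA, const std::set<uint16_t>& listB) {
    // std::includes requires sorted ranges; std::set iterates in sorted order.
    return !std::includes(listB.begin(), listB.end(), listA.begin(), listA.end());
}

int main() {
    std::set<uint16_t> listA = { 1, 2, 3 }; // tLogs the current recovery started with
    std::set<uint16_t> listB = { 2, 3, 4 }; // tLogs currently known to be locked
    return shouldRestartRecovery(listA, listB) ? 0 : 1; // tLog 1 is gone, so restart
}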

Testing:
Joshua id (with version vector disabled): 20250730-232250-sre-7ba451e103463247 (started).

Code-Reviewer Section

The general pull request guidelines can be found here.

Please check each of the following things and check all boxes before accepting a PR.

  • The PR has a description, explaining both the problem and the solution.
  • The description mentions which forms of testing were done and the testing seems reasonable.
  • Every function/class/actor that was touched is reasonably well documented.

For Release-Branches

If this PR is made against a release-branch, please also check the following:

  • This change/bugfix is a cherry-pick from the next younger branch (younger release-branch or main if this is the youngest branch)
  • There is a good reason why this PR needs to go into a release branch and this reason is documented (either in the description above or in a linked GitHub issue)

@sbodagala sbodagala requested review from dlambrig and jzhou77 July 30, 2025 23:11
@foundationdb-ci

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: 6af5f13
  • Duration 0:40:09
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 6af5f13
  • Duration 0:48:27
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 6af5f13
  • Duration 1:12:39
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 6af5f13
  • Duration 1:14:45
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 6af5f13
  • Duration 1:15:37
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

maxEnd >= lastEnd.get()) {
    // Are the locked servers that were available in the previous iteration still available? If not,
    // restart recovery (as there is a chance that the recovery of the previous iteration would stall).
    knownLockedTLogIdsChanged = !isSubset(lastKnownLockedTLogIds, currentKnownLockedTLogIds);

could add an event, something like:

if (knownLockedTLogIdsChanged) {
    TraceEvent("KnownLockedTLogsChanged")
        .detail("Last", describe(lastKnownLockedTLogIds))
        .detail("Current", describe(currentKnownLockedTLogIds));
}

static bool isSubset(const std::map<uint8_t, std::vector<uint16_t>>& mapA,
                     const std::map<uint8_t, std::vector<uint16_t>>& mapB) {
    for (const auto& [keyA, valueA] : mapA) {
        if (mapB.find(keyA) == mapB.end()) {

nit: no need to look up keyA twice; a single lookup will do:

auto it = mapB.find(keyA);
if (it == mapB.end()) {
    return false;
}
const auto& valueB = it->second;
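Putting that nit together with the excerpts above, the helper might end up looking roughly like this (a sketch only; the per-locality comparison of the id vectors is an assumption, since the rest of the loop body is not shown in these excerpts):

// Sketch with the single-lookup suggestion applied. The element-wise containment check
// is an assumed reading of "isSubset" and may differ from the actual implementation.
// Requires <algorithm>, <cstdint>, <map>, <vector>.
static bool isSubset(const std::map<uint8_t, std::vector<uint16_t>>& mapA,
                     const std::map<uint8_t, std::vector<uint16_t>>& mapB) {
    for (const auto& [keyA, valueA] : mapA) {
        auto it = mapB.find(keyA);
        if (it == mapB.end()) {
            return false;
        }
        const auto& valueB = it->second;
        for (uint16_t id : valueA) {
            if (std::find(valueB.begin(), valueB.end(), id) == valueB.end()) {
                return false;
            }
        }
    }
    return true;
}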

@@ -2657,7 +2686,8 @@ ACTOR Future<Void> TagPartitionedLogSystem::epochEnd(Reference<AsyncVar<Referenc
 logSystem->logSystemType = prevState.logSystemType;
 logSystem->rejoins = rejoins;
 logSystem->lockResults = lockResults;
-logSystem->knownLockedTLogIds = knownLockedTLogIds;
+logSystem->knownLockedTLogIds = currentKnownLockedTLogIds;
+lastKnownLockedTLogIds = std::move(currentKnownLockedTLogIds);

to save a future headache, add
currentKnownLockedTLogIds.clear(); // ensures safety if accessed later
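For context on why the clear() helps: a moved-from std::map is left in a valid but unspecified state, so clearing it makes any later access well-defined. A standalone, hypothetical illustration (not code from this PR):

#include <cstdint>
#include <map>
#include <utility>
#include <vector>

int main() {
    std::map<uint8_t, std::vector<uint16_t>> currentKnownLockedTLogIds = { { 0, { 1, 2 } } };
    auto lastKnownLockedTLogIds = std::move(currentKnownLockedTLogIds);
    // After the move, 'currentKnownLockedTLogIds' is valid but unspecified; clearing it
    // guarantees it is simply empty if a later code path reads it.
    currentKnownLockedTLogIds.clear();
    return lastKnownLockedTLogIds.size() == 1 ? 0 : 1;
}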

        if (mapB.find(keyA) == mapB.end()) {
            return false;
        }
        const auto& valueB = mapB.at(keyA);

for clarity, and since this is not really a generic function, could rename keyA to locality (or to log, to be consistent with the caller, though I think "log" is misleading).

mapA could become lastLockedTLogs and mapB could be newLockedTLogs, or something similar.
