
Version vector related extensions to the recovery restart logic #12284


Open
sbodagala wants to merge 1 commit into main

Conversation

@sbodagala commented Jul 30, 2025

Version vector/Unicast specific: Restart recovery if the list of available tLogs changes in such a way that the current in-progress recovery could stall.

More details: Let "listA" be the list of known locked tLogs of the current in-progress recovery. Suppose the list of known locked tLogs changes, and let "listB" be the new list. If "listA" is not a subset of "listB", there is a chance that the current in-progress recovery would stall, so recovery is restarted (even if the recovery version hasn't changed).

We don't need to restart recovery in "main" in this case because the cursor logic tracks these changes (code:
q.push_back(c->onFailed());
) and can reset "bestServer" afterwards (code:
if (self->bestServer >= 0 && self->bestSet >= 0 &&
). This is what I think happens in "main"; I haven't verified it by going through a simulation test. We can't dynamically change "bestServer" when version vector/unicast is enabled, because the "returnEmptyIfStopped" flag is set based on whether "bestServer" is enabled at the time the cursor is built (code:
((SERVER_KNOBS->ENABLE_VERSION_VECTOR_TLOG_UNICAST && end != std::numeric_limits<Version>::max())
), and hence we need to restart recovery in this case.
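To make the condition concrete, here is a minimal, hypothetical sketch (not code from this PR; it models "listA"/"listB" as plain sets of tLog ids, whereas the lists in the actual change are keyed by locality):

// Hedged illustration only. listA: locked tLogs the in-progress recovery was built on.
// listB: locked tLogs known now. If listA is not a subset of listB, some tLog the
// recovery depends on has disappeared, so the recovery is restarted.
#include <algorithm>
#include <cstdint>
#include <set>

static bool shouldRestartRecovery(const std::set<uint16_t>& listA, const std::set<uint16_t>& listB) {
    // std::includes requires sorted ranges; std::set iterates in sorted order.
    return !std::includes(listB.begin(), listB.end(), listA.begin(), listA.end());
}

int main() {
    std::set<uint16_t> listA = { 1, 2, 3 }; // tLogs the current recovery started with
    std::set<uint16_t> listB = { 2, 3, 4 }; // tLogs currently known to be locked
    return shouldRestartRecovery(listA, listB) ? 0 : 1; // tLog 1 is gone, so restart
}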

Testing:
Joshua id (with version vector disabled): 20250730-232250-sre-7ba451e103463247 (started).

Code-Reviewer Section

The general pull request guidelines can be found here.

Please check each of the following things and check all boxes before accepting a PR.

  • The PR has a description, explaining both the problem and the solution.
  • The description mentions which forms of testing were done and the testing seems reasonable.
  • Every function/class/actor that was touched is reasonably well documented.

For Release-Branches

If this PR is made against a release-branch, please also check the following:

  • This change/bugfix is a cherry-pick from the next younger branch (younger release-branch or main if this is the youngest branch)
  • There is a good reason why this PR needs to go into a release branch and this reason is documented (either in the description above or in a linked GitHub issue)

@sbodagala sbodagala requested review from dlambrig and jzhou77 July 30, 2025 23:11
@foundationdb-ci

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: 6af5f13
  • Duration 0:40:09
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 6af5f13
  • Duration 0:48:27
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 6af5f13
  • Duration 1:12:39
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 6af5f13
  • Duration 1:14:45
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 6af5f13
  • Duration 1:15:37
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

maxEnd >= lastEnd.get()) {
    // Are the locked servers that were available in the previous iteration still available? If not,
    // restart recovery (as there is a chance that the recovery of the previous iteration would stall).
    knownLockedTLogIdsChanged = !isSubset(lastKnownLockedTLogIds, currentKnownLockedTLogIds);

could add an event, something like:

if (knownLockedTLogIdsChanged) {
    TraceEvent("KnownLockedTLogsChanged")
        .detail("Last", describe(lastKnownLockedTLogIds))
        .detail("Current", describe(currentKnownLockedTLogIds));
}

static bool isSubset(const std::map<uint8_t, std::vector<uint16_t>>& mapA,
                     const std::map<uint8_t, std::vector<uint16_t>>& mapB) {
    for (const auto& [keyA, valueA] : mapA) {
        if (mapB.find(keyA) == mapB.end()) {

nit: no need to look up keyA twice; a single lookup will do:

auto it = mapB.find(keyA);
if (it == mapB.end()) {
    return false;
}
const auto& valueB = it->second;
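Putting that nit together with the excerpts above, the helper might end up looking roughly like this (a sketch only; the per-locality comparison of the id vectors is an assumption, since the rest of the loop body is not shown in these excerpts):

// Sketch with the single-lookup suggestion applied. The element-wise containment check
// is an assumed reading of "isSubset" and may differ from the actual implementation.
// Requires <algorithm>, <cstdint>, <map>, <vector>.
static bool isSubset(const std::map<uint8_t, std::vector<uint16_t>>& mapA,
                     const std::map<uint8_t, std::vector<uint16_t>>& mapB) {
    for (const auto& [keyA, valueA] : mapA) {
        auto it = mapB.find(keyA);
        if (it == mapB.end()) {
            return false;
        }
        const auto& valueB = it->second;
        for (uint16_t id : valueA) {
            if (std::find(valueB.begin(), valueB.end(), id) == valueB.end()) {
                return false;
            }
        }
    }
    return true;
}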

@@ -2657,7 +2686,8 @@ ACTOR Future<Void> TagPartitionedLogSystem::epochEnd(Reference<AsyncVar<Referenc
 logSystem->logSystemType = prevState.logSystemType;
 logSystem->rejoins = rejoins;
 logSystem->lockResults = lockResults;
-logSystem->knownLockedTLogIds = knownLockedTLogIds;
+logSystem->knownLockedTLogIds = currentKnownLockedTLogIds;
+lastKnownLockedTLogIds = std::move(currentKnownLockedTLogIds);

to save a future headache, add
currentKnownLockedTLogIds.clear(); // ensures safety if accessed later
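For context on why the clear() helps: a moved-from std::map is left in a valid but unspecified state, so clearing it makes any later access well-defined. A standalone, hypothetical illustration (not code from this PR):

#include <cstdint>
#include <map>
#include <utility>
#include <vector>

int main() {
    std::map<uint8_t, std::vector<uint16_t>> currentKnownLockedTLogIds = { { 0, { 1, 2 } } };
    auto lastKnownLockedTLogIds = std::move(currentKnownLockedTLogIds);
    // After the move, 'currentKnownLockedTLogIds' is valid but unspecified; clearing it
    // guarantees it is simply empty if a later code path reads it.
    currentKnownLockedTLogIds.clear();
    return lastKnownLockedTLogIds.size() == 1 ? 0 : 1;
}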

        if (mapB.find(keyA) == mapB.end()) {
            return false;
        }
        const auto& valueB = mapB.at(keyA);

for clarity, and since this is not really a generic function, could rename keyA to locality (or to log, to be consistent with the caller, though I think "log" is misleading).

mapA could become lastLockedTLogs and mapB could be newLockedTLogs, or something similar.
