Merging to release-5.9: [TT-9234] regression fixes for failing mdcb readiness check (#7215) #7220
User description
TT-9234 regression fixes for failing mdcb readiness check (#7215)
Description
This PR fixes the health check logic for RPC components when MDCB
(Multi-Data Center Bridge) is operating in emergency mode, ensuring
proper failover behavior during RPC connectivity issues.
Problem
When MDCB enters emergency mode due to RPC connectivity issues, the
gateway was incorrectly marking RPC health check failures as critical,
causing the entire gateway to report as unhealthy. This defeated the
intended failover behavior, in which the gateway keeps serving requests
using cached policies.
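To make the failure mode concrete, here is a minimal sketch of how per-component health results might roll up into one gateway status. The names `Status`, `CheckResult`, and `aggregate` are hypothetical and not taken from the gateway source. The sketch only assumes that a single critical failure fails the whole gateway, while a non-critical failure merely degrades it to a warning.

```go
package main

import "fmt"

// Status values for a single check or the gateway as a whole.
// These names are illustrative assumptions, not the gateway's real types.
type Status string

const (
	StatusPass Status = "pass"
	StatusWarn Status = "warn"
	StatusFail Status = "fail"
)

// CheckResult is a hypothetical per-component health check outcome.
type CheckResult struct {
	Component string
	Status    Status
	Critical  bool // whether a failure of this component fails the gateway
}

// aggregate rolls component results up into one gateway status:
// any critical failure -> fail; any other failure -> warn; otherwise pass.
func aggregate(results []CheckResult) Status {
	overall := StatusPass
	for _, r := range results {
		if r.Status != StatusFail {
			continue
		}
		if r.Critical {
			return StatusFail
		}
		overall = StatusWarn
	}
	return overall
}

func main() {
	results := []CheckResult{
		{Component: "redis", Status: StatusPass, Critical: true},
		// The problem described above: an RPC failure treated as critical
		// is enough on its own to make the whole gateway unhealthy.
		{Component: "rpc", Status: StatusFail, Critical: true},
	}
	fmt.Println(aggregate(results)) // fail
}
```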
Solution
Modified the isCriticalFailure() function in gateway/health_check.go to
consider RPC emergency mode status when determining if an RPC component
failure is critical.
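A minimal sketch of that decision follows. Only the function name `isCriticalFailure` comes from this PR. The `rpcEmergencyMode` flag is a hypothetical stand-in for however the gateway actually exposes MDCB emergency-mode status, and treating `redis` as the other critical component is likewise an assumption for illustration.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// rpcEmergencyMode is a hypothetical stand-in for the gateway's real MDCB
// emergency-mode flag; atomic.Bool keeps reads safe across goroutines.
var rpcEmergencyMode atomic.Bool

// isCriticalFailure reports whether a failing component should mark the
// whole gateway unhealthy. Component names here are illustrative.
func isCriticalFailure(component string) bool {
	switch component {
	case "redis":
		return true
	case "rpc":
		// In emergency mode the gateway is expected to keep serving from
		// cached policies, so a failing RPC check is not fatal.
		return !rpcEmergencyMode.Load()
	default:
		return false
	}
}

func main() {
	rpcEmergencyMode.Store(false)
	fmt.Println(isCriticalFailure("rpc")) // true
	rpcEmergencyMode.Store(true)
	fmt.Println(isCriticalFailure("rpc")) // false
}
```

`atomic.Bool` needs Go 1.19 or later; a plain `bool` guarded by a mutex would express the same rule on older toolchains.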
PR Type
Bug fix, Tests
Description
Fixes critical failure logic for RPC in emergency mode
Adds unit tests for RPC emergency mode scenarios
Updates test setup to handle emergency mode toggling
Ensures correct behavior for RPC health check failures
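The test changes (toggling emergency mode per scenario) can be sketched as a table-driven check that saves and restores the flag around each case, so one scenario cannot leak state into the next. Everything here is hypothetical: `emergencyMode`, `withEmergencyMode`, and the simplified `isCriticalFailure` are illustrative stand-ins, not the actual test helpers in gateway/health_check_test.go.

```go
package main

import "fmt"

// emergencyMode is a hypothetical package-level flag standing in for the
// gateway's MDCB emergency-mode state.
var emergencyMode bool

// isCriticalFailure mirrors the rule under test: an RPC failure is critical
// only when emergency mode is off. It returns false for every other
// component to keep the sketch focused on RPC.
func isCriticalFailure(component string) bool {
	if component == "rpc" {
		return !emergencyMode
	}
	return false
}

// withEmergencyMode sets the flag for the duration of fn and restores the
// previous value afterwards. In a real *testing.T test, t.Cleanup is the
// idiomatic place for the restore step.
func withEmergencyMode(enabled bool, fn func()) {
	prev := emergencyMode
	emergencyMode = enabled
	defer func() { emergencyMode = prev }()
	fn()
}

func main() {
	cases := []struct {
		name          string
		emergencyMode bool
		wantCritical  bool
	}{
		{"rpc failure, normal mode", false, true},
		{"rpc failure, emergency mode", true, false},
	}
	for _, tc := range cases {
		withEmergencyMode(tc.emergencyMode, func() {
			got := isCriticalFailure("rpc")
			result := "ok"
			if got != tc.wantCritical {
				result = "FAIL"
			}
			fmt.Printf("%s: critical=%v %s\n", tc.name, got, result)
		})
	}
}
```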
Changes walkthrough 📝
gateway/health_check.go: Add emergency mode check to RPC critical failure logic
gateway/health_check_test.go: Add and update tests for RPC emergency mode logic