Merging to release-5.9: [TT-9234] regression fixes for failing mdcb readiness check (#7215) #7220

@buger buger commented Jul 10, 2025

User description

TT-9234 regression fixes for failing mdcb readiness check (#7215)

Description

This PR fixes the health check logic for RPC components when MDCB
(Multi-Data Center Bridge) is operating in emergency mode, ensuring
proper failover behavior during RPC connectivity issues.

Problem

When MDCB enters emergency mode due to RPC connectivity issues, the
gateway was incorrectly marking RPC health check failures as critical,
causing the entire gateway to report as unhealthy. This prevented proper
failover operation where the gateway should continue serving requests
using cached policies.

Solution

Modified the isCriticalFailure() function in gateway/health_check.go to
consider RPC emergency mode status when determining if an RPC component
failure is critical.
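The change described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual code from gateway/health_check.go: the `healthConfig` type, its `UsesRPC` field, and the `EmergencyMode` hook are hypothetical names introduced here, and the real function's signature may differ.

```go
package main

import "fmt"

// healthConfig is a hypothetical stand-in for the gateway settings that
// influence criticality; the real gateway config is not shown on this page.
type healthConfig struct {
	UsesRPC       bool        // hypothetical: the gateway loads policies over RPC (MDCB)
	EmergencyMode func() bool // hypothetical hook reporting RPC emergency mode
}

// isCriticalFailure reports whether a failed component should mark the
// whole gateway as unhealthy. In this sketch an RPC failure is critical
// only when RPC is in use and emergency mode is NOT active.
func isCriticalFailure(component string, cfg healthConfig) bool {
	switch component {
	case "rpc":
		return cfg.UsesRPC && !cfg.EmergencyMode()
	default:
		// Other components are out of scope for this sketch.
		return false
	}
}

func main() {
	normal := healthConfig{UsesRPC: true, EmergencyMode: func() bool { return false }}
	emergency := healthConfig{UsesRPC: true, EmergencyMode: func() bool { return true }}

	fmt.Println(isCriticalFailure("rpc", normal))    // true
	fmt.Println(isCriticalFailure("rpc", emergency)) // false
}
```

With this shape, a gateway that has already fallen back to cached policies no longer reports itself as unhealthy solely because the RPC check fails.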

Related Issue

Motivation and Context

How This Has Been Tested

Screenshots (if appropriate)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing
    functionality to change)
  • Refactoring or add test (improvements in base code or adds test
    coverage to functionality)

Checklist

  • I ensured that the documentation is up to date
  • I explained why this PR updates go.mod in detail with reasoning
    why it's required
  • I would like a code coverage CI quality gate exception and have
    explained why

PR Type

Bug fix, Tests


Description

  • Fixes critical failure logic for RPC in emergency mode

  • Adds unit tests for RPC emergency mode scenarios

  • Updates test setup to handle emergency mode toggling

  • Ensures correct behavior for RPC health check failures
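The test-side pattern mentioned in these bullets (toggling emergency mode per case and restoring it afterwards) might look something like the sketch below. It assumes emergency mode is process-wide state; `emergencyMode`, `rpcFailureIsCritical`, `withEmergencyMode`, and `runCases` are hypothetical names for illustration, not the helpers used in gateway/health_check_test.go.

```go
package main

import "fmt"

// emergencyMode is a hypothetical package-level flag standing in for the
// RPC package's emergency-mode state. It is not goroutine-safe.
var emergencyMode bool

// rpcFailureIsCritical is a hypothetical stand-in for the RPC branch of
// the criticality check.
func rpcFailureIsCritical() bool { return !emergencyMode }

// withEmergencyMode sets the flag, runs fn, and restores the previous
// value so that one case cannot leak state into the next.
func withEmergencyMode(on bool, fn func()) {
	prev := emergencyMode
	emergencyMode = on
	defer func() { emergencyMode = prev }()
	fn()
}

// runCases returns the names of cases whose result differed from the
// expectation.
func runCases() []string {
	cases := []struct {
		name         string
		emergency    bool
		wantCritical bool
	}{
		{"rpc down, normal mode", false, true},
		{"rpc down, emergency mode", true, false},
	}

	var failed []string
	for _, tc := range cases {
		withEmergencyMode(tc.emergency, func() {
			if got := rpcFailureIsCritical(); got != tc.wantCritical {
				failed = append(failed, tc.name)
			}
		})
	}
	return failed
}

func main() {
	fmt.Println("failed cases:", len(runCases())) // failed cases: 0
}
```

Restoring the previous value matters because global state left in emergency mode would silently change the outcome of unrelated health check tests that run later.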


Changes diagram

flowchart LR
  A["isCriticalFailure logic"] -- "add emergency mode check" --> B["RPC component handling"]
  B -- "unit tests for emergency mode" --> C["health_check_test.go"]

Changes walkthrough 📝

Relevant files

Bug fix: gateway/health_check.go (+2/-2)
Add emergency mode check to RPC critical failure logic

  • Adds emergency mode check to RPC critical failure logic
  • Ensures RPC is not critical in emergency mode

Tests: gateway/health_check_test.go (+38/-0)
Add and update tests for RPC emergency mode logic

  • Adds tests for RPC critical failure in emergency mode
  • Updates test cases to toggle emergency mode
  • Imports RPC package for emergency mode control


  • PR Type

    Bug fix, Tests


    Description

    • Fixes RPC critical failure logic to respect emergency mode

    • Adds unit tests for RPC emergency mode health check scenarios

    • Updates test setup to toggle RPC emergency mode as needed

    • Ensures gateway remains healthy in RPC emergency mode


    Changes diagram

    flowchart LR
      A["isCriticalFailure logic"] -- "add emergency mode check" --> B["RPC component handling"]
      B -- "unit tests for emergency mode" --> C["health_check_test.go"]
    

    Changes walkthrough 📝

    Relevant files

    Bug fix: gateway/health_check.go (+2/-2)
    Add emergency mode check to RPC critical failure logic

    • Adds emergency mode check to RPC critical failure logic
    • Ensures RPC failures are not critical in emergency mode

    Tests: gateway/health_check_test.go (+38/-0)
    Add and update tests for RPC emergency mode logic

    • Adds unit tests for RPC emergency mode logic
    • Updates test cases to toggle emergency mode
    • Imports RPC package for emergency mode control

    (cherry picked from commit a564981)

    PR Code Suggestions ✨

    No code suggestions found for the PR.


    API Changes

    no api changes detected

    @andrei-tyk andrei-tyk merged commit c8285eb into release-5.9 Jul 10, 2025
    37 of 40 checks passed
    @andrei-tyk andrei-tyk deleted the merge/release-5.9/a564981b4d2eb0940f03313424c51d38572d4d39 branch July 10, 2025 08:00

    2 participants