[TT-15507] Revert changes to the "/hello" health check endpoint #7295

mativm02 · 2025-08-11T12:39:04Z

User description

The /hello (liveness) endpoint is reverted to always respond with HTTP 200, regardless of internal health check status.
Health status calculation in the liveness handler is refactored and simplified—now based solely on the number of failed checks (failCount), removing the previous "critical failure" logic.
Error handling is added for JSON encoding errors in the readiness handler, so encoding issues are now logged.

Summary	Revert changes to the "/hello" health check endpoint
Type	Bug
Status	In Dev
Points	N/A
Labels	-

PR Type

Bug fix, Tests

Description

Restore /hello to always return 200
Compute Pass/Warn/Fail from check counts
Add extensive liveness vs readiness tests
Improve JSON encode error handling logs

Diagram Walkthrough

flowchart LR
  liveH["liveCheckHandler (/hello)"] -- "always 200" --> http200["HTTP 200"]
  liveH -- "derive Pass/Warn/Fail" --> status["Response.Status"]
  readyH["readinessHandler (/ready)"] -- "encode + log errors" --> logR["Warning on encode error"]
  tests["health_check_test.go"] -- "cover scenarios" --> liveH
  tests -- "compare vs readiness" --> readyH

File Walkthrough

Relevant files

Bug fix

health_check.go: `Rework liveness logic and enforce 200 response` (gateway/health_check.go, +24/-4)
- Compute failCount inline and set status.
- Always write HTTP 200 for /hello.
- Add error handling for JSON encode in readiness.
- Remove determineHealthStatus usage in liveness.

Tests

health_check_test.go: `Add comprehensive liveness and readiness tests` (gateway/health_check_test.go, +552/-0)
- Add tests for /hello status and body.
- Add liveness vs readiness behavior tests.
- Add edge case coverage and headers checks.
- Validate JSON payload fields and errors.

buger · 2025-08-11T12:39:16Z

I'm a bot and I 👍 this PR title. 🤖

github-actions · 2025-08-11T12:40:01Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue
The liveness status computation only counts items with status equal to `Fail`. If `checks` can be nil or empty, ensure `len(checks)` behaves as expected and that statuses like `Warn` should indeed be ignored for fail counting. This changes behavior from prior logic; verify this matches intended revert.

for _, v := range checks {
    if v.Status == Fail {
        failCount++
    }
}

var status HealthCheckStatus

switch failCount {
case 0:
    status = Pass
case len(checks):
    status = Fail
default:
    status = Warn
}

HTTP Status Change
`liveCheckHandler` now always writes `http.StatusOK` instead of the previously computed `httpStatus`. Confirm no external consumers rely on non-200 codes for liveness, and that monitoring/alerts are adjusted accordingly.

w.WriteHeader(http.StatusOK)
err := json.NewEncoder(w).Encode(res)

Error Handling
Readiness handler now logs JSON encoding errors but still returns 200 after calling `WriteHeader(http.StatusOK)`. Consider whether a failed encode should result in a non-200 or at least avoid writing partial/invalid bodies.

    w.WriteHeader(http.StatusOK)
    err := json.NewEncoder(w).Encode(res)
    if err != nil {
        mainLog.Warning(fmt.Sprintf("[Readiness] Could not encode response, error: %s", err.Error()))
    }
}

github-actions · 2025-08-11T12:40:20Z

API Changes

no api changes detected

github-actions · 2025-08-11T12:40:32Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: Possible issue · Impact: High
Suggestion: Handle empty checks map
When `checks` is nil, `len(checks)` is 0 and the `case len(checks)` branch will match for `failCount == 0`, incorrectly returning `Fail`. Guard the zero-checks scenario explicitly before the switch. This ensures empty or nil health data returns `Pass` as intended by tests.
gateway/health_check.go [200-209]

-switch failCount {
-case 0:
+if len(checks) == 0 {
     status = Pass
-
-case len(checks):
-    status = Fail
-
-default:
-    status = Warn
+} else {
+    switch failCount {
+    case 0:
+        status = Pass
+    case len(checks):
+        status = Fail
+    default:
+        status = Warn
+    }
 }

Suggestion importance[1-10]: 9
Why: Correctly identifies that when `checks` is nil/empty, `len(checks)==0` makes `case len(checks)` match and incorrectly set `status` to `Fail`; adding an explicit empty-check prevents this and aligns with tests expecting `Pass`.

Category: General · Impact: Low
Suggestion: Recover from JSON encode failure
If JSON encoding fails after writing the status, the handler silently logs but leaves a partial body. To prevent clients from caching a bad body, clear the header Content-Length and write a minimal error message. This improves robustness of the endpoint response.
gateway/health_check.go [220-224]

 w.WriteHeader(http.StatusOK)
-err := json.NewEncoder(w).Encode(res)
-if err != nil {
+if err := json.NewEncoder(w).Encode(res); err != nil {
     mainLog.Warning(fmt.Sprintf("[Liveness] Could not encode response, error: %s", err.Error()))
+    w.Header().Del("Content-Length")
+    _, _ = w.Write([]byte(`{"status":"error","message":"failed to encode response"}`))
 }

Suggestion importance[1-10]: 5
Why: Improves robustness by handling encode failures after headers are sent, but behavior changes response semantics and may produce mixed content; useful but non-critical.

Category: General · Impact: Low
Suggestion: Safeguard fail counting logic
Treat only explicit `Fail` as failures but ensure non-fail statuses like `Warn` do not increment `failCount`. Also add a simple nil check to avoid iteration on a nil map in future refactors. This preserves semantics and avoids unintended panics if `checks` ever becomes a slice.
gateway/health_check.go [192-196]

-for _, v := range checks {
-    if v.Status == Fail {
-        failCount++
+if checks != nil {
+    for _, v := range checks {
+        if v.Status == Fail {
+            failCount++
+        }
     }
 }

Suggestion importance[1-10]: 3
Why: While adding a nil check is harmless, iterating a nil map in Go is safe; the change has minimal impact and the current code already counts only explicit `Fail`, so the benefit is minor.
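A note on the first suggestion: Go evaluates expression-switch cases top to bottom and takes the first match, so with `failCount == 0` the `case 0` branch wins even when `len(checks)` is also 0. The stand-alone sketch below reproduces the switch from the thread to check that. `deriveStatus` is a hypothetical wrapper, and the type definitions and string values are assumptions made so the example compiles on its own.

```go
package main

import "fmt"

// HealthCheckStatus and HealthCheckItem reuse names seen in the thread;
// their exact definitions here are assumptions for the example.
type HealthCheckStatus string

const (
	Pass HealthCheckStatus = "pass"
	Warn HealthCheckStatus = "warn"
	Fail HealthCheckStatus = "fail"
)

type HealthCheckItem struct {
	Status HealthCheckStatus
}

// deriveStatus is a hypothetical stand-alone copy of the inline logic.
func deriveStatus(checks map[string]HealthCheckItem) HealthCheckStatus {
	failCount := 0
	for _, v := range checks { // ranging over a nil map is safe in Go
		if v.Status == Fail {
			failCount++
		}
	}

	switch failCount {
	case 0: // tested first, so nil or empty checks yield Pass
		return Pass
	case len(checks):
		return Fail
	default:
		return Warn
	}
}

func main() {
	fmt.Println(deriveStatus(nil)) // pass
	fmt.Println(deriveStatus(map[string]HealthCheckItem{
		"redis": {Status: Fail},
	})) // fail
	fmt.Println(deriveStatus(map[string]HealthCheckItem{
		"redis": {Status: Fail},
		"rpc":   {Status: Pass},
	})) // warn
}
```

Under these assumptions the explicit `len(checks) == 0` guard is not needed for correctness, though it may still make the intent easier to read.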

github-actions · 2025-08-11T12:41:35Z

📦 Impact Review Snapshot

Effort	Downstream Updates	Compatibility	Docs	TL;DR
Low	✅	🟢	📖	Reverting "/hello" endpoint to always return 200 OK is backward compatible but monitoring systems may need adjustment

## Impact Assessment

This PR reverts changes to the "/hello" health check endpoint, making it always return HTTP 200 OK regardless of the actual health status. The health status (Pass/Warn/Fail) is still computed and included in the response body, but the HTTP status code is always 200. This change affects how monitoring systems might interpret the health of the Tyk Gateway, as they would need to inspect the response body rather than relying on HTTP status codes.

The readiness endpoint ("/ready") behavior remains unchanged - it will still return non-200 status codes when the gateway is not ready to serve requests.

## Required Updates

tyk-charts: Health check probes in Kubernetes deployments may need to be updated if they were relying on non-200 status codes from the "/hello" endpoint. The liveness probe should continue to use the "/hello" endpoint, but readiness probes should use the "/ready" endpoint if they need to detect actual readiness state.
tyk-operator: If the operator uses the "/hello" endpoint for health checking, it should be updated to either parse the response body for the actual status or switch to using the "/ready" endpoint for determining if the gateway is truly ready.
Monitoring systems: Any external monitoring systems that check Tyk Gateway health should be updated to parse the JSON response body rather than relying solely on HTTP status codes.

## Compatibility Concerns

This change is generally backward compatible since:

The "/hello" endpoint will continue to be available at the same path
The response format remains the same, only the HTTP status code behavior changes
The actual health status is still available in the response body

However, systems that rely specifically on non-200 status codes from the "/hello" endpoint to detect unhealthy states will need to be updated. This includes Kubernetes liveness/readiness probes, monitoring systems, and any custom health check implementations.

## Summary & Recommendations

Update documentation to clearly explain the difference between the "/hello" endpoint (always returns 200 OK) and the "/ready" endpoint (returns non-200 when not ready)
Ensure Kubernetes deployments in tyk-charts use appropriate probes: liveness probe for "/hello" and readiness probe for "/ready"
Consider adding a note in release notes about this behavior change to alert users who might be relying on HTTP status codes from the "/hello" endpoint

Tip: Mention me again using /dependency <request>.
Powered by Probe AI
Tyk Gateway Dependency Impact Reviewer

github-actions · 2025-08-11T12:41:42Z

🚦 Connectivity Review Snapshot

Effort	Tests	Security	Perf	TL;DR
Low	✅	🔒 none	🟢	Restores /hello endpoint to always return 200 OK while maintaining status in response body

## Connectivity Assessment

Redis Connections: No changes to Redis connectivity; the PR only modifies how health check results are presented.
RPC Connections: No changes to RPC connectivity; the health check still monitors RPC connections but doesn't affect their behavior.
Synchronization Mechanisms: The PR maintains the same health check data collection but changes how the HTTP status code is determined for the liveness endpoint.

## Test Coverage Validation

Redis Tests: Excellent test coverage for Redis connectivity failures, verifying that /hello returns 200 OK with appropriate status in the body.
RPC Tests: Good coverage for RPC failure scenarios, confirming that liveness checks maintain 200 OK responses.
Failure Scenario Tests: Comprehensive test suite added that covers various failure combinations and edge cases, including comparison between liveness and readiness endpoints.

## Security & Performance Impact

Authentication Changes: No authentication changes in this PR.
Performance Considerations: Minimal performance impact; the change simplifies the status code determination logic.
Error Handling: Improved error handling for JSON encoding in the readiness handler, which logs warnings when encoding fails.

## Summary & Recommendations

This PR correctly reverts the behavior of the /hello endpoint to always return HTTP 200 OK, which is critical for Kubernetes liveness probes to prevent unnecessary pod restarts.
The change maintains the health status information in the response body, allowing monitoring systems to still detect issues.
The distinction between liveness (/hello) and readiness (/ready) endpoints is now clearer, with readiness still returning 503 for critical failures.
No suggestions to provide – change LGTM.

Tip: Mention me again using /connectivity <request>.
Powered by Probe AI
Connectivity Issues Reviewer Prompt for Tyk Gateway

github-actions · 2025-08-11T12:42:32Z

🛡️ Security Snapshot

Effort	Risk Level	Tests	Compliance	TL;DR
Low	🟢	✅	✔️	Reverting `/hello` to always return 200 OK has minimal security impact

## Security Impact Analysis

The PR reverts the /hello health check endpoint to always return HTTP 200 OK regardless of the actual health status, while still including the accurate health status (Pass/Warn/Fail) in the response body. This change primarily affects operational monitoring rather than security controls. The readiness endpoint (/ready) maintains its behavior of returning non-200 status codes for unhealthy states, which preserves the ability to detect critical failures.

The addition of error handling for JSON encoding errors in the readiness handler is a positive improvement that prevents silent failures and provides better logging.

## Identified Vulnerabilities

No direct security vulnerabilities were introduced by this change. The modification to always return 200 OK from the liveness endpoint could potentially mask system health issues from monitoring systems that only check HTTP status codes, but this is an operational concern rather than a security vulnerability.

The PR includes comprehensive test coverage that verifies the behavior of both endpoints across various failure scenarios, which mitigates the risk of unintended side effects.

## Security Recommendations

Ensure monitoring systems are updated to check the response body status field rather than relying solely on HTTP status codes when monitoring the /hello endpoint.
Consider adding documentation to clarify the distinction between liveness and readiness endpoints, specifically noting that /hello always returns 200 OK while /ready returns non-200 status codes for unhealthy states.
Verify that any existing health check automation in CI/CD pipelines or deployment systems is updated to accommodate this change.

## OWASP Compliance

This change does not directly impact OWASP Top 10 concerns as it doesn't modify authentication, authorization, input validation, or data protection mechanisms. The health check endpoints are typically internal operational endpoints and not directly exposed to end users.

The improved error handling for JSON encoding errors aligns with OWASP's recommendation for proper error handling and logging.

## Summary

The change to always return 200 OK from the /hello endpoint is an operational change with minimal security impact.
The PR includes comprehensive test coverage that verifies the behavior across various scenarios.
The improved error handling for JSON encoding is a positive security improvement.
No security issues identified – change LGTM.

Tip: Mention me again using /security <request>.
Powered by Probe AI
Security Impact Reviewer Prompt

github-actions · 2025-08-11T12:44:27Z

🚀 Performance Snapshot

Effort	Perf Risk	Hot Paths	Benchmarks	TL;DR
Low	🟢	✅	❔	Reverts `/hello` endpoint to always return 200 OK with status in body

## Performance Impact Analysis

This PR makes a targeted change to the health check endpoint behavior with minimal performance impact. The /hello endpoint now always returns HTTP 200 regardless of health status (putting the actual Pass/Warn/Fail status in the response body), while improving error handling for JSON encoding. This change simplifies health check logic and aligns with Kubernetes best practices for liveness probes.

## Critical Areas

The main change is in the liveCheckHandler function which now:

Always returns HTTP 200 OK status code
Computes Pass/Warn/Fail status based on check counts
Includes better error handling for JSON encoding failures

This affects the /hello endpoint which is typically used for liveness probes in Kubernetes environments. The change makes the endpoint more reliable for basic health checking while keeping detailed status information in the response body.

## Optimization Recommendations

The changes are well-optimized and follow best practices for health check endpoints. The computation of status from check counts is straightforward and efficient. The error handling for JSON encoding is appropriate, logging warnings without affecting the response status.

No further optimizations are needed as this is a targeted fix that improves reliability without adding overhead.

## Summary

This PR reverts the /hello health check endpoint to always return HTTP 200 OK, with the actual health status (Pass/Warn/Fail) included in the response body
The change aligns with Kubernetes best practices where liveness probes should be lightweight and only check if the application is running
Comprehensive tests were added to verify the behavior of both liveness and readiness endpoints
The PR improves error handling for JSON encoding with better logging
No performance concerns - this is a targeted change with minimal impact on system performance

No suggestions to provide – change LGTM.

Tip: Mention me again using /performance <request>.
Powered by Probe AI
Performance Impact Reviewer Prompt

This commit updates the health check test files by introducing a new variable assignment to prevent variable shadowing within the test cases. This change enhances code clarity and maintains the integrity of test execution across multiple test cases.

github-actions · 2025-08-11T13:19:11Z

Analysis of PR #7295: Revert changes to the "/hello" health check endpoint

This PR makes important changes to the health check endpoints in Tyk Gateway, specifically reverting the behavior of the /hello liveness endpoint to always return HTTP 200 OK regardless of the actual health status of the gateway.

Key Changes

Liveness Endpoint Behavior: The /hello endpoint now always returns HTTP 200 OK, even when health checks fail. The actual health status (Pass/Warn/Fail) is still included in the response body, but monitoring systems that only check HTTP status codes will always see the gateway as "alive".
Simplified Health Status Calculation: The health status calculation in the liveness handler has been refactored to be based solely on the number of failed checks:
- If no checks fail (failCount == 0), status is Pass
- If all checks fail (failCount == len(checks)), status is Fail
- Otherwise, status is Warn
Error Handling Improvement: Error handling has been added for JSON encoding errors in both the liveness and readiness handlers, ensuring these errors are properly logged.

Rationale

This change aligns with Kubernetes best practices for liveness probes, which should only check if the application is running and responding, not its full health status. A non-200 response from a liveness probe would cause Kubernetes to restart the pod, which might not be the desired behavior for temporary health check failures.

The readiness endpoint (/ready) still returns non-200 status codes when critical components like Redis are unavailable, which is appropriate for readiness checks that determine if the gateway should receive traffic.

Impact

Kubernetes deployments will no longer restart Tyk Gateway pods due to temporary health check failures
Monitoring systems that rely on HTTP status codes will need to be updated to parse the response body for the actual health status
The distinction between liveness and readiness is now clearer, with liveness checking if the gateway is running and readiness checking if it's ready to serve requests

This change improves the stability of Tyk Gateway deployments in Kubernetes environments while still providing detailed health information in the response body.

github-actions · 2025-08-11T13:20:02Z

🛡️ Security Snapshot

Effort	Risk Level	Tests	Compliance	TL;DR
Low	🟢	✅	✔️	Reverting `/hello` endpoint to always return 200 OK has minimal security impact

github-actions · 2025-08-11T13:20:50Z

Analysis of PR #7295: Revert changes to the "/hello" health check endpoint

This PR makes a targeted change to the Tyk Gateway's health check system, specifically reverting the behavior of the /hello endpoint (liveness check) to always return HTTP 200 OK regardless of the actual health status of the system components.

Key Changes

Liveness Endpoint (/hello) Changes:
- Now always returns HTTP 200 OK status code regardless of health check results
- Still includes the actual health status (Pass/Warn/Fail) in the response body
- Health status calculation is simplified to be based solely on the number of failed checks
- Previous "critical failure" logic is removed from the liveness handler
Readiness Endpoint (/ready) Improvements:
- Error handling is added for JSON encoding errors, which are now properly logged
- Maintains existing behavior of returning non-200 status codes when the gateway is not ready
Test Coverage:
- Comprehensive test suite added for the liveness handler
- Tests comparing liveness vs readiness behavior
- Edge case coverage for various failure scenarios

Rationale

This change follows Kubernetes best practices for liveness probes. The liveness probe should only check if the application is running, not its health status. By always returning 200 OK, it prevents unnecessary pod restarts when there are non-critical failures. The actual health status is still available in the response body for monitoring systems.

The readiness probe (/ready) continues to return non-200 status codes for critical failures, which is appropriate for determining if the gateway should receive traffic.

This is a well-structured change with excellent test coverage that properly separates the concerns of liveness (is the application running?) from readiness (is the application ready to serve requests?).

github-actions · 2025-08-11T13:21:36Z

Analysis of PR #7295: Revert changes to the "/hello" health check endpoint

This PR makes targeted changes to the health check endpoints in the Tyk Gateway, focusing on four main areas:

1. Liveness Endpoint Behavior

The PR ensures that the /hello (liveness) endpoint always returns HTTP 200 OK, regardless of internal health check status. Looking at the current code, the liveCheckHandler function already returns HTTP 200 OK, so this change is likely reinforcing this behavior to prevent any future changes from altering it.

2. Health Status Calculation Simplification

The health status calculation in the liveness handler is refactored and simplified to be based solely on the number of failed checks (failCount). The PR removes the dependency on the determineHealthStatus function which takes into account "critical failures" through the isCriticalFailure function.

The current implementation already calculates a status based on the number of failed checks:

switch failCount {
case 0:
    status = Pass
case len(checks):
    status = Fail
default:
    status = Warn
}

This approach is simpler and more predictable than the previous logic that considered critical failures separately.

3. Error Handling for JSON Encoding

The PR adds error handling for JSON encoding errors in the readiness handler. Looking at the current code, both handlers already have error handling for JSON encoding:

err := json.NewEncoder(w).Encode(res)
if err != nil {
    mainLog.Warning(fmt.Sprintf("[Readiness] Could not encode response, error: %s", err.Error()))
}

4. Comprehensive Test Coverage

The PR adds extensive tests for the health check endpoints (+552 lines in health_check_test.go), including:

Tests for /hello status and response body
Comparison tests between liveness and readiness behavior
Edge case coverage and header checks
JSON payload field validation and error handling

Impact and Benefits

This change improves the reliability of Kubernetes deployments by ensuring the liveness probe always returns 200 OK, preventing unnecessary pod restarts due to temporary issues that don't affect the gateway's ability to process requests.

The distinction between liveness (/hello) and readiness (/ready) endpoints is now clearer:

Liveness: Always returns 200 OK with health status in the body
Readiness: Returns non-200 status codes when the gateway is not ready to serve requests

This aligns with Kubernetes best practices where liveness probes should be lightweight and only check if the application is running, while readiness probes should determine if the application is ready to serve traffic.

sonarqubecloud · 2025-08-11T13:27:01Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
95.7% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

ilijabojanovic · 2025-08-11T14:27:07Z

/release to relesae-5.8.4

tykbot · 2025-08-11T14:27:25Z

@ilijabojanovic Release branch not found

ilijabojanovic · 2025-08-11T14:38:24Z

/release to release-5.8.4

ilijabojanovic · 2025-08-11T14:38:32Z

/release to release-5.9.1

tykbot · 2025-08-11T14:38:41Z

Working on it! Note that it can take a few minutes.

tykbot · 2025-08-11T14:38:48Z

Working on it! Note that it can take a few minutes.

(cherry picked from commit 8cd87da)

tykbot · 2025-08-11T14:39:03Z

@ilijabojanovic Created merge PRs

[TT-15507] Revert changes to the "/hello" health check endpoint (#7295)
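The liveness behaviour summarised in the PR description (always HTTP 200 on /hello, with Pass/Warn/Fail derived only from the number of failed checks) can be sketched roughly as below. This is not the code from gateway/health_check.go: the types `Status`, `Check`, `Response` and the helpers `overallStatus`, `liveHandler`, `probe` are hypothetical stand-ins, and the exact mapping (no failures gives pass, all checks failing gives fail, anything in between gives warn) is an assumption, since the description only says the status is based on `failCount`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
)

// Status, Check and Response are simplified stand-ins, not the gateway's real types.
type Status string

const (
	Pass Status = "pass"
	Warn Status = "warn"
	Fail Status = "fail"
)

type Check struct {
	Status Status `json:"status"`
}

type Response struct {
	Status  Status           `json:"status"`
	Details map[string]Check `json:"details"`
}

// overallStatus derives the reported status from failCount alone.
// Assumed mapping: no failures -> Pass, every check failing -> Fail, otherwise Warn.
func overallStatus(checks map[string]Check) Status {
	failCount := 0
	for _, c := range checks {
		if c.Status == Fail {
			failCount++
		}
	}
	switch {
	case failCount == 0:
		return Pass
	case failCount == len(checks):
		return Fail
	default:
		return Warn
	}
}

// liveHandler always answers 200; the health verdict travels only in the body.
func liveHandler(checks map[string]Check) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		res := Response{Status: overallStatus(checks), Details: checks}
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusOK)
		if err := json.NewEncoder(w).Encode(res); err != nil {
			log.Printf("could not encode liveness response: %v", err)
		}
	}
}

// probe calls the handler once and returns the HTTP code and the decoded body status.
func probe(checks map[string]Check) (int, Status) {
	rec := httptest.NewRecorder()
	liveHandler(checks)(rec, httptest.NewRequest(http.MethodGet, "/hello", nil))
	var res Response
	if err := json.NewDecoder(rec.Body).Decode(&res); err != nil {
		log.Printf("could not decode liveness response: %v", err)
	}
	return rec.Code, res.Status
}

func main() {
	code, status := probe(map[string]Check{
		"redis":     {Status: Fail},
		"dashboard": {Status: Pass},
	})
	fmt.Println(code, status)
}
```

Keeping the status code fixed means an orchestrator that probes /hello only learns whether the process is answering, while the body still tells a human or a dashboard which checks are failing.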

tykbot · 2025-08-11T14:39:24Z

@ilijabojanovic Created merge PRs

mativm02 added 4 commits August 7, 2025 10:40

Reverting /hello endpoint to always return 200 status code

8e6d8d0

linting

3c572f1

Merge branch 'master' of github.com:TykTechnologies/tyk into TT-15507

a3c93a2

Add live check handler tests for various health check scenarios

e04518f

buger added bug Tyk Product Development labels Aug 11, 2025

github-actions bot added the Review effort 2/5 label Aug 11, 2025

pvormste approved these changes Aug 11, 2025

View reviewed changes

ilijabojanovic merged commit 8cd87da into master Aug 11, 2025
42 of 44 checks passed

ilijabojanovic deleted the TT-15507 branch August 11, 2025 14:26

buger mentioned this pull request Aug 11, 2025

Merging to release-5.8.4: [TT-15507] Revert changes to the "/hello" health check endpoint (#7295) #7296

Closed

7 tasks

buger mentioned this pull request Aug 11, 2025

Merging to release-5.9.1: [TT-15507] Revert changes to the "/hello" health check endpoint (#7295) #7297

Merged

7 tasks


mativm02 commented Aug 11, 2025 • edited


buger commented Aug 11, 2025

github-actions bot commented Aug 11, 2025

PR Reviewer Guide 🔍

github-actions bot commented Aug 11, 2025

github-actions bot commented Aug 11, 2025

PR Code Suggestions ✨

github-actions bot commented Aug 11, 2025

📦 Impact Review Snapshot

github-actions bot commented Aug 11, 2025

🚦 Connectivity Review Snapshot

github-actions bot commented Aug 11, 2025

🛡️ Security Snapshot

github-actions bot commented Aug 11, 2025

🚀 Performance Snapshot

github-actions bot commented Aug 11, 2025

Analysis of PR #7295: Revert changes to the "/hello" health check endpoint

Key Changes

Rationale

Impact

github-actions bot commented Aug 11, 2025

🛡️ Security Snapshot

github-actions bot commented Aug 11, 2025

Analysis of PR #7295: Revert changes to the "/hello" health check endpoint

Key Changes

Rationale

github-actions bot commented Aug 11, 2025

Analysis of PR #7295: Revert changes to the "/hello" health check endpoint

1. Liveness Endpoint Behavior

2. Health Status Calculation Simplification

3. Error Handling for JSON Encoding

4. Comprehensive Test Coverage

Impact and Benefits
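Item 3 of the outline above concerns JSON encoding errors in the readiness handler, which the PR says are now logged. A minimal sketch of that pattern follows. The names `readyHandler`, `brokenWriter` and `serve` are hypothetical, the log wording is invented for illustration, and returning 503 when the gateway is not ready is an assumption about /ready that this page does not confirm.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
)

// brokenWriter simulates a client whose connection fails while the body is written.
type brokenWriter struct{ *httptest.ResponseRecorder }

func (brokenWriter) Write([]byte) (int, error) {
	return 0, errors.New("connection reset")
}

// readyHandler reports readiness and logs, rather than silently drops, encode errors.
func readyHandler(ready func() bool, logger *log.Logger) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		code, status := http.StatusOK, "pass"
		if !ready() {
			// Assumption: an unready gateway answers 503 on /ready.
			code, status = http.StatusServiceUnavailable, "fail"
		}
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(code)
		if err := json.NewEncoder(w).Encode(map[string]string{"status": status}); err != nil {
			logger.Printf("warning: could not encode readiness response: %v", err)
		}
	}
}

// serve runs the handler once and returns the status code and anything that was logged.
func serve(ready, breakWrite bool) (int, string) {
	var logs bytes.Buffer
	h := readyHandler(func() bool { return ready }, log.New(&logs, "", 0))
	rec := httptest.NewRecorder()
	var w http.ResponseWriter = rec
	if breakWrite {
		w = brokenWriter{rec}
	}
	h(w, httptest.NewRequest(http.MethodGet, "/ready", nil))
	return rec.Code, logs.String()
}

func main() {
	code, logged := serve(true, false)
	fmt.Println(code, logged == "")

	code, logged = serve(false, true)
	fmt.Print(code, " ", logged)
}
```

Because the status line has already been sent by the time `Encode` fails, the handler cannot change the response at that point; logging is the only remaining way to surface the problem.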

sonarqubecloud bot commented Aug 11, 2025

Quality Gate passed

ilijabojanovic commented Aug 11, 2025

tykbot bot commented Aug 11, 2025

ilijabojanovic commented Aug 11, 2025

ilijabojanovic commented Aug 11, 2025

tykbot bot commented Aug 11, 2025

tykbot bot commented Aug 11, 2025

tykbot bot commented Aug 11, 2025

mativm02 commented Aug 11, 2025 • edited