[TT-15507] Revert changes to the "/hello" health check endpoint #7295

Merged
merged 5 commits into from
Aug 11, 2025

Conversation

mativm02
Contributor

@mativm02 mativm02 commented Aug 11, 2025

User description

  • The /hello (liveness) endpoint is reverted to always respond with HTTP 200, regardless of internal health check status.
  • Health status calculation in the liveness handler is refactored and simplified—now based solely on the number of failed checks (failCount), removing the previous "critical failure" logic.
  • Error handling is added for JSON encoding errors in the readiness handler, so encoding issues are now logged.
TT-15507
Summary: Revert changes to the "/hello" health check endpoint
Type: Bug
Status: In Dev
Points: N/A
Labels: -

Description

Related Issue

Motivation and Context

How This Has Been Tested

Screenshots (if appropriate)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Refactoring or add test (improvements in base code or adds test coverage to functionality)

Checklist

  • I ensured that the documentation is up to date
  • I explained why this PR updates go.mod in detail with reasoning why it's required
  • I would like a code coverage CI quality gate exception and have explained why

PR Type

Bug fix, Tests


Description

Restore /hello to always return 200
Compute Pass/Warn/Fail from check counts
Add extensive liveness vs readiness tests
Improve JSON encode error handling logs


Diagram Walkthrough

flowchart LR
  liveH["liveCheckHandler (/hello)"] -- "always 200" --> http200["HTTP 200"]
  liveH -- "derive Pass/Warn/Fail" --> status["Response.Status"]
  readyH["readinessHandler (/ready)"] -- "encode + log errors" --> logR["Warning on encode error"]
  tests["health_check_test.go"] -- "cover scenarios" --> liveH
  tests -- "compare vs readiness" --> readyH

File Walkthrough

Relevant files
Bug fix
health_check.go
Rework liveness logic and enforce 200 response                     

gateway/health_check.go

  • Compute failCount inline and set status.
  • Always write HTTP 200 for /hello.
  • Add error handling for JSON encode in readiness.
  • Remove determineHealthStatus usage in liveness.
+24/-4   
Tests
health_check_test.go
Add comprehensive liveness and readiness tests                     

gateway/health_check_test.go

  • Add tests for /hello status and body.
  • Add liveness vs readiness behavior tests.
  • Add edge case coverage and headers checks.
  • Validate JSON payload fields and errors.
+552/-0 

@buger
Member

buger commented Aug 11, 2025

I'm a bot and I 👍 this PR title. 🤖

Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

The liveness status computation counts only items whose status equals Fail. If checks can be nil or empty, confirm that len(checks) behaves as expected and that statuses such as Warn are meant to be excluded from the fail count. This changes behavior from the prior logic; verify that it matches the intended revert.

for _, v := range checks {
	if v.Status == Fail {
		failCount++
	}
}

var status HealthCheckStatus

switch failCount {
case 0:
	status = Pass

case len(checks):
	status = Fail

default:
	status = Warn
}
HTTP Status Change

liveCheckHandler now always writes http.StatusOK instead of the previously computed httpStatus. Confirm no external consumers rely on non-200 codes for liveness, and that monitoring/alerts are adjusted accordingly.

w.WriteHeader(http.StatusOK)
err := json.NewEncoder(w).Encode(res)
Error Handling

Readiness handler now logs JSON encoding errors but still returns 200 after calling WriteHeader(http.StatusOK). Consider whether a failed encode should result in a non-200 or at least avoid writing partial/invalid bodies.

	w.WriteHeader(http.StatusOK)
	err := json.NewEncoder(w).Encode(res)
	if err != nil {
		mainLog.Warning(fmt.Sprintf("[Readiness] Could not encode response, error: %s", err.Error()))
	}
}
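
The status derivation quoted in this guide can be checked in isolation. The sketch below is a standalone reconstruction, not the gateway's code: `checkItem` and `deriveStatus` are hypothetical names, and the string values given to `Pass`, `Warn` and `Fail` are assumptions. It illustrates two Go behaviors relevant to the questions raised above: ranging over a nil map runs zero iterations, and switch cases are tried top to bottom, so `case 0` is matched before `case len(checks)` when both equal zero.

```go
package main

import "fmt"

// HealthCheckStatus and its constants reuse the names from the snippet above;
// the string values are assumed for illustration only.
type HealthCheckStatus string

const (
	Pass HealthCheckStatus = "pass"
	Warn HealthCheckStatus = "warn"
	Fail HealthCheckStatus = "fail"
)

// checkItem is a hypothetical stand-in for a single health check result.
type checkItem struct {
	Status HealthCheckStatus
}

// deriveStatus mirrors the count-based switch as a pure function.
func deriveStatus(checks map[string]checkItem) HealthCheckStatus {
	failCount := 0
	for _, v := range checks { // safe on a nil map: the loop body never runs
		if v.Status == Fail {
			failCount++
		}
	}

	// Cases are evaluated in order and the first match wins, so an empty or
	// nil map (failCount == 0, len(checks) == 0) resolves to Pass.
	switch failCount {
	case 0:
		return Pass
	case len(checks):
		return Fail
	default:
		return Warn
	}
}

func main() {
	fmt.Println(deriveStatus(nil)) // pass
	fmt.Println(deriveStatus(map[string]checkItem{
		"redis": {Status: Fail},
		"rpc":   {Status: Pass},
	})) // warn
	fmt.Println(deriveStatus(map[string]checkItem{"redis": {Status: Fail}})) // fail
}
```

Note that a check reporting Warn does not raise the fail count in this sketch, which matches the loop shown above; whether that is the intended behavior is the question the reviewer raises.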

Contributor

API Changes

no api changes detected

Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Handle empty checks map

When checks is nil or empty, len(checks) is 0, so case 0 and case len(checks) carry the same
value. Go tries switch cases top to bottom and takes the first match, so case 0 wins and the
status is Pass, but only because of the case order. Guard the zero-checks scenario explicitly
before the switch so that empty or nil health data returns Pass by design, as the tests expect.

gateway/health_check.go [200-209]

-switch failCount {
-case 0:
+if len(checks) == 0 {
 	status = Pass
-
-case len(checks):
-	status = Fail
-
-default:
-	status = Warn
+} else {
+	switch failCount {
+	case 0:
+		status = Pass
+	case len(checks):
+		status = Fail
+	default:
+		status = Warn
+	}
 }
Suggestion importance[1-10]: 9

__

Why: When checks is nil/empty, len(checks)==0 makes case len(checks) overlap with case 0. The existing switch still returns Pass because case 0 is evaluated first, so the result depends on case order; adding an explicit empty-check makes the intent clear and keeps the behavior aligned with tests expecting Pass.

High
General
Recover from JSON encode failure

If JSON encoding fails after the status has been written, the handler only logs a warning
and may leave a partial body. To keep clients from caching a bad body, clear the
Content-Length header and write a minimal error message. This makes the endpoint
response more robust.

gateway/health_check.go [220-224]

 w.WriteHeader(http.StatusOK)
-err := json.NewEncoder(w).Encode(res)
-if err != nil {
+if err := json.NewEncoder(w).Encode(res); err != nil {
 	mainLog.Warning(fmt.Sprintf("[Liveness] Could not encode response, error: %s", err.Error()))
+	w.Header().Del("Content-Length")
+	_, _ = w.Write([]byte(`{"status":"error","message":"failed to encode response"}`))
 }
Suggestion importance[1-10]: 5

__

Why: Improves robustness by handling encode failures after headers are sent, but behavior changes response semantics and may produce mixed content; useful but non-critical.

Low
Safeguard fail counting logic

Treat only explicit Fail as failures, and make sure non-fail statuses such as Warn do not
increment failCount. Optionally add a nil check before the loop as a defensive guard for
future refactors. Ranging over a nil map, or a nil slice, is already safe in Go, so the
guard does not prevent a panic; it only makes the intent explicit.

gateway/health_check.go [192-196]

-for _, v := range checks {
-	if v.Status == Fail {
-		failCount++
+if checks != nil {
+	for _, v := range checks {
+		if v.Status == Fail {
+			failCount++
+		}
 	}
 }
Suggestion importance[1-10]: 3

__

Why: While adding a nil check is harmless, iterating a nil map in Go is safe; the change has minimal impact and the current code already counts only explicit Fail, so the benefit is minor.

Low
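
An alternative to the "Recover from JSON encode failure" suggestion is to encode into a buffer before anything is sent. In `net/http`, header changes made after `WriteHeader` have no effect, and bytes already written cannot be taken back. The sketch below is a minimal illustration under that constraint: `writeJSON` and `record` are hypothetical helpers that do not exist in the gateway, and the 500 fallback is an assumption about what a caller might want.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// writeJSON is a hypothetical helper. It encodes v into a buffer first and
// only then writes the status line and body, so an encoding failure cannot
// leave a partial body behind an already-sent status code.
func writeJSON(w http.ResponseWriter, code int, v any) error {
	var buf bytes.Buffer
	if err := json.NewEncoder(&buf).Encode(v); err != nil {
		// Nothing has been sent yet, so the status can still be changed.
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusInternalServerError)
		_, _ = w.Write([]byte(`{"status":"error","message":"failed to encode response"}`))
		return err
	}
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(code)
	_, err := w.Write(buf.Bytes())
	return err
}

// record runs writeJSON against an in-memory recorder and returns the status
// code and body a client would have received.
func record(v any) (int, string) {
	rec := httptest.NewRecorder()
	_ = writeJSON(rec, http.StatusOK, v)
	return rec.Code, rec.Body.String()
}

func main() {
	fmt.Println(record(map[string]string{"status": "pass"}))
	// A channel cannot be marshalled to JSON, which forces the error path.
	fmt.Println(record(make(chan int)))
}
```

Whether a 500 is acceptable on an endpoint that is meant to always return 200 is a policy decision; the fallback branch could equally write 200 with the error body.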

Contributor

📦 Impact Review Snapshot

Effort Downstream Updates Compatibility Docs TL;DR
Low 🟢 📖 Reverting "/hello" endpoint to always return 200 OK is backward compatible but monitoring systems may need adjustment
## Impact Assessment

This PR reverts changes to the "/hello" health check endpoint, making it always return HTTP 200 OK regardless of the actual health status. The health status (Pass/Warn/Fail) is still computed and included in the response body, but the HTTP status code is always 200. This change affects how monitoring systems might interpret the health of the Tyk Gateway, as they would need to inspect the response body rather than relying on HTTP status codes.

The readiness endpoint ("/ready") behavior remains unchanged - it will still return non-200 status codes when the gateway is not ready to serve requests.

## Required Updates
  1. tyk-charts: Health check probes in Kubernetes deployments may need to be updated if they were relying on non-200 status codes from the "/hello" endpoint. The liveness probe should continue to use the "/hello" endpoint, but readiness probes should use the "/ready" endpoint if they need to detect actual readiness state.

  2. tyk-operator: If the operator uses the "/hello" endpoint for health checking, it should be updated to either parse the response body for the actual status or switch to using the "/ready" endpoint for determining if the gateway is truly ready.

  3. Monitoring systems: Any external monitoring systems that check Tyk Gateway health should be updated to parse the JSON response body rather than relying solely on HTTP status codes.
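
For the third item, a monitor has to look inside the body because the HTTP code no longer carries the health signal. The sketch below shows one way to classify a `/hello` response in Go. It is an illustration only: `helloBody` and `classify` are hypothetical names, and the JSON key `status` with values `pass`, `warn` and `fail` is an assumption that should be checked against the real payload.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// helloBody models only the field this sketch needs. The key name and the
// accepted values are assumptions, not taken from the gateway source.
type helloBody struct {
	Status string `json:"status"`
}

// classify turns an HTTP code and body from /hello into a health status.
// A non-200 code is treated as an error because /hello is expected to
// answer 200 whenever the process is able to respond.
func classify(httpCode int, body []byte) (string, error) {
	if httpCode != http.StatusOK {
		return "", fmt.Errorf("unexpected HTTP status %d", httpCode)
	}
	var hb helloBody
	if err := json.Unmarshal(body, &hb); err != nil {
		return "", fmt.Errorf("decode body: %w", err)
	}
	switch hb.Status {
	case "pass", "warn", "fail":
		return hb.Status, nil
	default:
		return "", fmt.Errorf("unknown status %q", hb.Status)
	}
}

func main() {
	status, err := classify(http.StatusOK, []byte(`{"status":"warn"}`))
	fmt.Println(status, err) // warn <nil>
}
```

A real monitor would fetch the body with `http.Get` and pass the response code and bytes to a function like this; alerting on `fail`, and perhaps on `warn`, then replaces the former status-code check.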

## Compatibility Concerns

This change is generally backward compatible since:

  1. The "/hello" endpoint will continue to be available at the same path
  2. The response format remains the same, only the HTTP status code behavior changes
  3. The actual health status is still available in the response body

However, systems that rely specifically on non-200 status codes from the "/hello" endpoint to detect unhealthy states will need to be updated. This includes Kubernetes liveness/readiness probes, monitoring systems, and any custom health check implementations.

## Summary & Recommendations
  • Update documentation to clearly explain the difference between the "/hello" endpoint (always returns 200 OK) and the "/ready" endpoint (returns non-200 when not ready)
  • Ensure Kubernetes deployments in tyk-charts use appropriate probes: liveness probe for "/hello" and readiness probe for "/ready"
  • Consider adding a note in release notes about this behavior change to alert users who might be relying on HTTP status codes from the "/hello" endpoint

Tip: Mention me again using /dependency <request>.
Powered by Probe AI
Tyk Gateway Dependency Impact Reviewer

Contributor

🚦 Connectivity Review Snapshot

Effort Tests Security Perf TL;DR
Low 🔒 none 🟢 Restores /hello endpoint to always return 200 OK while maintaining status in response body
## Connectivity Assessment
  • Redis Connections: No changes to Redis connectivity; the PR only modifies how health check results are presented.
  • RPC Connections: No changes to RPC connectivity; the health check still monitors RPC connections but doesn't affect their behavior.
  • Synchronization Mechanisms: The PR maintains the same health check data collection but changes how the HTTP status code is determined for the liveness endpoint.
## Test Coverage Validation
  • Redis Tests: Excellent test coverage for Redis connectivity failures, verifying that /hello returns 200 OK with appropriate status in the body.
  • RPC Tests: Good coverage for RPC failure scenarios, confirming that liveness checks maintain 200 OK responses.
  • Failure Scenario Tests: Comprehensive test suite added that covers various failure combinations and edge cases, including comparison between liveness and readiness endpoints.
## Security & Performance Impact
  • Authentication Changes: No authentication changes in this PR.
  • Performance Considerations: Minimal performance impact; the change simplifies the status code determination logic.
  • Error Handling: Improved error handling for JSON encoding in the readiness handler, which logs warnings when encoding fails.
## Summary & Recommendations
  • This PR correctly reverts the behavior of the /hello endpoint to always return HTTP 200 OK, which is critical for Kubernetes liveness probes to prevent unnecessary pod restarts.
  • The change maintains the health status information in the response body, allowing monitoring systems to still detect issues.
  • The distinction between liveness (/hello) and readiness (/ready) endpoints is now clearer, with readiness still returning 503 for critical failures.
  • No suggestions to provide – change LGTM.

Tip: Mention me again using /connectivity <request>.
Connectivity Issues Reviewer Prompt for Tyk Gateway

Contributor

🛡️ Security Snapshot

Effort Risk Level Tests Compliance TL;DR
Low 🟢 ✔️ Reverting /hello to always return 200 OK has minimal security impact
## Security Impact Analysis

The PR reverts the /hello health check endpoint to always return HTTP 200 OK regardless of the actual health status, while still including the accurate health status (Pass/Warn/Fail) in the response body. This change primarily affects operational monitoring rather than security controls. The readiness endpoint (/ready) maintains its behavior of returning non-200 status codes for unhealthy states, which preserves the ability to detect critical failures.

The addition of error handling for JSON encoding errors in the readiness handler is a positive improvement that prevents silent failures and provides better logging.

## Identified Vulnerabilities

No direct security vulnerabilities were introduced by this change. The modification to always return 200 OK from the liveness endpoint could potentially mask system health issues from monitoring systems that only check HTTP status codes, but this is an operational concern rather than a security vulnerability.

The PR includes comprehensive test coverage that verifies the behavior of both endpoints across various failure scenarios, which mitigates the risk of unintended side effects.

## Security Recommendations
  1. Ensure monitoring systems are updated to check the response body status field rather than relying solely on HTTP status codes when monitoring the /hello endpoint.

  2. Consider adding documentation to clarify the distinction between liveness and readiness endpoints, specifically noting that /hello always returns 200 OK while /ready returns non-200 status codes for unhealthy states.

  3. Verify that any existing health check automation in CI/CD pipelines or deployment systems is updated to accommodate this change.

## OWASP Compliance

This change does not directly impact OWASP Top 10 concerns as it doesn't modify authentication, authorization, input validation, or data protection mechanisms. The health check endpoints are typically internal operational endpoints and not directly exposed to end users.

The improved error handling for JSON encoding errors aligns with OWASP's recommendation for proper error handling and logging.

## Summary
  • The change to always return 200 OK from the /hello endpoint is an operational change with minimal security impact.
  • The PR includes comprehensive test coverage that verifies the behavior across various scenarios.
  • The improved error handling for JSON encoding is a positive security improvement.
  • No security issues identified – change LGTM.

Tip: Mention me again using /security <request>.
Security Impact Reviewer Prompt

Contributor

🚀 Performance Snapshot

Effort Perf Risk Hot Paths Benchmarks TL;DR
Low 🟢 Reverts /hello endpoint to always return 200 OK with status in body
## Performance Impact Analysis

This PR makes a targeted change to the health check endpoint behavior with minimal performance impact. The /hello endpoint now always returns HTTP 200 regardless of health status (putting the actual Pass/Warn/Fail status in the response body), while improving error handling for JSON encoding. This change simplifies health check logic and aligns with Kubernetes best practices for liveness probes.

## Critical Areas

The main change is in the liveCheckHandler function which now:

  1. Always returns HTTP 200 OK status code
  2. Computes Pass/Warn/Fail status based on check counts
  3. Includes better error handling for JSON encoding failures

This affects the /hello endpoint which is typically used for liveness probes in Kubernetes environments. The change makes the endpoint more reliable for basic health checking while keeping detailed status information in the response body.

## Optimization Recommendations

The changes are well-optimized and follow best practices for health check endpoints. The computation of status from check counts is straightforward and efficient. The error handling for JSON encoding is appropriate, logging warnings without affecting the response status.

No further optimizations are needed as this is a targeted fix that improves reliability without adding overhead.

## Summary
  • This PR reverts the /hello health check endpoint to always return HTTP 200 OK, with the actual health status (Pass/Warn/Fail) included in the response body
  • The change aligns with Kubernetes best practices where liveness probes should be lightweight and only check if the application is running
  • Comprehensive tests were added to verify the behavior of both liveness and readiness endpoints
  • The PR improves error handling for JSON encoding with better logging
  • No performance concerns - this is a targeted change with minimal impact on system performance

No suggestions to provide – change LGTM.


Tip: Mention me again using /performance <request>.
Performance Impact Reviewer Prompt

This commit updates the health check test files by introducing a new variable assignment to prevent variable shadowing within the test cases. This change enhances code clarity and maintains the integrity of test execution across multiple test cases.
Contributor

Analysis of PR #7295: Revert changes to the "/hello" health check endpoint

This PR makes important changes to the health check endpoints in Tyk Gateway, specifically reverting the behavior of the /hello liveness endpoint to always return HTTP 200 OK regardless of the actual health status of the gateway.

Key Changes

  1. Liveness Endpoint Behavior: The /hello endpoint now always returns HTTP 200 OK, even when health checks fail. The actual health status (Pass/Warn/Fail) is still included in the response body, but monitoring systems that only check HTTP status codes will always see the gateway as "alive".

  2. Simplified Health Status Calculation: The health status calculation in the liveness handler has been refactored to be based solely on the number of failed checks:

    • If no checks fail (failCount == 0), status is Pass
    • If all checks fail (failCount == len(checks)), status is Fail
    • Otherwise, status is Warn
  3. Error Handling Improvement: Error handling has been added for JSON encoding errors in both the liveness and readiness handlers, ensuring these errors are properly logged.

Rationale

This change aligns with Kubernetes best practices for liveness probes, which should only check if the application is running and responding, not its full health status. A non-200 response from a liveness probe would cause Kubernetes to restart the pod, which might not be the desired behavior for temporary health check failures.

The readiness endpoint (/ready) still returns non-200 status codes when critical components like Redis are unavailable, which is appropriate for readiness checks that determine if the gateway should receive traffic.

Impact

  • Kubernetes deployments will no longer restart Tyk Gateway pods due to temporary health check failures
  • Monitoring systems that rely on HTTP status codes will need to be updated to parse the response body for the actual health status
  • The distinction between liveness and readiness is now clearer, with liveness checking if the gateway is running and readiness checking if it's ready to serve requests

This change improves the stability of Tyk Gateway deployments in Kubernetes environments while still providing detailed health information in the response body.
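
The liveness/readiness split described above can be sketched with two small handlers. This is a simplified illustration, not the gateway's implementation: `liveness`, `readiness`, `probe` and the `redisUp` flag are hypothetical, and the 503 code follows the readiness behavior mentioned in the reviews.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// liveness always answers 200: being able to respond at all is the signal.
func liveness(w http.ResponseWriter, _ *http.Request) {
	w.WriteHeader(http.StatusOK)
}

// readiness answers 503 while a critical dependency is unavailable.
// redisUp is a hypothetical flag standing in for a real dependency check.
func readiness(redisUp bool) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		if !redisUp {
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	}
}

// probe calls a handler in memory and returns the status code it wrote.
func probe(h http.HandlerFunc) int {
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Code
}

func main() {
	fmt.Println(probe(liveness))         // 200
	fmt.Println(probe(readiness(false))) // 503
	fmt.Println(probe(readiness(true)))  // 200
}
```

With this split, an orchestrator that restarts on liveness failure leaves the process alone during a dependency outage, while the readiness result can still take it out of rotation.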


Tip: Mention me again using /dependency <request>.
Tyk Gateway Dependency Impact Reviewer


Contributor

Analysis of PR #7295: Revert changes to the "/hello" health check endpoint

This PR makes a targeted change to the Tyk Gateway's health check system, specifically reverting the behavior of the /hello endpoint (liveness check) to always return HTTP 200 OK regardless of the actual health status of the system components.

Key Changes

  1. Liveness Endpoint (/hello) Changes:

    • Now always returns HTTP 200 OK status code regardless of health check results
    • Still includes the actual health status (Pass/Warn/Fail) in the response body
    • Health status calculation is simplified to be based solely on the number of failed checks
    • Previous "critical failure" logic is removed from the liveness handler
  2. Readiness Endpoint (/ready) Improvements:

    • Error handling is added for JSON encoding errors, which are now properly logged
    • Maintains existing behavior of returning non-200 status codes when the gateway is not ready
  3. Test Coverage:

    • Comprehensive test suite added for the liveness handler
    • Tests comparing liveness vs readiness behavior
    • Edge case coverage for various failure scenarios

Rationale

This change follows Kubernetes best practices for liveness probes. The liveness probe should only check if the application is running, not its health status. By always returning 200 OK, it prevents unnecessary pod restarts when there are non-critical failures. The actual health status is still available in the response body for monitoring systems.

The readiness probe (/ready) continues to return non-200 status codes for critical failures, which is appropriate for determining if the gateway should receive traffic.

This is a well-structured change with excellent test coverage that properly separates the concerns of liveness (is the application running?) from readiness (is the application ready to serve requests?).


Tip: Mention me again using /connectivity <request>.
Connectivity Issues Reviewer Prompt for Tyk Gateway

Contributor

Analysis of PR #7295: Revert changes to the "/hello" health check endpoint

This PR makes targeted changes to the health check endpoints in the Tyk Gateway, focusing on four main areas:

1. Liveness Endpoint Behavior

The PR ensures that the /hello (liveness) endpoint always returns HTTP 200 OK, regardless of internal health check status. Looking at the current code, the liveCheckHandler function already returns HTTP 200 OK, so this change is likely reinforcing this behavior to prevent any future changes from altering it.

2. Health Status Calculation Simplification

The health status calculation in the liveness handler is refactored and simplified to be based solely on the number of failed checks (failCount). The PR removes the dependency on the determineHealthStatus function which takes into account "critical failures" through the isCriticalFailure function.

The current implementation already calculates a status based on the number of failed checks:

switch failCount {
case 0:
    status = Pass
case len(checks):
    status = Fail
default:
    status = Warn
}

This approach is simpler and more predictable than the previous logic that considered critical failures separately.

3. Error Handling for JSON Encoding

The PR adds error handling for JSON encoding errors in the readiness handler. Looking at the current code, both handlers already have error handling for JSON encoding:

err := json.NewEncoder(w).Encode(res)
if err != nil {
    mainLog.Warning(fmt.Sprintf("[Readiness] Could not encode response, error: %s", err.Error()))
}

4. Comprehensive Test Coverage

The PR adds extensive tests for the health check endpoints (+552 lines in health_check_test.go), including:

  • Tests for /hello status and response body
  • Comparison tests between liveness and readiness behavior
  • Edge case coverage and header checks
  • JSON payload field validation and error handling

Impact and Benefits

This change improves the reliability of Kubernetes deployments by ensuring the liveness probe always returns 200 OK, preventing unnecessary pod restarts due to temporary issues that don't affect the gateway's ability to process requests.

The distinction between liveness (/hello) and readiness (/ready) endpoints is now clearer:

  • Liveness: Always returns 200 OK with health status in the body
  • Readiness: Returns non-200 status codes when the gateway is not ready to serve requests

This aligns with Kubernetes best practices where liveness probes should be lightweight and only check if the application is running, while readiness probes should determine if the application is ready to serve traffic.


Tip: Mention me again using /performance <request>.
Performance Impact Reviewer Prompt


@ilijabojanovic ilijabojanovic merged commit 8cd87da into master Aug 11, 2025
42 of 44 checks passed
@ilijabojanovic ilijabojanovic deleted the TT-15507 branch August 11, 2025 14:26
@ilijabojanovic
Member

/release to relesae-5.8.4


tykbot bot commented Aug 11, 2025

@ilijabojanovic Release branch not found

@ilijabojanovic
Member

/release to release-5.8.4

@ilijabojanovic
Member

/release to release-5.9.1


tykbot bot commented Aug 11, 2025

Working on it! Note that it can take a few minutes.

1 similar comment

tykbot bot commented Aug 11, 2025

Working on it! Note that it can take a few minutes.

tykbot bot pushed a commit that referenced this pull request Aug 11, 2025
### **User description**

- The /hello (liveness) endpoint is reverted to always respond with HTTP
200, regardless of internal health check status.
- Health status calculation in the liveness handler is refactored and
simplified—now based solely on the number of failed checks (failCount),
removing the previous "critical failure" logic.
- Error handling is added for JSON encoding errors in the readiness
handler, so encoding issues are now logged.

<details open>
<summary><a href="https://tyktech.atlassian.net/browse/TT-15507"
title="TT-15507" target="_blank">TT-15507</a></summary>
  <br />
  <table>
    <tr>
      <th>Summary</th>
      <td>Revert changes to the "/hello" health check endpoint</td>
    </tr>
    <tr>
      <th>Type</th>
      <td>
<img alt="Bug"
src="https://tyktech.atlassian.net/rest/api/2/universal_avatar/view/type/issuetype/avatar/10303?size=medium"
/>
        Bug
      </td>
    </tr>
    <tr>
      <th>Status</th>
      <td>In Dev</td>
    </tr>
    <tr>
      <th>Points</th>
      <td>N/A</td>
    </tr>
    <tr>
      <th>Labels</th>
      <td>-</td>
    </tr>
  </table>
</details>
<!--
  do not remove this marker as it will break jira-lint's functionality.
  added_by_jira_lint
-->

---

<!-- Provide a general summary of your changes in the Title above -->

## Description

<!-- Describe your changes in detail -->

## Related Issue

<!-- This project only accepts pull requests related to open issues. -->
<!-- If suggesting a new feature or change, please discuss it in an
issue first. -->
<!-- If fixing a bug, there should be an issue describing it with steps
to reproduce. -->
<!-- OSS: Please link to the issue here. Tyk: please create/link the
JIRA ticket. -->

## Motivation and Context

<!-- Why is this change required? What problem does it solve? -->

## How This Has Been Tested

<!-- Please describe in detail how you tested your changes -->
<!-- Include details of your testing environment, and the tests -->
<!-- you ran to see how your change affects other areas of the code,
etc. -->
<!-- This information is helpful for reviewers and QA. -->

## Screenshots (if appropriate)

## Types of changes

<!-- What types of changes does your code introduce? Put an `x` in all
the boxes that apply: -->

- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Refactoring or add test (improvements in base code or adds test
coverage to functionality)

## Checklist

<!-- Go over all the following points, and put an `x` in all the boxes
that apply -->
<!-- If there are no documentation updates required, mark the item as
checked. -->
<!-- Raise up any additional concerns not covered by the checklist. -->

- [ ] I ensured that the documentation is up to date
- [ ] I explained why this PR updates go.mod in detail with reasoning
why it's required
- [ ] I would like a code coverage CI quality gate exception and have
explained why


___

### **PR Type**
Bug fix, Tests


___

### **Description**
Restore /hello to always return 200
Compute Pass/Warn/Fail from check counts
Add extensive liveness vs readiness tests
Improve JSON encode error handling logs


___

### Diagram Walkthrough


```mermaid
flowchart LR
  liveH["liveCheckHandler (/hello)"] -- "always 200" --> http200["HTTP 200"]
  liveH -- "derive Pass/Warn/Fail" --> status["Response.Status"]
  readyH["readinessHandler (/ready)"] -- "encode + log errors" --> logR["Warning on encode error"]
  tests["health_check_test.go"] -- "cover scenarios" --> liveH
  tests -- "compare vs readiness" --> readyH
```
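
The liveness path in the diagram can be sketched roughly as follows. This is a minimal illustration, not the code from `gateway/health_check.go`: the type names, the `"pass"`/`"warn"`/`"fail"` strings, and the mapping (no failures gives Pass, all checks failing gives Fail, anything in between gives Warn) are assumptions chosen to match the description "based solely on the number of failed checks".

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
)

// checkResult and response are hypothetical stand-ins for the gateway's types.
type checkResult struct {
	Status string `json:"status"`
}

type response struct {
	Status  string                 `json:"status"`
	Details map[string]checkResult `json:"details"`
}

// deriveStatus maps the number of failed checks to an overall status.
// Assumed mapping: 0 failures -> pass, all failed -> fail, otherwise warn.
func deriveStatus(checks map[string]checkResult) string {
	failCount := 0
	for _, c := range checks {
		if c.Status == "fail" {
			failCount++
		}
	}
	switch {
	case failCount == 0:
		return "pass"
	case failCount == len(checks):
		return "fail"
	default:
		return "warn"
	}
}

// liveHandler always answers HTTP 200; the derived status only appears in the body.
func liveHandler(checks map[string]checkResult) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		res := response{Status: deriveStatus(checks), Details: checks}
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusOK)
		if err := json.NewEncoder(w).Encode(res); err != nil {
			log.Printf("warning: could not encode liveness response: %v", err)
		}
	}
}

func main() {
	checks := map[string]checkResult{
		"redis":     {Status: "fail"},
		"dashboard": {Status: "pass"},
	}
	rec := httptest.NewRecorder()
	liveHandler(checks)(rec, httptest.NewRequest(http.MethodGet, "/hello", nil))
	fmt.Println(rec.Code, deriveStatus(checks)) // prints "200 warn"
}
```

With one of two checks failing, the sketch still reports HTTP 200 and carries `warn` in the body, which is the behavior the PR restores for `/hello`.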



<details> <summary><h3> File Walkthrough</h3></summary>

<table><thead><tr><th></th><th align="left">Relevant
files</th></tr></thead><tbody><tr><td><strong>Bug
fix</strong></td><td><table>
<tr>
  <td>
    <details>
<summary><strong>health_check.go</strong><dd><code>Rework liveness logic
and enforce 200 response</code></dd></summary>
<hr>

gateway/health_check.go

<ul><li>Compute failCount inline and set status.</li>
<li>Always write HTTP 200 for /hello.</li>
<li>Add error handling for JSON encode in readiness.</li>
<li>Remove determineHealthStatus usage in liveness.</li></ul>


</details>


  </td>
<td><a
href="https://github.com/TykTechnologies/tyk/pull/7295/files#diff-978a2d1427d9209765e541618af10683944c6396df1a6fb8b5221e4f16658a6a">+24/-4</a></td>

</tr>
</table></td></tr><tr><td><strong>Tests</strong></td><td><table>
<tr>
  <td>
    <details>
<summary><strong>health_check_test.go</strong><dd><code>Add
comprehensive liveness and readiness tests</code></dd></summary>
<hr>

gateway/health_check_test.go

<ul><li>Add tests for /hello status and body.</li>
<li>Add liveness vs readiness behavior tests.</li>
<li>Add edge case coverage and headers checks.</li>
<li>Validate JSON payload fields and errors.</li></ul>


</details>


  </td>
<td><a
href="https://github.com/TykTechnologies/tyk/pull/7295/files#diff-08e29946afc7757a9c7baaef04b1a81964640437a684ff6306d1a0c933ac3f6a">+552/-0</a></td>

</tr>
</table></td></tr></tbody></table>

</details>
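
The "add error handling for JSON encode in readiness" item amounts to checking the error returned by the encoder instead of discarding it. A small sketch of that pattern, under assumptions: `encodeOrReport`, `sink`, and the message text are hypothetical names for illustration, and the real handler presumably logs through the gateway's own logger rather than returning a string.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
)

// sink is a throwaway destination used only in this illustration.
var sink io.Writer = io.Discard

// encodeOrReport writes v as JSON to w. On failure it returns a message
// suitable for a warning log line, plus false.
func encodeOrReport(w io.Writer, v interface{}) (string, bool) {
	if err := json.NewEncoder(w).Encode(v); err != nil {
		return fmt.Sprintf("failed to encode readiness response: %v", err), false
	}
	return "", true
}

func main() {
	// A plain map encodes without error.
	_, ok := encodeOrReport(sink, map[string]string{"status": "pass"})
	fmt.Println(ok)

	// Channels cannot be marshalled by encoding/json, so this reports an error.
	msg, ok := encodeOrReport(sink, map[string]interface{}{"bad": make(chan int)})
	fmt.Println(ok, msg)
}
```

By the time encoding fails, response headers may already have been sent, so logging a warning is often the only practical reaction, which is consistent with the "Warning on encode error" node in the diagram.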

___

(cherry picked from commit 8cd87da)

tykbot bot commented Aug 11, 2025

@ilijabojanovic Created merge PRs

tykbot bot pushed a commit that referenced this pull request Aug 11, 2025
(cherry picked from commit 8cd87da)
buger added a commit that referenced this pull request Aug 11, 2025
[TT-15507] Revert changes to the "/hello" health check endpoint (#7295)


tykbot bot commented Aug 11, 2025

@ilijabojanovic Created merge PRs
