Skip to content

[TT-15507] Reverting /hello endpoint to always return 200 status code #7281

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Aug 11, 2025

Conversation

mativm02
Copy link
Contributor

@mativm02 mativm02 commented Aug 7, 2025

TT-15507
Summary Revert changes to the "/hello" health check endpoint
Type Bug Bug
Status In Dev
Points N/A
Labels -

User description

  • The /hello (liveness) endpoint is reverted to always respond with HTTP 200, regardless of internal health check status.
  • Health status calculation in the liveness handler is refactored and simplified—now based solely on the number of failed checks (failCount), removing the previous "critical failure" logic.
  • Error handling is added for JSON encoding errors in the readiness handler, so encoding issues are now logged.

@buger
Copy link
Member

buger commented Aug 7, 2025

💔 The detected issue is not in one of the allowed statuses 💔

Detected Status Open
Allowed Statuses In Dev,In Code Review,Ready for Testing,In Test,In Progress,In Review ✔️

Please ensure your jira story is in one of the allowed statuses

Copy link
Contributor

github-actions bot commented Aug 7, 2025

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Changed HTTP Status Code Logic

The /hello (liveness) endpoint now always returns HTTP 200, regardless of health check failures. Review if this aligns with intended health check semantics and does not break integrations or monitoring setups relying on non-200 statuses.

w.WriteHeader(http.StatusOK)
Error Handling Consistency

Error handling for JSON encoding was added to the readiness handler. Ensure this is consistent with other handlers and that logging is sufficient for operational visibility.

err := json.NewEncoder(w).Encode(res)
if err != nil {
	mainLog.Warning(fmt.Sprintf("[Readiness] Could not encode response, error: %s", err.Error()))
}

Copy link
Contributor

github-actions bot commented Aug 7, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Handle empty health checks correctly

The switch statement using case len(checks): will never match because failCount
cannot equal len(checks) unless all checks fail, but this logic may be unclear and
error-prone if checks is empty. Add a guard to handle the case where len(checks) is
zero to avoid setting status to Fail when there are no checks.

gateway/health_check.go [192-209]

 for _, v := range checks {
 	if v.Status == Fail {
 		failCount++
 	}
 }
 
 var status HealthCheckStatus
 
-switch failCount {
-case 0:
-	status = Pass
-
-case len(checks):
-	status = Fail
-
-default:
+if len(checks) == 0 {
 	status = Warn
+} else {
+	switch failCount {
+	case 0:
+		status = Pass
+	case len(checks):
+		status = Fail
+	default:
+		status = Warn
+	}
 }
Suggestion importance[1-10]: 8

__

Why: This suggestion correctly identifies a subtle bug: if checks is empty, the current logic would set status to Fail, which may not be intended. The improved code adds a guard for the empty case, improving correctness and robustness of the health check endpoint.

Medium

Copy link
Contributor

github-actions bot commented Aug 7, 2025

API Changes

--- prev.txt	2025-08-11 13:26:46.243721410 +0000
+++ current.txt	2025-08-11 13:26:37.087665824 +0000
@@ -4736,7 +4736,7 @@
 
 	// Negate is a boolean negation operator. Setting it to true inverts the matching behaviour
 	// such that the rewrite will be triggered if the value does not match the `pattern` for this rule.
-	Negate bool `bson:"negate,omitempty" json:"negate,omitempty"`
+	Negate bool `bson:"negate" json:"negate"`
 }
     URLRewriteRule represents a rewrite
     matching rules. Tyk classic API definition:

Copy link
Contributor

github-actions bot commented Aug 7, 2025

Code Review: Reverting /hello endpoint to always return 200 status code

This PR makes several changes to the health check functionality in Tyk Gateway:

Key Changes

  1. Liveness Handler Always Returns 200 Status Code

    • The /hello endpoint now always returns HTTP 200 OK regardless of health check failures
    • Previously, it would return different status codes (503 Service Unavailable for failures)
    • This is a common pattern for Kubernetes liveness probes, which should only fail when the application is completely unresponsive
  2. Simplified Health Status Calculation

    • Removed the evaluateHealthChecks and determineHealthStatus helper functions
    • Replaced with direct calculation of failCount and a simple switch statement
    • Eliminated the concept of "critical failures" from the liveness handler
  3. Added Error Handling for JSON Encoding

    • Added proper error handling for JSON encoding in the readiness handler
    • Logs encoding errors to improve operational visibility

Assessment

This change is appropriate for Kubernetes environments where:

  • Liveness probes should only detect if the application is running, not its health state
  • Readiness probes handle service availability (which still returns 503 when services are unavailable)

The PR maintains the same health check logic but changes how the status code is reported, making it more aligned with Kubernetes best practices. The error handling improvement for JSON encoding is also a good defensive programming practice.

No issues were found that would prevent merging this PR. The changes are focused, well-documented, and maintain backward compatibility in terms of the API response structure while changing only the HTTP status code behavior.


Tip: Mention me again using /probe <request>.
Powered by Probe AI

This commit modifies multiple test functions to prevent variable shadowing by reassigning the test case variable within the loop. This change enhances code clarity and ensures that each test case operates on its own instance of the variable, improving test reliability.
@buger
Copy link
Member

buger commented Aug 11, 2025

I'm a bot and I 👍 this PR title. 🤖

@mativm02 mativm02 requested a review from pvormste August 11, 2025 13:27
Copy link

@ilijabojanovic ilijabojanovic merged commit 3b02603 into release-5.9.1 Aug 11, 2025
67 of 68 checks passed
@ilijabojanovic ilijabojanovic deleted the TT-15507-release-5.9.1 branch August 11, 2025 15:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants