-
Notifications
You must be signed in to change notification settings - Fork 1.1k
[TT-15507] Revert changes to the "/hello" health check endpoint #7295
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
API Changes no api changes detected |
PR Code Suggestions ✨Explore these optional code suggestions:
|
📦 Impact Review Snapshot
## Impact AssessmentThis PR reverts changes to the "/hello" health check endpoint, making it always return HTTP 200 OK regardless of the actual health status. The health status (Pass/Warn/Fail) is still computed and included in the response body, but the HTTP status code is always 200. This change affects how monitoring systems might interpret the health of the Tyk Gateway, as they would need to inspect the response body rather than relying on HTTP status codes. The readiness endpoint ("/ready") behavior remains unchanged - it will still return non-200 status codes when the gateway is not ready to serve requests. ## Required Updates
## Compatibility ConcernsThis change is generally backward compatible since:
However, systems that rely specifically on non-200 status codes from the "/hello" endpoint to detect unhealthy states will need to be updated. This includes Kubernetes liveness/readiness probes, monitoring systems, and any custom health check implementations. ## Summary & Recommendations
Tip: Mention me again using |
🚦 Connectivity Review Snapshot
## Connectivity Assessment
## Test Coverage Validation
## Security & Performance Impact
## Summary & Recommendations
Tip: Mention me again using |
🛡️ Security Snapshot
## Security Impact AnalysisThe PR reverts the The addition of error handling for JSON encoding errors in the readiness handler is a positive improvement that prevents silent failures and provides better logging. ## Identified VulnerabilitiesNo direct security vulnerabilities were introduced by this change. The modification to always return 200 OK from the liveness endpoint could potentially mask system health issues from monitoring systems that only check HTTP status codes, but this is an operational concern rather than a security vulnerability. The PR includes comprehensive test coverage that verifies the behavior of both endpoints across various failure scenarios, which mitigates the risk of unintended side effects. ## Security Recommendations
## OWASP ComplianceThis change does not directly impact OWASP Top 10 concerns as it doesn't modify authentication, authorization, input validation, or data protection mechanisms. The health check endpoints are typically internal operational endpoints and not directly exposed to end users. The improved error handling for JSON encoding errors aligns with OWASP's recommendation for proper error handling and logging. ## Summary
Tip: Mention me again using |
🚀 Performance Snapshot
## Performance Impact AnalysisThis PR makes a targeted change to the health check endpoint behavior with minimal performance impact. The ## Critical AreasThe main change is in the
This affects the ## Optimization RecommendationsThe changes are well-optimized and follow best practices for health check endpoints. The computation of status from check counts is straightforward and efficient. The error handling for JSON encoding is appropriate, logging warnings without affecting the response status. No further optimizations are needed as this is a targeted fix that improves reliability without adding overhead. ## Summary
No suggestions to provide – change LGTM. Tip: Mention me again using |
This commit updates the health check test files by introducing a new variable assignment to prevent variable shadowing within the test cases. This change enhances code clarity and maintains the integrity of test execution across multiple test cases.
Analysis of PR #7295: Revert changes to the "/hello" health check endpointThis PR makes important changes to the health check endpoints in Tyk Gateway, specifically reverting the behavior of the Key Changes
RationaleThis change aligns with Kubernetes best practices for liveness probes, which should only check if the application is running and responding, not its full health status. A non-200 response from a liveness probe would cause Kubernetes to restart the pod, which might not be the desired behavior for temporary health check failures. The readiness endpoint ( Impact
This change improves the stability of Tyk Gateway deployments in Kubernetes environments while still providing detailed health information in the response body. Tip: Mention me again using |
🛡️ Security Snapshot
## Security Impact AnalysisThe PR reverts the The addition of error handling for JSON encoding errors in the readiness handler is a positive improvement that prevents silent failures and provides better logging. ## Identified VulnerabilitiesNo direct security vulnerabilities were introduced by this change. The modification to always return 200 OK from the liveness endpoint could potentially mask system health issues from monitoring systems that only check HTTP status codes, but this is an operational concern rather than a security vulnerability. The PR includes comprehensive test coverage that verifies the behavior of both endpoints across various failure scenarios, which mitigates the risk of unintended side effects. ## Security Recommendations
## OWASP ComplianceThis change does not directly impact OWASP Top 10 concerns as it doesn't modify authentication, authorization, input validation, or data protection mechanisms. The health check endpoints are typically internal operational endpoints and not directly exposed to end users. The improved error handling for JSON encoding errors aligns with OWASP's recommendation for proper error handling and logging. ## Summary
Tip: Mention me again using |
Analysis of PR #7295: Revert changes to the "/hello" health check endpointThis PR makes a targeted change to the Tyk Gateway's health check system, specifically reverting the behavior of the Key Changes
RationaleThis change follows Kubernetes best practices for liveness probes. The liveness probe should only check if the application is running, not its health status. By always returning 200 OK, it prevents unnecessary pod restarts when there are non-critical failures. The actual health status is still available in the response body for monitoring systems. The readiness probe ( This is a well-structured change with excellent test coverage that properly separates the concerns of liveness (is the application running?) from readiness (is the application ready to serve requests?). Tip: Mention me again using |
Analysis of PR #7295: Revert changes to the "/hello" health check endpointThis PR makes targeted changes to the health check endpoints in the Tyk Gateway, focusing on three main areas: 1. Liveness Endpoint BehaviorThe PR ensures that the 2. Health Status Calculation SimplificationThe health status calculation in the liveness handler is refactored and simplified to be based solely on the number of failed checks ( The current implementation already calculates a status based on the number of failed checks: switch failCount {
case 0:
status = Pass
case len(checks):
status = Fail
default:
status = Warn
} This approach is simpler and more predictable than the previous logic that considered critical failures separately. 3. Error Handling for JSON EncodingThe PR adds error handling for JSON encoding errors in the readiness handler. Looking at the current code, both handlers already have error handling for JSON encoding: err := json.NewEncoder(w).Encode(res)
if err != nil {
mainLog.Warning(fmt.Sprintf("[Readiness] Could not encode response, error: %s", err.Error()))
} 4. Comprehensive Test CoverageThe PR adds extensive tests for the health check endpoints (+552 lines in health_check_test.go), including:
Impact and BenefitsThis change improves the reliability of Kubernetes deployments by ensuring the liveness probe always returns 200 OK, preventing unnecessary pod restarts due to temporary issues that don't affect the gateway's ability to process requests. The distinction between liveness (
This aligns with Kubernetes best practices where liveness probes should be lightweight and only check if the application is running, while readiness probes should determine if the application is ready to serve traffic. Tip: Mention me again using |
|
/release to relesae-5.8.4 |
@ilijabojanovic Release branch not found |
/release to release-5.8.4 |
/release to release-5.9.1 |
Working on it! Note that it can take a few minutes. |
1 similar comment
Working on it! Note that it can take a few minutes. |
### **User description** - The /hello (liveness) endpoint is reverted to always respond with HTTP 200, regardless of internal health check status. - Health status calculation in the liveness handler is refactored and simplified—now based solely on the number of failed checks (failCount), removing the previous "critical failure" logic. - Error handling is added for JSON encoding errors in the readiness handler, so encoding issues are now logged. <details open> <summary><a href="https://tyktech.atlassian.net/browse/TT-15507" title="TT-15507" target="_blank">TT-15507</a></summary> <br /> <table> <tr> <th>Summary</th> <td>Revert changes to the "/hello" health check endpoint</td> </tr> <tr> <th>Type</th> <td> <img alt="Bug" src="https://tyktech.atlassian.net/rest/api/2/universal_avatar/view/type/issuetype/avatar/10303?size=medium" /> Bug </td> </tr> <tr> <th>Status</th> <td>In Dev</td> </tr> <tr> <th>Points</th> <td>N/A</td> </tr> <tr> <th>Labels</th> <td>-</td> </tr> </table> </details> <!-- do not remove this marker as it will break jira-lint's functionality. added_by_jira_lint --> --- <!-- Provide a general summary of your changes in the Title above --> ## Description <!-- Describe your changes in detail --> ## Related Issue <!-- This project only accepts pull requests related to open issues. --> <!-- If suggesting a new feature or change, please discuss it in an issue first. --> <!-- If fixing a bug, there should be an issue describing it with steps to reproduce. --> <!-- OSS: Please link to the issue here. Tyk: please create/link the JIRA ticket. --> ## Motivation and Context <!-- Why is this change required? What problem does it solve? --> ## How This Has Been Tested <!-- Please describe in detail how you tested your changes --> <!-- Include details of your testing environment, and the tests --> <!-- you ran to see how your change affects other areas of the code, etc. --> <!-- This information is helpful for reviewers and QA. --> ## Screenshots (if appropriate) ## Types of changes <!-- What types of changes does your code introduce? Put an `x` in all the boxes that apply: --> - [ ] Bug fix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change which adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Refactoring or add test (improvements in base code or adds test coverage to functionality) ## Checklist <!-- Go over all the following points, and put an `x` in all the boxes that apply --> <!-- If there are no documentation updates required, mark the item as checked. --> <!-- Raise up any additional concerns not covered by the checklist. --> - [ ] I ensured that the documentation is up to date - [ ] I explained why this PR updates go.mod in detail with reasoning why it's required - [ ] I would like a code coverage CI quality gate exception and have explained why ___ ### **PR Type** Bug fix, Tests ___ ### **Description** Restore /hello to always return 200 Compute Pass/Warn/Fail from check counts Add extensive liveness vs readiness tests Improve JSON encode error handling logs ___ ### Diagram Walkthrough ```mermaid flowchart LR liveH["liveCheckHandler (/hello)"] -- "always 200" --> http200["HTTP 200"] liveH -- "derive Pass/Warn/Fail" --> status["Response.Status"] readyH["readinessHandler (/ready)"] -- "encode + log errors" --> logR["Warning on encode error"] tests["health_check_test.go"] -- "cover scenarios" --> liveH tests -- "compare vs readiness" --> readyH ``` <details> <summary><h3> File Walkthrough</h3></summary> <table><thead><tr><th></th><th align="left">Relevant files</th></tr></thead><tbody><tr><td><strong>Bug fix</strong></td><td><table> <tr> <td> <details> <summary><strong>health_check.go</strong><dd><code>Rework liveness logic and enforce 200 response</code> </dd></summary> <hr> gateway/health_check.go <ul><li>Compute failCount inline and set status.<br> <li> Always write HTTP 200 for /hello.<br> <li> Add error handling for JSON encode in readiness.<br> <li> Remove determineHealthStatus usage in liveness.</ul> </details> </td> <td><a href="https://github.com/TykTechnologies/tyk/pull/7295/files#diff-978a2d1427d9209765e541618af10683944c6396df1a6fb8b5221e4f16658a6a">+24/-4</a> </td> </tr> </table></td></tr><tr><td><strong>Tests</strong></td><td><table> <tr> <td> <details> <summary><strong>health_check_test.go</strong><dd><code>Add comprehensive liveness and readiness tests</code> </dd></summary> <hr> gateway/health_check_test.go <ul><li>Add tests for /hello status and body.<br> <li> Add liveness vs readiness behavior tests.<br> <li> Add edge case coverage and headers checks.<br> <li> Validate JSON payload fields and errors.</ul> </details> </td> <td><a href="https://github.com/TykTechnologies/tyk/pull/7295/files#diff-08e29946afc7757a9c7baaef04b1a81964640437a684ff6306d1a0c933ac3f6a">+552/-0</a> </td> </tr> </table></td></tr></tr></tbody></table> </details> ___ (cherry picked from commit 8cd87da)
@ilijabojanovic Created merge PRs |
### **User description** - The /hello (liveness) endpoint is reverted to always respond with HTTP 200, regardless of internal health check status. - Health status calculation in the liveness handler is refactored and simplified—now based solely on the number of failed checks (failCount), removing the previous "critical failure" logic. - Error handling is added for JSON encoding errors in the readiness handler, so encoding issues are now logged. <details open> <summary><a href="https://tyktech.atlassian.net/browse/TT-15507" title="TT-15507" target="_blank">TT-15507</a></summary> <br /> <table> <tr> <th>Summary</th> <td>Revert changes to the "/hello" health check endpoint</td> </tr> <tr> <th>Type</th> <td> <img alt="Bug" src="https://tyktech.atlassian.net/rest/api/2/universal_avatar/view/type/issuetype/avatar/10303?size=medium" /> Bug </td> </tr> <tr> <th>Status</th> <td>In Dev</td> </tr> <tr> <th>Points</th> <td>N/A</td> </tr> <tr> <th>Labels</th> <td>-</td> </tr> </table> </details> <!-- do not remove this marker as it will break jira-lint's functionality. added_by_jira_lint --> --- <!-- Provide a general summary of your changes in the Title above --> ## Description <!-- Describe your changes in detail --> ## Related Issue <!-- This project only accepts pull requests related to open issues. --> <!-- If suggesting a new feature or change, please discuss it in an issue first. --> <!-- If fixing a bug, there should be an issue describing it with steps to reproduce. --> <!-- OSS: Please link to the issue here. Tyk: please create/link the JIRA ticket. --> ## Motivation and Context <!-- Why is this change required? What problem does it solve? --> ## How This Has Been Tested <!-- Please describe in detail how you tested your changes --> <!-- Include details of your testing environment, and the tests --> <!-- you ran to see how your change affects other areas of the code, etc. --> <!-- This information is helpful for reviewers and QA. --> ## Screenshots (if appropriate) ## Types of changes <!-- What types of changes does your code introduce? Put an `x` in all the boxes that apply: --> - [ ] Bug fix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change which adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Refactoring or add test (improvements in base code or adds test coverage to functionality) ## Checklist <!-- Go over all the following points, and put an `x` in all the boxes that apply --> <!-- If there are no documentation updates required, mark the item as checked. --> <!-- Raise up any additional concerns not covered by the checklist. --> - [ ] I ensured that the documentation is up to date - [ ] I explained why this PR updates go.mod in detail with reasoning why it's required - [ ] I would like a code coverage CI quality gate exception and have explained why ___ ### **PR Type** Bug fix, Tests ___ ### **Description** Restore /hello to always return 200 Compute Pass/Warn/Fail from check counts Add extensive liveness vs readiness tests Improve JSON encode error handling logs ___ ### Diagram Walkthrough ```mermaid flowchart LR liveH["liveCheckHandler (/hello)"] -- "always 200" --> http200["HTTP 200"] liveH -- "derive Pass/Warn/Fail" --> status["Response.Status"] readyH["readinessHandler (/ready)"] -- "encode + log errors" --> logR["Warning on encode error"] tests["health_check_test.go"] -- "cover scenarios" --> liveH tests -- "compare vs readiness" --> readyH ``` <details> <summary><h3> File Walkthrough</h3></summary> <table><thead><tr><th></th><th align="left">Relevant files</th></tr></thead><tbody><tr><td><strong>Bug fix</strong></td><td><table> <tr> <td> <details> <summary><strong>health_check.go</strong><dd><code>Rework liveness logic and enforce 200 response</code> </dd></summary> <hr> gateway/health_check.go <ul><li>Compute failCount inline and set status.<br> <li> Always write HTTP 200 for /hello.<br> <li> Add error handling for JSON encode in readiness.<br> <li> Remove determineHealthStatus usage in liveness.</ul> </details> </td> <td><a href="https://github.com/TykTechnologies/tyk/pull/7295/files#diff-978a2d1427d9209765e541618af10683944c6396df1a6fb8b5221e4f16658a6a">+24/-4</a> </td> </tr> </table></td></tr><tr><td><strong>Tests</strong></td><td><table> <tr> <td> <details> <summary><strong>health_check_test.go</strong><dd><code>Add comprehensive liveness and readiness tests</code> </dd></summary> <hr> gateway/health_check_test.go <ul><li>Add tests for /hello status and body.<br> <li> Add liveness vs readiness behavior tests.<br> <li> Add edge case coverage and headers checks.<br> <li> Validate JSON payload fields and errors.</ul> </details> </td> <td><a href="https://github.com/TykTechnologies/tyk/pull/7295/files#diff-08e29946afc7757a9c7baaef04b1a81964640437a684ff6306d1a0c933ac3f6a">+552/-0</a> </td> </tr> </table></td></tr></tr></tbody></table> </details> ___ (cherry picked from commit 8cd87da)
…ealth check endpoint (#7295) [TT-15507] Revert changes to the "/hello" health check endpoint (#7295) ### **User description** - The /hello (liveness) endpoint is reverted to always respond with HTTP 200, regardless of internal health check status. - Health status calculation in the liveness handler is refactored and simplified—now based solely on the number of failed checks (failCount), removing the previous "critical failure" logic. - Error handling is added for JSON encoding errors in the readiness handler, so encoding issues are now logged. <details open> <summary><a href="https://tyktech.atlassian.net/browse/TT-15507" title="TT-15507" target="_blank">TT-15507</a></summary> <br /> <table> <tr> <th>Summary</th> <td>Revert changes to the "/hello" health check endpoint</td> </tr> <tr> <th>Type</th> <td> <img alt="Bug" src="https://tyktech.atlassian.net/rest/api/2/universal_avatar/view/type/issuetype/avatar/10303?size=medium" /> Bug </td> </tr> <tr> <th>Status</th> <td>In Dev</td> </tr> <tr> <th>Points</th> <td>N/A</td> </tr> <tr> <th>Labels</th> <td>-</td> </tr> </table> </details> <!-- do not remove this marker as it will break jira-lint's functionality. added_by_jira_lint --> --- <!-- Provide a general summary of your changes in the Title above --> ## Description <!-- Describe your changes in detail --> ## Related Issue <!-- This project only accepts pull requests related to open issues. --> <!-- If suggesting a new feature or change, please discuss it in an issue first. --> <!-- If fixing a bug, there should be an issue describing it with steps to reproduce. --> <!-- OSS: Please link to the issue here. Tyk: please create/link the JIRA ticket. --> ## Motivation and Context <!-- Why is this change required? What problem does it solve? --> ## How This Has Been Tested <!-- Please describe in detail how you tested your changes --> <!-- Include details of your testing environment, and the tests --> <!-- you ran to see how your change affects other areas of the code, etc. --> <!-- This information is helpful for reviewers and QA. --> ## Screenshots (if appropriate) ## Types of changes <!-- What types of changes does your code introduce? Put an `x` in all the boxes that apply: --> - [ ] Bug fix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change which adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Refactoring or add test (improvements in base code or adds test coverage to functionality) ## Checklist <!-- Go over all the following points, and put an `x` in all the boxes that apply --> <!-- If there are no documentation updates required, mark the item as checked. --> <!-- Raise up any additional concerns not covered by the checklist. --> - [ ] I ensured that the documentation is up to date - [ ] I explained why this PR updates go.mod in detail with reasoning why it's required - [ ] I would like a code coverage CI quality gate exception and have explained why ___ ### **PR Type** Bug fix, Tests ___ ### **Description** Restore /hello to always return 200 Compute Pass/Warn/Fail from check counts Add extensive liveness vs readiness tests Improve JSON encode error handling logs ___ ### Diagram Walkthrough ```mermaid flowchart LR liveH["liveCheckHandler (/hello)"] -- "always 200" --> http200["HTTP 200"] liveH -- "derive Pass/Warn/Fail" --> status["Response.Status"] readyH["readinessHandler (/ready)"] -- "encode + log errors" --> logR["Warning on encode error"] tests["health_check_test.go"] -- "cover scenarios" --> liveH tests -- "compare vs readiness" --> readyH ``` <details> <summary><h3> File Walkthrough</h3></summary> <table><thead><tr><th></th><th align="left">Relevant files</th></tr></thead><tbody><tr><td><strong>Bug fix</strong></td><td><table> <tr> <td> <details> <summary><strong>health_check.go</strong><dd><code>Rework liveness logic and enforce 200 response</code> </dd></summary> <hr> gateway/health_check.go <ul><li>Compute failCount inline and set status.<br> <li> Always write HTTP 200 for /hello.<br> <li> Add error handling for JSON encode in readiness.<br> <li> Remove determineHealthStatus usage in liveness.</ul> </details> </td> <td><a href="https://github.com/TykTechnologies/tyk/pull/7295/files#diff-978a2d1427d9209765e541618af10683944c6396df1a6fb8b5221e4f16658a6a">+24/-4</a> </td> </tr> </table></td></tr><tr><td><strong>Tests</strong></td><td><table> <tr> <td> <details> <summary><strong>health_check_test.go</strong><dd><code>Add comprehensive liveness and readiness tests</code> </dd></summary> <hr> gateway/health_check_test.go <ul><li>Add tests for /hello status and body.<br> <li> Add liveness vs readiness behavior tests.<br> <li> Add edge case coverage and headers checks.<br> <li> Validate JSON payload fields and errors.</ul> </details> </td> <td><a href="https://github.com/TykTechnologies/tyk/pull/7295/files#diff-08e29946afc7757a9c7baaef04b1a81964640437a684ff6306d1a0c933ac3f6a">+552/-0</a> </td> </tr> </table></td></tr></tr></tbody></table> </details> ___
@ilijabojanovic Created merge PRs |
User description
TT-15507
Description
Related Issue
Motivation and Context
How This Has Been Tested
Screenshots (if appropriate)
Types of changes
Checklist
PR Type
Bug fix, Tests
Description
Restore /hello to always return 200
Compute Pass/Warn/Fail from check counts
Add extensive liveness vs readiness tests
Improve JSON encode error handling logs
Diagram Walkthrough
File Walkthrough
health_check.go
Rework liveness logic and enforce 200 response
gateway/health_check.go
health_check_test.go
Add comprehensive liveness and readiness tests
gateway/health_check_test.go