Consider the following test case:
varnishtest "backend fetch despite http_max_hdr overflow"
server s1 {
rxreq
expect req.http.hdr20 == set
expect req.http.hdr32 == <undef>
txresp
} -start
varnish v1 -cliok "param.set http_max_hdr 32"
varnish v1 -vcl+backend {
sub vcl_backend_fetch {
set bereq.http.hdr01 = "set";
set bereq.http.hdr02 = "set";
set bereq.http.hdr03 = "set";
set bereq.http.hdr04 = "set";
set bereq.http.hdr05 = "set";
set bereq.http.hdr06 = "set";
set bereq.http.hdr07 = "set";
set bereq.http.hdr08 = "set";
set bereq.http.hdr09 = "set";
set bereq.http.hdr10 = "set";
set bereq.http.hdr11 = "set";
set bereq.http.hdr12 = "set";
set bereq.http.hdr13 = "set";
set bereq.http.hdr14 = "set";
set bereq.http.hdr15 = "set";
set bereq.http.hdr16 = "set";
set bereq.http.hdr17 = "set";
set bereq.http.hdr18 = "set";
set bereq.http.hdr19 = "set";
        set bereq.http.hdr20 = "set";
        set bereq.http.hdr21 = "set";
        set bereq.http.hdr22 = "set";
        set bereq.http.hdr23 = "set";
        set bereq.http.hdr24 = "set";
        set bereq.http.hdr25 = "set";
        set bereq.http.hdr26 = "set";
        set bereq.http.hdr27 = "set";
        set bereq.http.hdr28 = "set";
        set bereq.http.hdr29 = "set";
        set bereq.http.hdr30 = "set";
        set bereq.http.hdr31 = "set";
        set bereq.http.hdr32 = "set";
    }
} -start

client c1 {
    txreq
    rxresp
    expect resp.status == 200
} -run
```
It passes, but arguably, it shouldn't. Even worse, the problem is caught and logged, but not acted upon, and it points us in the wrong direction:
```
**** v1 vsl| 1002 BereqHeader b hdr23: set
**** v1 vsl| 1002 LostHeader b hdr24: set
**** v1 vsl| 1002 Error b out of workspace (bo)
**** v1 vsl| 1002 LostHeader b hdr25: set
**** v1 vsl| 1002 Error b out of workspace (bo)
**** v1 vsl| 1002 LostHeader b hdr26: set
**** v1 vsl| 1002 Error b out of workspace (bo)
**** v1 vsl| 1002 LostHeader b hdr27: set
**** v1 vsl| 1002 Error b out of workspace (bo)
**** v1 vsl| 1002 LostHeader b hdr28: set
**** v1 vsl| 1002 Error b out of workspace (bo)
**** v1 vsl| 1002 LostHeader b hdr29: set
**** v1 vsl| 1002 Error b out of workspace (bo)
**** v1 vsl| 1002 LostHeader b hdr30: set
**** v1 vsl| 1002 Error b out of workspace (bo)
**** v1 vsl| 1002 LostHeader b hdr31: set
**** v1 vsl| 1002 Error b out of workspace (bo)
**** v1 vsl| 1002 LostHeader b hdr32: set
**** v1 vsl| 1002 Error b out of workspace (bo)
**** v1 vsl| 1002 VCL_return b fetch
```
We started looking into this at v-s after a case where things were failing (I'll cover this aspect below), but bumping the workspace didn't solve anything. Of course, that is to be expected when the limit we exceeded is not the workspace but `http_max_hdr`. It would only be fair to have an error message like `Exceeded http_max_hdr parameter` instead.
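As a sketch of what that could look like, loosely based on the `http_fail()` helper in `bin/varnishd/cache/cache_http.c`; the exact condition, fields and message wording here are assumptions, not a tested patch:

```c
/*
 * Sketch only: tell a full hd table (http_max_hdr) apart from a real
 * workspace overflow when an HTTP header operation fails. Loosely
 * based on http_fail() in bin/varnishd/cache/cache_http.c; condition
 * and wording are assumptions.
 */
static void
http_fail(const struct http *hp)
{

	if (hp->nhd >= hp->shd)
		VSLb(hp->vsl, SLT_Error, "exceeded http_max_hdr parameter");
	else
		VSLb(hp->vsl, SLT_Error, "out of workspace (%s)",
		    hp->ws->id);
	WS_MarkOverflow(hp->ws);
}
```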
Something equivalent on the client side would fail, because before `resp` delivery we have a workspace overflow safety net that we seem to be lacking on the backend side.
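A backend-side equivalent could look like the guard below; the placement (early in `vbf_stp_startfetch()` in `cache_fetch.c`) and the error message are assumptions, sketched for illustration:

```c
/*
 * Hypothetical backend-side safety net, mirroring the client-side
 * pre-delivery check: fail the fetch task when the bereq workspace
 * already overflowed before we talk to the backend. Placement and
 * message are assumptions.
 */
if (WS_Overflowed(bo->ws)) {
	VSLb(bo->vsl, SLT_FetchError, "workspace overflow (bo)");
	return (F_STP_FAIL);
}
```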
Leaving the `http_max_hdr` overflow aside, even this test case passes today:
varnishtest "backend fetch despite workspace overflow"
server s1 {
rxreq
txresp
} -start
varnish v1 -vcl+backend {
import vtc;
sub vcl_backend_fetch {
vtc.workspace_overflow(backend);
}
} -start
client c1 {
txreq
rxresp
expect resp.status == 200
} -run
Considering first that HTTP is an application protocol, and that headers act as variable placeholders in VCL, losing any header could result in a misbehaving application. Waiting until near the end of a task to fail it is already too late. For VCL execution, we have the ability to `VRT_fail()` the task, and every single VCL statement is followed by a VRT failure check. We should do the same with header failures.
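For reference, this is roughly how that mechanism is used from a VMOD today; the function below is made up for illustration:

```c
#include "cache/cache.h"

/*
 * Illustrative vmod function: VRT_fail() logs the error and fails the
 * ongoing task, so the VCL state machine takes the return(fail) path
 * right after the current statement. The function itself is
 * hypothetical.
 */
VCL_VOID
vmod_require_hdr(VRT_CTX, VCL_HEADER hdr)
{
	CHECK_OBJ_NOTNULL(ctx, VRT_CTX_MAGIC);
	if (VRT_GetHdr(ctx, hdr) == NULL)
		VRT_fail(ctx, "missing mandatory header");
}
```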
The problem with HTTP headers in the cache process is that they log errors directly on the VSL buffer and (ab)use the workspace to signal a failure. We can't (and probably shouldn't) trigger a VRT failure because we can't have a stable `VRT_CTX` for the duration of the task. Instead I suggest a subtle change:
```diff
--- bin/varnishd/cache/cache.h
+++ bin/varnishd/cache/cache.h
@@ -155,7 +155,7 @@ struct http {
 	uint16_t	nhd;		/* Next free hd */
 
 	enum VSL_tag_e	logtag;		/* Must be SLT_*Method */
-	struct vsl_log	*vsl;
+	struct worker	*wrk;
 
 	struct ws	*ws;
 	uint16_t	status;
```
With this, the VSL buffer would be one indirection away for actual logging, and for failures we could imagine having a `WRK_Fail()` function to log an error and fail the current task. That introduces new questions, for example that a task may disembark a worker and later reembark on a different one.
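A minimal sketch of such a function, assuming a hypothetical `task_fail` flag on the worker (how that flag would follow a task across workers is exactly the open question above):

```c
/*
 * Sketch of the suggested WRK_Fail(): log an error on the worker's
 * VSL buffer and flag the current task as failed. The task_fail
 * field is hypothetical and would need to travel with the task when
 * it changes workers.
 */
void
WRK_Fail(struct worker *wrk, const char *fmt, ...)
{
	va_list ap;

	CHECK_OBJ_NOTNULL(wrk, WORKER_MAGIC);
	va_start(ap, fmt);
	VSLbv(wrk->vsl, SLT_Error, fmt, ap);
	va_end(ap);
	wrk->task_fail = 1;
}
```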
With this, we would make HTTP header failures converge towards a task failure, so they would be caught in VCL just like any other failure, after every single VCL statement. Error conditions should generally converge towards failing the task, instead of requiring all kinds of error conditions to compound safety nets at some arbitrary point in tasks.

For errors happening in core code, I suggest we add a task failure check in both the req and fetch FSM loops.
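On the fetch side, such a check could slot into the step loop of `vbf_fetch_thread()` in `cache_fetch.c`, reusing the hypothetical `task_fail` flag from the sketch above; the req FSM loop would get the same treatment:

```c
/*
 * Sketch: short-circuit the fetch FSM to the failure step whenever
 * the hypothetical task_fail flag was raised, instead of letting the
 * task run to completion with lost headers.
 */
while (stp != F_STP_DONE) {
	if (wrk->task_fail && stp != F_STP_FAIL)
		stp = F_STP_FAIL;
	CHECK_OBJ_NOTNULL(bo, BUSYOBJ_MAGIC);
	switch (stp) {
	/* ... existing FETCH_STEP dispatch ... */
	}
}
```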