VCL failure undefined behavior #3825

@dridi

Description

Consider the following test case:

varnishtest "backend fetch despite http_max_hdr overflow"

server s1 {
	rxreq
	expect req.http.hdr20 == set
	expect req.http.hdr32 == <undef>
	txresp
} -start

varnish v1 -cliok "param.set http_max_hdr 32"
varnish v1 -vcl+backend {
	sub vcl_backend_fetch {
		set bereq.http.hdr01 = "set";
		set bereq.http.hdr02 = "set";
		set bereq.http.hdr03 = "set";
		set bereq.http.hdr04 = "set";
		set bereq.http.hdr05 = "set";
		set bereq.http.hdr06 = "set";
		set bereq.http.hdr07 = "set";
		set bereq.http.hdr08 = "set";
		set bereq.http.hdr09 = "set";
		set bereq.http.hdr10 = "set";
		set bereq.http.hdr11 = "set";
		set bereq.http.hdr12 = "set";
		set bereq.http.hdr13 = "set";
		set bereq.http.hdr14 = "set";
		set bereq.http.hdr15 = "set";
		set bereq.http.hdr16 = "set";
		set bereq.http.hdr17 = "set";
		set bereq.http.hdr18 = "set";
		set bereq.http.hdr19 = "set";
		set bereq.http.hdr20 = "set";
		set bereq.http.hdr21 = "set";
		set bereq.http.hdr22 = "set";
		set bereq.http.hdr23 = "set";
		set bereq.http.hdr24 = "set";
		set bereq.http.hdr25 = "set";
		set bereq.http.hdr26 = "set";
		set bereq.http.hdr27 = "set";
		set bereq.http.hdr28 = "set";
		set bereq.http.hdr29 = "set";
		set bereq.http.hdr30 = "set";
		set bereq.http.hdr31 = "set";
		set bereq.http.hdr32 = "set";
	}
} -start

client c1 {
	txreq
	rxresp
	expect resp.status == 200
} -run

It passes, but arguably it shouldn't. Even worse, the problem is caught and logged, but not acted upon, and the log points us in the wrong direction:

**** v1    vsl|       1002 BereqHeader     b hdr23: set
**** v1    vsl|       1002 LostHeader      b hdr24: set
**** v1    vsl|       1002 Error           b out of workspace (bo)
**** v1    vsl|       1002 LostHeader      b hdr25: set
**** v1    vsl|       1002 Error           b out of workspace (bo)
**** v1    vsl|       1002 LostHeader      b hdr26: set
**** v1    vsl|       1002 Error           b out of workspace (bo)
**** v1    vsl|       1002 LostHeader      b hdr27: set
**** v1    vsl|       1002 Error           b out of workspace (bo)
**** v1    vsl|       1002 LostHeader      b hdr28: set
**** v1    vsl|       1002 Error           b out of workspace (bo)
**** v1    vsl|       1002 LostHeader      b hdr29: set
**** v1    vsl|       1002 Error           b out of workspace (bo)
**** v1    vsl|       1002 LostHeader      b hdr30: set
**** v1    vsl|       1002 Error           b out of workspace (bo)
**** v1    vsl|       1002 LostHeader      b hdr31: set
**** v1    vsl|       1002 Error           b out of workspace (bo)
**** v1    vsl|       1002 LostHeader      b hdr32: set
**** v1    vsl|       1002 Error           b out of workspace (bo)
**** v1    vsl|       1002 VCL_return      b fetch

We started looking into this at v-s after a case where things were failing (I'll cover this aspect below), but bumping the workspace didn't solve anything. Of course that is to be expected when the limit being exceeded is not the workspace but http_max_hdr. It would only be fair to have an error message like "exceeded the http_max_hdr parameter" instead.
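
To make that concrete, the distinct message could be emitted right where the header slot limit is detected. A minimal sketch, assuming the current shape of http_SetHeader() in cache_http.c (names and signature from memory, not a reviewed patch):

void
http_SetHeader(struct http *to, const char *header)
{
	CHECK_OBJ_NOTNULL(to, HTTP_MAGIC);
	if (to->nhd >= to->shd) {
		/* We ran out of header slots, not workspace: say so. */
		VSLb(to->vsl, SLT_LostHeader, "%s", header);
		VSLb(to->vsl, SLT_Error, "exceeded http_max_hdr");
		WS_MarkOverflow(to->ws);	/* keep the existing failure signal */
		return;
	}
	http_SetH(to, to->nhd++, header);
}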

Something equivalent on the client side would fail, because before resp delivery there is a workspace overflow safety net that seems to be missing on the backend side.

Leaving the http_max_hdr overflow aside, even this test case passes today:

varnishtest "backend fetch despite workspace overflow"

server s1 {
        rxreq
        txresp
} -start

varnish v1 -vcl+backend {
        import vtc;

        sub vcl_backend_fetch {
                vtc.workspace_overflow(backend);
        }
} -start

client c1 {
        txreq
        rxresp
        expect resp.status == 200
} -run

Considering that HTTP is an application protocol, and that headers act as variable placeholders in VCL, losing any header could result in a misbehaving application. Waiting until near the end of a task to fail it is already too late. For VCL execution we have the ability to VRT_fail() the task, and every single VCL statement is followed by a VRT failure check; we should do the same for header failures.
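
For reference, this is the mechanism I mean. A schematic sketch, not the literal core or generated code:

/*
 * Code holding a VRT_CTX calls VRT_fail(), which logs a VCL_Error record
 * and flags the task as failed; the C code generated by VCC checks that
 * flag after every VCL statement and unwinds, so the task ends up in the
 * built-in failure path (a 503 on the backend side). The message below is
 * only an example.
 */
VRT_fail(ctx, "exceeded http_max_hdr (%u)", cache_param->http_max_hdr);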

The problem with HTTP headers in the cache process is that the header-handling code logs errors directly to the VSL buffer and (ab)uses the workspace to signal a failure. We can't (and probably shouldn't) trigger a VRT failure there, because we don't have a stable VRT_CTX for the duration of the task. Instead, I suggest a subtle change:

--- bin/varnishd/cache/cache.h
+++ bin/varnishd/cache/cache.h
@@ -155,7 +155,7 @@ struct http {
        uint16_t                nhd;            /* Next free hd */
 
        enum VSL_tag_e          logtag;         /* Must be SLT_*Method */
-       struct vsl_log          *vsl;
+       struct worker           *wrk;
 
        struct ws               *ws;
        uint16_t                status;

With this, the VSL buffer would be one indirection away for actual logging, and for failures we could imagine a WRK_Fail() function that logs an error and fails the current task. That introduces new questions, for example a task may disembark its worker and re-embark on a new one.
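
To be clear, WRK_Fail() does not exist today; this is only a sketch of what I imagine, with a hypothetical task_fail flag on the worker:

#include <stdarg.h>

/* Hypothetical: fail the ongoing task from code that has no stable VRT_CTX,
 * such as the header-handling functions once they hold a worker pointer. */
void
WRK_Fail(struct worker *wrk, const char *fmt, ...)
{
	va_list ap;

	CHECK_OBJ_NOTNULL(wrk, WORKER_MAGIC);
	va_start(ap, fmt);
	VSLbv(wrk->vsl, SLT_Error, fmt, ap);	/* log on the task's VSL buffer */
	va_end(ap);
	wrk->task_fail = 1;	/* hypothetical flag, checked by the FSMs */
}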

This would make HTTP header failures converge towards a task failure, so they would be caught in VCL just like any other failure, after every single VCL statement. Error conditions should generally converge towards failing the task, instead of requiring compounding safety nets for every type of error condition at some arbitrary point in the task.

For errors happening in core code, I suggest we add a task failure check in both the req and fetch FSM loops.
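
Schematically, on the fetch side that check could live in the step loop of vbf_fetch_thread() in cache_fetch.c, and similarly in the client request FSM loop. A sketch reusing the hypothetical task_fail flag from above, with vbf_step() standing in for the existing step dispatch:

	while (stp != F_STP_DONE) {
		CHECK_OBJ_NOTNULL(bo, BUSYOBJ_MAGIC);
		/* Converge on the existing failure step between steps. */
		if (wrk->task_fail && stp != F_STP_FAIL)
			stp = F_STP_FAIL;
		else
			stp = vbf_step(wrk, bo, stp);	/* placeholder for the step switch */
	}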
