Description
I've been exploring Pyrra over the last few days and had some trouble getting the UI to work for me, so I've also been looking at the generic rules generation. This works well for us because we're already using Grafana for all of our observability stuff today.
I've deployed the following SLO:
apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: test-api-playground-latency
  namespace: monitoring
  labels:
    prometheus: k8s
    role: alert-rules
spec:
  target: '95'
  window: 4w
  description: Service responded to 95% of requests within half a second
  indicator:
    latency:
      success:
        metric: http_server_requests_seconds_bucket{application="test-api", namespace="playground", le="0.5"}
      total:
        metric: http_server_requests_seconds_bucket{application="test-api", namespace="playground", le="+Inf"}
The metric I'm using for this SLO is a Spring Boot histogram for request times. The generated PrometheusRule looks correct to me and almost all of the recording rules record data as expected.
However, the pyrra_availability generic metric recording rule gets generated with an extra le="" label, despite there already being an le label in the original query. Here's the generated availability recording rule:
- expr: sum(http_server_requests_seconds:increase4w{application="test-api",le="0.5",namespace="playground",slo="test-api-playground-latency"}
    or vector(0)) / sum(http_server_requests_seconds:increase4w{application="test-api",le="",le="+Inf",namespace="playground",slo="test-api-playground-latency"})
  labels:
    slo: test-api-playground-latency
  record: pyrra_availability
If I run the query from expr manually in Grafana, I get no data because of that le="" in the denominator: no series can satisfy both le="" and le="+Inf" at the same time. If I remove that label and run the following query, it returns data fine:
sum(http_server_requests_seconds:increase4w{application="test-api",le="0.5",namespace="playground",slo="test-api-playground-latency"}
or vector(0)) / sum(http_server_requests_seconds:increase4w{application="test-api",le="+Inf",namespace="playground",slo="test-api-playground-latency"})
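To illustrate why the two le matchers cancel each other out, here is a minimal Python sketch of how I understand equality matchers to behave: every matcher in a selector must hold, and a missing label is treated as the empty string. The series list and the matches helper are made up for illustration; this is not Pyrra or Prometheus code.

```python
def matches(series_labels, matchers):
    """Return True if the series satisfies every equality matcher.

    A label that is absent from the series is compared as the empty
    string, which is how an le="" matcher selects series without le.
    """
    return all(series_labels.get(name, "") == value for name, value in matchers)


# Hypothetical series, reduced to the labels that matter here.
series = [
    {"application": "test-api", "le": "0.5"},
    {"application": "test-api", "le": "+Inf"},
    {"application": "test-api"},  # a series with no le label at all
]

# Denominator selector as generated: le="" AND le="+Inf".
generated = [("application", "test-api"), ("le", ""), ("le", "+Inf")]
# Denominator selector with the empty matcher removed.
fixed = [("application", "test-api"), ("le", "+Inf")]

generated_hits = [s for s in series if matches(s, generated)]
fixed_hits = [s for s in series if matches(s, fixed)]

print(len(generated_hits), len(fixed_hits))
```

With the generated selector nothing is selected, so the sum in the denominator is empty and the whole division returns no data; with the single le="+Inf" matcher the bucket series is selected as expected.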
I see there was some work related to native histograms that seems to have introduced the empty le filter, so this may be related.