Rogue label le="" showing up in one generic rule #1398

Open
@mtthwcmpbll

I've been exploring Pyrra over the last few days. I had some trouble getting the UI to work, so I've also been looking at the generic rules generation, which suits us well because we already use Grafana for all of our observability today.

I've deployed the following SLO:

apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: test-api-playground-latency
  namespace: monitoring
  labels:
    prometheus: k8s
    role: alert-rules
spec:
  target: '95'
  window: 4w
  description: Service responded to 95% of requests within half a second
  indicator:
    latency:
      success:
        metric: http_server_requests_seconds_bucket{application="test-api", namespace="playground", le="0.5"}
      total:
        metric: http_server_requests_seconds_bucket{application="test-api", namespace="playground", le="+Inf"}

The metric I'm using for this SLO is a Spring Boot histogram for request times. The generated PrometheusRule looks correct to me and almost all of the recording rules record data as expected.

However, the pyrra_availability generic metric recording rule gets generated with an extra le="" label, despite there already being an le label in the original query. Here's the generated availability recording rule:

- expr: sum(http_server_requests_seconds:increase4w{application="test-api",le="0.5",namespace="playground",slo="test-api-playground-latency"}
      or vector(0)) / sum(http_server_requests_seconds:increase4w{application="test-api",le="",le="+Inf",namespace="playground",slo="test-api-playground-latency"})
  labels:
    slo: test-api-playground-latency
  record: pyrra_availability
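
For anyone checking their own generated rules for the same symptom, here is a rough Python sketch that flags selectors in which one label name appears in more than one matcher. It is a regex heuristic, not a PromQL parser: `repeated_labels` is a hypothetical helper name, it assumes label values contain no braces, and repeated matchers are legal PromQL in general, so a hit is only a hint to look closer.

```python
import re
from collections import Counter

# One label matcher: name, operator, double-quoted value (escapes allowed).
MATCHER = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)\s*(=~|!~|!=|=)\s*"((?:[^"\\]|\\.)*)"')
# One selector body: everything between a pair of braces (assumes no braces in values).
SELECTOR = re.compile(r'\{([^{}]*)\}')


def repeated_labels(expr):
    """Return, for each selector that has any, the label names used in more than one matcher."""
    findings = []
    for body in SELECTOR.findall(expr):
        names = Counter(m.group(1) for m in MATCHER.finditer(body))
        dupes = sorted(name for name, count in names.items() if count > 1)
        if dupes:
            findings.append(dupes)
    return findings


# The expr from the generated pyrra_availability rule above, joined onto one line.
expr = (
    'sum(http_server_requests_seconds:increase4w{application="test-api",le="0.5",'
    'namespace="playground",slo="test-api-playground-latency"} or vector(0)) / '
    'sum(http_server_requests_seconds:increase4w{application="test-api",le="",le="+Inf",'
    'namespace="playground",slo="test-api-playground-latency"})'
)

print(repeated_labels(expr))  # [['le']]
```

Only the denominator is reported, since the numerator carries a single `le` matcher.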

If I run the query from that expr manually in Grafana, I get no data because of the le="" in the denominator. If I remove that matcher and run the following query, it returns data as expected.

sum(http_server_requests_seconds:increase4w{application="test-api",le="0.5",namespace="playground",slo="test-api-playground-latency"}
      or vector(0)) / sum(http_server_requests_seconds:increase4w{application="test-api",le="+Inf",namespace="playground",slo="test-api-playground-latency"})
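
The reason the original denominator is empty can be shown with a small Python sketch of how equality matchers combine. This is a simplified model, not Prometheus code: `matches` is a hypothetical helper, and the series label set is assumed from the selectors above. It relies on two documented Prometheus behaviors: all matchers in a selector must hold at once, and a matcher against the empty string only selects series where that label is absent.

```python
def matches(series_labels, matchers):
    """Return True if every (name, value) equality matcher holds for the series.

    A missing label is treated as the empty string, which mirrors how
    Prometheus evaluates a matcher such as le="".
    """
    return all(series_labels.get(name, "") == value for name, value in matchers)


# Assumed label set of the recorded +Inf bucket series.
series = {
    "application": "test-api",
    "namespace": "playground",
    "slo": "test-api-playground-latency",
    "le": "+Inf",
}

# Matchers from the generated denominator, including the extra le="".
generated = [
    ("application", "test-api"),
    ("le", ""),
    ("le", "+Inf"),
    ("namespace", "playground"),
    ("slo", "test-api-playground-latency"),
]

# Same matchers with the le="" entry removed.
fixed = [m for m in generated if m != ("le", "")]

print(matches(series, generated))  # False
print(matches(series, fixed))      # True
```

No series can have `le` both absent and equal to "+Inf", so the generated denominator selects nothing and the division yields no result.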

I see there was some work related to native histograms that seems to have introduced the empty le matcher, so this may be related.
