Description
Component(s)
exporter/prometheusremotewrite
What happened?
Description
According to a related PrometheusRemoteWrite (PRW) issue, the WAL has been "broken for years". This is evident when the PRW exporter writes to the Prometheus /api/v1/write API and receives a 400 Bad Request response: the collector logs "error processing WAL entries" with "Permanent error: out of order sample" and, eventually, "out of bounds". Removing the WAL config from the PRW exporter resolves the issue:
  exporters:
    prometheusremotewrite/0:
      endpoint: http://prom-0.prom-endpoints.how-to.svc.cluster.local:9090/api/v1/write
      tls:
        insecure_skip_verify: false
-     wal:
-       directory: /otelcol
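To illustrate why a remote-write endpoint would answer with "out of order sample", here is a toy model in Python. It is only a sketch under assumptions: `ToyReceiver` and `replay` are hypothetical names, the receiver is not Prometheus code, and the premise that the WAL re-sends entries older than ones already delivered is a guess at the failure mode, not something this report confirms.

```python
class ToyReceiver:
    """Toy stand-in for a remote-write endpoint (NOT Prometheus code).

    It remembers the newest timestamp accepted per series and rejects
    anything strictly older, mimicking an "out of order sample" 400.
    """

    def __init__(self):
        self.last_ts = {}

    def write(self, series, ts_ms):
        last = self.last_ts.get(series)
        if last is not None and ts_ms < last:
            return 400, "out of order sample"
        self.last_ts[series] = ts_ms
        return 200, "ok"


def replay(receiver, entries):
    """Send (series, timestamp) entries in order and collect the responses."""
    return [receiver.write(series, ts) for series, ts in entries]


# Hypothetical WAL content: three samples of one series, in timestamp order.
wal = [("up", 1000), ("up", 2000), ("up", 3000)]

receiver = ToyReceiver()
first_pass = replay(receiver, wal)       # fresh data is accepted
second_pass = replay(receiver, wal[:2])  # assumed re-send of older entries

print(first_pass)
print(second_pass)
```

In this model the first pass is accepted and every re-sent entry is rejected, which matches the shape of the repeated 400 responses seen in the collector logs.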
Steps to Reproduce
We use Juju to deploy our infra:
TL;DR: Deploy a metrics source (e.g. Alertmanager) and a metrics sink (e.g. Prometheus), then configure them in the otel-collector receivers and exporters, respectively.
Expected Result
Metrics arrive in Prometheus with a working WAL.
Actual Result
Metrics arrive in Prometheus, but there are errors hinting at a broken WAL in the otel-collector logs.
Collector version
0.130.1
Environment information
Environment
OS: Ubuntu 24.04.2 LTS
OpenTelemetry Collector configuration
connectors: {}
exporters:
  debug:
    verbosity: basic
  prometheusremotewrite/0:
    endpoint: http://prom-0.prom-endpoints.how-to.svc.cluster.local:9090/api/v1/write
    tls:
      insecure_skip_verify: false
    wal:
      directory: /otelcol
extensions:
  file_storage:
    directory: /otelcol
  health_check:
    endpoint: 0.0.0.0:13133
processors:
  attributes:
    actions:
    - action: upsert
      key: loki.attribute.labels
      value: container, job, filename, juju_application, juju_charm, juju_model, juju_model_uuid, juju_unit, snap_name, path
  resource:
    attributes:
    - action: insert
      key: loki.format
      value: raw
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  prometheus:
    config:
      scrape_configs:
      - job_name: juju_how-to_7b30903e_otelcol_self-monitoring
        scrape_interval: 60s
        static_configs:
        - labels:
            instance: how-to_7b30903e_otelcol_otelcol/0
            juju_application: otelcol
            juju_charm: opentelemetry-collector-k8s
            juju_model: how-to
            juju_model_uuid: 7b30903e-8941-4a40-864c-0cbbf277c57f
            juju_unit: otelcol/0
          targets:
          - 0.0.0.0:8888
      - job_name: juju_how-to_7b30903e_am_prometheus_scrape
        metrics_path: /metrics
        relabel_configs:
        - regex: (.*)
          separator: _
          source_labels:
          - juju_model
          - juju_model_uuid
          - juju_application
          target_label: instance
        scheme: http
        static_configs:
        - labels:
            juju_application: am
            juju_charm: alertmanager-k8s
            juju_model: how-to
            juju_model_uuid: 7b30903e-8941-4a40-864c-0cbbf277c57f
          targets:
          - am-0.am-endpoints.how-to.svc.cluster.local:9093
        tls_config:
          insecure_skip_verify: false
service:
  extensions:
  - health_check
  - file_storage
  pipelines:
    logs:
      exporters:
      - debug
      processors:
      - resource
      - attributes
      receivers:
      - otlp
    metrics:
      exporters:
      - prometheusremotewrite/0
      receivers:
      - otlp
      - prometheus
    traces:
      exporters:
      - debug
      receivers:
      - otlp
  telemetry:
    logs:
      level: DEBUG
    metrics:
      level: normal
Log output
2025-08-05T14:11:50.231Z [otelcol] 2025-08-05T14:11:50.231Z error prw.wal [email protected]/wal.go:245 error processing WAL entries {"resource": {"service.instance.id": "00ba5573-5bb4-4294-b1ca-1f84b32dbf29", "service.name": "otelcol", "service.version": "0.130.1"}, "otelcol.component.id": "prometheusremotewrite/0", "otelcol.component.kind": "exporter", "otelcol.signal": "metrics", "error": "Permanent error: Permanent error: Permanent error: remote write returned HTTP status 400 Bad Request; err = %!w(<nil>): out of order sample\n; Permanent error: Permanent error: Permanent error: remote write returned HTTP status 400 Bad Request; err = %!w(<nil>): out of order sample\n; Permanent error: Permanent error: Permanent error: remote write returned HTTP status 400 Bad Request; err = %!w(<nil>): out of order sample\n", "errorCauses": [{"error": "Permanent error: Permanent error: Permanent error: remote write returned HTTP status 400 Bad Request; err = %!w(<nil>): out of order sample\n"}, {"error": "Permanent error: Permanent error: Permanent error: remote write returned HTTP status 400 Bad Request; err = %!w(<nil>): out of order sample\n"}, {"error": "Permanent error: Permanent error: Permanent error: remote write returned HTTP status 400 Bad Request; err = %!w(<nil>): out of order sample\n"}]}
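For anyone triaging similar logs, the following Python sketch counts how many error causes in one such log line mention "out of order sample". It assumes the structured fields form a valid JSON object starting at the first `{` on the line, as in the output above. The helper names `error_causes` and `count_out_of_order` are hypothetical, and `sample` is a shortened stand-in for a real log line.

```python
import json


def error_causes(log_line):
    """Return the "error" strings listed under "errorCauses" in a log line.

    Assumes the trailing structured fields are one JSON object that starts
    at the first "{" on the line.
    """
    start = log_line.find("{")
    if start == -1:
        return []
    fields = json.loads(log_line[start:])
    return [cause.get("error", "") for cause in fields.get("errorCauses", [])]


def count_out_of_order(log_line):
    """Count the error causes that mention an out-of-order sample."""
    return sum("out of order sample" in cause for cause in error_causes(log_line))


# Shortened, illustrative log line (not copied verbatim from the collector).
sample = (
    "2025-08-05T14:11:50.231Z error prw.wal wal.go:245 error processing WAL entries "
    '{"otelcol.component.id": "prometheusremotewrite/0", '
    '"errorCauses": ['
    '{"error": "remote write returned HTTP status 400 Bad Request: out of order sample\\n"}, '
    '{"error": "remote write returned HTTP status 400 Bad Request: out of order sample\\n"}]}'
)

print(count_out_of_order(sample))
```

Running this over each "error processing WAL entries" line gives a rough count of rejected batches per flush, which may help when comparing runs with and without the WAL enabled.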
Additional context
No response