[processor/redactionprocessor] Add capabilities to sanitize urls #41535

@iblancasa

Component(s)

No response

Is your feature request related to a problem? Please describe.

Follow-up to #41100:

* One limitation of the library used is that it has been optimized for URLs (for example, by stripping out the query string parameters and using / as the separator during tokenization). In addition, the model has been trained on HTTP/TCP vocabulary to optimize for this use case. Retraining the model or using different models optimized for other use cases is still possible. This would require additional parameters to the function, depending on the use case, or multiple functions.
...
* The original idea was to provide a component that can be dropped into a pipeline with minimal configuration and provide basic sanitization. It draws inspiration from the sanitization middleware used in many API gateways. I could foresee other functionality, such as truncating overly long attribute values or dropping certain problematic attributes known to cause high cardinality (although this would potentially go beyond the traces use case and overlap with functionality in other components).

It was agreed to implement something like sanitize_urls in the redaction processor that can sanitize URLs in attributes, logs, and span names.
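
To make the intended behavior concrete, here is a minimal sketch of the kind of URL sanitization described above: the query string is dropped and the path is tokenized on /. It is not the library's algorithm (which relies on a trained model) and not the processor's API. The function name `sanitize_url`, the `placeholder` parameter, and the rule for spotting variable segments (all digits or UUID-shaped) are assumptions made purely for illustration.

```python
import re
from urllib.parse import urlsplit

# Hypothetical heuristic: a path segment is treated as high-cardinality when
# it is purely numeric or looks like a UUID. The real library uses a model.
_VARIABLE_SEGMENT = re.compile(
    r"\d+|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def sanitize_url(url: str, placeholder: str = "*") -> str:
    """Drop the query string and fragment, and mask variable path segments."""
    parts = urlsplit(url)

    # Tokenize the path using "/" as the separator.
    segments = parts.path.split("/")
    cleaned = [
        placeholder if _VARIABLE_SEGMENT.fullmatch(seg) else seg
        for seg in segments
    ]
    path = "/".join(cleaned)

    # Rebuild without the query string or fragment.
    if parts.scheme and parts.netloc:
        return f"{parts.scheme}://{parts.netloc}{path}"
    return path


print(sanitize_url("https://example.com/users/12345/orders?token=abc"))
# -> https://example.com/users/*/orders
print(sanitize_url("/api/v1/items/42?x=1"))
# -> /api/v1/items/*
```

The same function could be applied to any string value that holds a URL, whether it comes from an attribute, a log body, or a span name. How the processor would detect URLs embedded in larger strings is left open here.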
