Open
Labels
- feature: Categorizes issue or PR as related to a new feature.
- needs-priority: Indicates a PR lacks a label and requires one.
- needs-triage: Indicates an issue or PR lacks a label and requires one.
Description
What would you like to be added:
We will load-balance traffic based on LLM-specific characteristics, such as prefix-cache awareness, KV-cache awareness, LoRA awareness, load awareness, and request-profile awareness (for example, summarization versus chat).
These strategies will be implemented as plugins built into Envoy Gateway:
- Random selection, as a template and baseline (see "Envoy gateway plugin support with random selection", #371)
- LoRA-aware plugin
- Fairness sharing
- Prefix-cache-aware plugin
Why is this needed:
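Better serving performance: routing that accounts for LLM-specific state, such as cached prefixes or already-loaded LoRA adapters, is expected to outperform generic load balancing.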
Completion requirements:
This enhancement requires the following artifacts:
- Design doc
- API change
- Docs update
The artifacts should be linked in subsequent comments.