Status: Open
Labels: feature, needs-priority, needs-triage
Description
What would you like to be added:

A `priority` field on each entry in `inferenceConfig.flavors`, where a higher value means the flavor is preferred during scheduling. For example:

```yaml
apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: opt-125m
spec:
  familyName: opt
  source:
    modelHub:
      modelID: facebook/opt-125m
  inferenceConfig:
    flavors:
      - name: h800
        priority: 5 # higher priority
        nodeSelector:
          karpenter.k8s.aws/instance-gpu-name: h800
        limits:
          nvidia.com/gpu: 4
      - name: h100
        priority: 4
        nodeSelector:
          karpenter.k8s.aws/instance-gpu-name: h100
        limits:
          nvidia.com/gpu: 4
      - name: a100
        priority: 3
        nodeSelector:
          karpenter.k8s.aws/instance-gpu-name: a100
        limits:
          nvidia.com/gpu: 4
      - name: a20
        priority: 2
        nodeSelector:
          karpenter.k8s.aws/instance-gpu-name: a20
        limits:
          nvidia.com/gpu: 4
      - name: t4
        priority: 1 # lower priority
        nodeSelector:
          karpenter.k8s.aws/instance-gpu-name: t4
        limits:
          nvidia.com/gpu: 4
```
Why is this needed:
When multiple flavors are defined for a model, there is currently no explicit way to control the order in which they are matched during scheduling. The scheduler tries them in the order they appear in the list, which may not reflect the intended preference.
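To illustrate the proposed semantics, here is a minimal sketch of how a controller could order flavors before trying them. The `Flavor` type and `SortFlavorsByPriority` function are hypothetical and not part of the llmaz API; the sketch assumes that a higher `priority` wins and that flavors with equal priority keep their list order, so the current behavior remains the tie-breaker.

```go
package main

import (
	"fmt"
	"sort"
)

// Flavor is a hypothetical, trimmed-down model of an inferenceConfig flavor.
// Only the fields needed to illustrate ordering are included.
type Flavor struct {
	Name     string
	Priority int32 // proposed field: higher value is tried first
}

// SortFlavorsByPriority returns a copy of flavors ordered by descending
// priority. sort.SliceStable preserves the original list order for flavors
// with equal priority, so list order still acts as the tie-breaker.
func SortFlavorsByPriority(flavors []Flavor) []Flavor {
	sorted := make([]Flavor, len(flavors))
	copy(sorted, flavors)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Priority > sorted[j].Priority
	})
	return sorted
}

func main() {
	// Declared out of order on purpose; priorities match the example above.
	flavors := []Flavor{
		{Name: "t4", Priority: 1},
		{Name: "a100", Priority: 3},
		{Name: "h800", Priority: 5},
		{Name: "a20", Priority: 2},
		{Name: "h100", Priority: 4},
	}
	// Prints h800, h100, a100, a20, t4 in that order.
	for _, f := range SortFlavorsByPriority(flavors) {
		fmt.Println(f.Name, f.Priority)
	}
}
```

Using a stable sort means existing manifests that omit `priority` (all zero under this sketch's assumption) would keep today's list-order behavior unchanged.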
Completion requirements:
This enhancement requires the following artifacts:
- Design doc
- API change
- Docs update
The artifacts should be linked in subsequent comments.