Workspace Count for GPT-OSS not working #1653

@filidav

Describe the bug
Following: https://kaito-project.github.io/kaito/docs/multi-node-inference/#basic-multi-node-setup

Increasing the Workspace count from the default of 1 to 2 in the deployment file does not work. I added the attribute "count: 2" under resource (see below):

apiVersion: kaito.sh/v1beta1
kind: Workspace
metadata:
  name: workspace-gpt-oss-vllm-nc-a100
  namespace: openai
resource:
  count: 2
  instanceType: "Standard_NC24ads_A100_v4"
  labelSelector:
    matchLabels:
      app: gpt-oss-120b-vllm

Once count was added, the admission webhook rejected the request with a new error: "admission webhook validation.workspace.kaito.sh denied the request: validation failed: missing fields: max-model-len is required in the vllm section of the inference_config.yaml when using multi-GPU instances with <20GB of memory per GPU or distributed inference"

I then added the attribute max-model-len: 4096 to the container args (see below), and the same error still occurs.

inference:
  template:
    spec:
      containers:
        - name: vllm-openai
          image:
          imagePullPolicy: IfNotPresent
          args:
            - --model
            - openai/gpt-oss-120b
            - --swap-space
            - "4"
            - --gpu-memory-utilization
            - "0.95"
            - --port
            - "5000"
            - --max-model-len
            - "4096"
          ports:
            - name: http
              containerPort: 5000
          resources:
            limits:
              nvidia.com/gpu: 1
              cpu: "24"
              memory: "220Gi"
            requests:
              nvidia.com/gpu: 1
              cpu: "12"
              memory: "110Gi"
          readinessProbe:
            httpGet:
              path: /health
              port: 5000
            initialDelaySeconds: 30
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health
              port: 5000
            initialDelaySeconds: 600
            periodSeconds: 1800
          env: #special configs for A10 gpu
            - name: VLLM_ATTENTION_BACKEND
              value: "TRITON_ATTN_VLLM_V1"
            - name: VLLM_DISABLE_SINKS
              value: "1"
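
The webhook message refers to the vllm section of inference_config.yaml rather than to the container args, so one thing to try is supplying max-model-len through a ConfigMap. This is a minimal sketch, assuming the ConfigMap-based inference configuration described in the KAITO docs applies to this setup. The ConfigMap name gpt-oss-inference-params is hypothetical, and it is not confirmed in this issue that the webhook accepts such a reference alongside a custom template:

```yaml
# Hypothetical ConfigMap; the name is made up for illustration.
apiVersion: v1
kind: ConfigMap
metadata:
  name: gpt-oss-inference-params
  namespace: openai
data:
  # Key name taken from the webhook error message.
  inference_config.yaml: |
    vllm:
      # Same value as in the args above; tune for the model and GPU memory.
      max-model-len: 4096
# In the Workspace, the ConfigMap would then be referenced by name
# (assumed field, not verified against this KAITO version):
#   inference:
#     config: "gpt-oss-inference-params"
```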

Steps To Reproduce
1. Add the attribute count: 2 to the resource section of the Workspace.
2. Add the attribute max-model-len: "4096" to the container args.

Expected behavior
The Workspace should be accepted and its count should increase from 1 to 2.

Logs

Environment
AKS

  • Kubernetes version (use kubectl version): 1.33.6
  • OS (e.g: cat /etc/os-release):
  • Install tools:
  • Others:

Additional context
