Description
Describe the bug
Following https://kaito-project.github.io/kaito/docs/multi-node-inference/#basic-multi-node-setup, increasing the Workspace resource count from the default of 1 to 2 does not work. I added `count: 2` to the resource section (see below):
```yaml
apiVersion: kaito.sh/v1beta1
kind: Workspace
metadata:
  name: workspace-gpt-oss-vllm-nc-a100
  namespace: openai
resource:
  count: 2
  instanceType: "Standard_NC24ads_A100_v4"
  labelSelector:
    matchLabels:
      app: gpt-oss-120b-vllm
```
Once `count: 2` was added, another issue appeared: the admission webhook rejects the Workspace with:

> admission webhook "validation.workspace.kaito.sh" denied the request: validation failed: missing fields: max-model-len is required in the vllm section of the inference_config.yaml when using multi-GPU instances with <20GB of memory per GPU or distributed inference

Adding `--max-model-len "4096"` to the container args did not help; the same error occurs.
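The webhook message suggests the validator reads `max-model-len` from an `inference_config.yaml` rather than from container args. A minimal sketch of a ConfigMap carrying that file, assuming only what the error message names (the ConfigMap name `gpt-oss-inference-params` is hypothetical):

```yaml
# Hypothetical ConfigMap sketch: the name gpt-oss-inference-params is an
# assumption; the inference_config.yaml key and the vllm section are taken
# from the webhook error message.
apiVersion: v1
kind: ConfigMap
metadata:
  name: gpt-oss-inference-params
  namespace: openai
data:
  inference_config.yaml: |
    vllm:
      max-model-len: 4096
```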
```yaml
inference:
  template:
    spec:
      containers:
        - name: vllm-openai
          image:
          imagePullPolicy: IfNotPresent
          args:
            - --model
            - openai/gpt-oss-120b
            - --swap-space
            - "4"
            - --gpu-memory-utilization
            - "0.95"
            - --port
            - "5000"
            - --max-model-len
            - "4096"
          ports:
            - name: http
              containerPort: 5000
          resources:
            limits:
              nvidia.com/gpu: 1
              cpu: "24"
              memory: "220Gi"
            requests:
              nvidia.com/gpu: 1
              cpu: "12"
              memory: "110Gi"
          readinessProbe:
            httpGet:
              path: /health
              port: 5000
            initialDelaySeconds: 30
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health
              port: 5000
            initialDelaySeconds: 600
            periodSeconds: 1800
          env: # special configs for A10 GPU
            - name: VLLM_ATTENTION_BACKEND
              value: "TRITON_ATTN_VLLM_V1"
            - name: VLLM_DISABLE_SINKS
              value: "1"
```
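If the webhook validates parameters from an `inference_config.yaml`, the Workspace would likely need to reference a ConfigMap carrying that file instead of (or in addition to) passing `--max-model-len` as a container arg. A sketch, assuming the Workspace spec accepts an `inference.config` field naming a ConfigMap in the same namespace (both the field usage and the ConfigMap name are assumptions):

```yaml
inference:
  # config is assumed to name a ConfigMap (here the hypothetical
  # gpt-oss-inference-params) whose inference_config.yaml data key
  # contains a vllm section with max-model-len set.
  config: gpt-oss-inference-params
```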
Steps To Reproduce
1. Add `count: 2` to the resource section of the Workspace.
2. Add `--max-model-len "4096"` to the vllm container args.
Expected behavior
The Workspace should scale to 2 nodes.
Logs
Environment
AKS
- Kubernetes version (use kubectl version): 1.33.6
- OS (e.g. cat /etc/os-release):
- Install tools:
- Others:
Additional context