Commit 598c196

Release v0.0.9
Signed-off-by: kerthcet <[email protected]>
1 parent da22368 commit 598c196

11 files changed

+58
-50
lines changed

.github/ISSUE_TEMPLATE/new-release.md

Lines changed: 1 addition & 1 deletion
@@ -15,8 +15,8 @@ Please do not remove items from the checklist
 - [ ] Prepare the image and files
 - [ ] Run `PLATFORMS=linux/amd64 make image-push GIT_TAG=$VERSION` to build and push an image.
 - [ ] Run `make artifacts GIT_TAG=$VERSION` to generate the artifact.
-- [ ] Run `make helm-package` to package the helm chart and update the index.yaml.
 - [ ] Update `chart/Chart.yaml` and `docs/installation.md`, the helm version is different with the app version.
+- [ ] Run `make helm-package` to package the helm chart and update the index.yaml.
 - [ ] Submit a PR and merge it.
 - [ ] An OWNER [prepares a draft release](https://github.com/inftyai/llmaz/releases)
 - [ ] Create a new tag

chart/Chart.yaml

Lines changed: 2 additions & 2 deletions
@@ -13,9 +13,9 @@ type: application
 # This is the chart version. This version number should be incremented each time you make changes
 # to the chart and its templates, including the app version.
 # Versions are expected to follow Semantic Versioning (https://semver.org/)
-version: 0.0.4
+version: 0.0.5
 # This is the version number of the application being deployed. This version number should be
 # incremented each time you make changes to the application. Versions are not expected to
 # follow Semantic Versioning. They should reflect the version the application is using.
 # It is recommended to use it with quotes.
-appVersion: 0.0.8
+appVersion: 0.0.9

chart/crds/openmodel-crd.yaml

Lines changed: 7 additions & 15 deletions
@@ -95,28 +95,20 @@ spec:
 pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
 x-kubernetes-int-or-string: true
 description: |-
-Requests defines the required accelerators to serve the model, like nvidia.com/gpu: 8.
-When GPU number is greater than 8, like 32, then multi-host inference is enabled and
-32/8=4 hosts will be grouped as an unit, each host will have a resource request as
-nvidia.com/gpu: 8. The may change in the future if the GPU number limit is broken.
-Not recommended to set the cpu and memory usage here.
-If using playground, you can define the cpu/mem usage at backendConfig.
-If using service, you can define the cpu/mem at the container resources.
-Note: if you define the same accelerator requests at playground/service as well,
+Requests defines the required accelerators to serve the model for each replica,
+like <nvidia.com/gpu: 8>. For multi-hosts cases, the requests here indicates
+the resource requirements for each replica. This may change in the future.
+Not recommended to set the cpu and memory usage here:
+- if using playground, you can define the cpu/mem usage at backendConfig.
+- if using inference service, you can define the cpu/mem at the container resources.
+However, if you define the same accelerator requests at playground/service as well,
 the requests here will be covered.
 type: object
 required:
 - name
 type: object
 maxItems: 8
 type: array
-preheat:
-default: false
-description: |-
-Preheat represents whether we should preload the model, by default will use Manta(https://github.com/InftyAI/Manta)
-to preload the model, so you should enable the Manta in prior.
-Note: right now, we only support preloading models from Huggingface.
-type: boolean
 source:
 description: |-
 Source represents the source of the model, there're several ways to load
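
The `pattern` line at the top of this hunk is the quantity regex that each value under `requests` has to satisfy. As a rough illustration only (this is not part of the commit), the same pattern can be exercised in Python. The helper name `is_quantity` is hypothetical, and the check covers syntax only, not whether a cluster can actually provide the resource.

```python
import re

# Pattern copied verbatim from the CRD hunk above, split in two for readability.
QUANTITY_PATTERN = re.compile(
    r"^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))"
    r"(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$"
)


def is_quantity(value: str) -> bool:
    """Return True if `value` is syntactically accepted by the CRD pattern.

    Hypothetical helper for illustration; fullmatch is used so that a
    trailing newline is rejected as well.
    """
    return QUANTITY_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["8", "500m", "1Gi", "1.5e3", "8 gpus", "1ki"]:
        print(candidate, is_quantity(candidate))
```

A plain accelerator count such as `8` passes, as do suffixed values such as `500m` or `1Gi`; free text such as `8 gpus` does not.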

chart/crds/playground-crd.yaml

Lines changed: 32 additions & 3 deletions
@@ -45,13 +45,21 @@ spec:
 BackendRuntimeConfig represents the inference backendRuntime configuration
 under the hood, e.g. vLLM, which is the default backendRuntime.
 properties:
-args:
+argFlags:
 description: |-
-Args represents the arguments appended to the backend.
-You can add new args or overwrite the default args.
+ArgFlags represents the argument flags appended to the backend.
+You can add new flags or overwrite the default flags.
 items:
 type: string
 type: array
+argName:
+description: |-
+ArgName represents the argument name set in the backendRuntimeArg.
+If not set, will be derived by the model role, e.g. if one model's role
+is <draft>, the argName will be set to <speculative-decoding>. Better to
+set the argName explicitly.
+By default, the argName will be treated as <default> in runtime.
+type: string
 envs:
 description: Envs represents the environments set to the container.
 items:
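
The new `argName` description implies a fallback order: an explicit value wins, a model with the `draft` role selects `speculative-decoding`, and anything else is treated as `default`. The sketch below only mirrors that description. The controller code is not part of this diff, so the function name `resolve_arg_name` and its signature are assumptions made for illustration.

```python
from typing import Iterable, Optional


def resolve_arg_name(arg_name: Optional[str], model_roles: Iterable[str]) -> str:
    """Mirror the fallback described in the argName field description.

    Hypothetical helper: the actual llmaz controller logic is not shown
    in this commit and may differ.
    """
    # An explicitly configured argName always takes precedence.
    if arg_name:
        return arg_name
    # Otherwise derive it from the model roles, per the field description.
    if "draft" in set(model_roles):
        return "speculative-decoding"
    # Nothing set and no draft model: treated as <default> at runtime.
    return "default"


if __name__ == "__main__":
    print(resolve_arg_name(None, ["main", "draft"]))
    print(resolve_arg_name(None, ["main"]))
```

Setting `argName` explicitly, as the description recommends, avoids relying on this derivation at all.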
@@ -214,6 +222,27 @@ spec:
 from the default version.
 type: string
 type: object
+elasticConfig:
+description: |-
+ElasticConfig defines the configuration for elastic usage,
+e.g. the max/min replicas. Default to 0 ~ Inf+.
+This requires to install the HPA first or will not work.
+properties:
+maxReplicas:
+description: |-
+MaxReplicas indicates the maximum number of inference workloads based on the traffic.
+Default to nil means there's no limit for the instance number.
+format: int32
+type: integer
+minReplicas:
+default: 1
+description: |-
+MinReplicas indicates the minimum number of inference workloads based on the traffic.
+Default to nil means we can scale down the instances to 1.
+If minReplicas set to 0, it requires to install serverless component at first.
+format: int32
+type: integer
+type: object
 modelClaim:
 description: |-
 ModelClaim represents claiming for one model, it's a simplified use case
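
The `elasticConfig` bounds added to the Playground CRD in this hunk can be read as a simple clamp: `minReplicas` falls back to 1 when unset, and an unset `maxReplicas` means no upper limit. The following sketch shows only that reading. Actual scaling is left to the HPA mentioned in the description, and `clamp_replicas` is a hypothetical name, not an llmaz function.

```python
from typing import Optional


def clamp_replicas(
    desired: int,
    min_replicas: Optional[int] = 1,
    max_replicas: Optional[int] = None,
) -> int:
    """Clamp a desired replica count to the elasticConfig bounds.

    Hypothetical helper illustrating the field descriptions only.
    """
    # Unset minReplicas behaves like 1, matching the CRD default.
    lower = 1 if min_replicas is None else min_replicas
    if lower < 0:
        raise ValueError("minReplicas must not be negative")
    if max_replicas is not None and max_replicas < lower:
        raise ValueError("maxReplicas must not be smaller than minReplicas")

    result = max(desired, lower)
    # Unset maxReplicas means there is no upper limit.
    if max_replicas is not None:
        result = min(result, max_replicas)
    return result


if __name__ == "__main__":
    print(clamp_replicas(0))         # bounded below by the default minimum
    print(clamp_replicas(10, 1, 4))  # bounded above by maxReplicas
```

Per the field description, a `minReplicas` of 0 additionally requires a serverless component to be installed, which this sketch does not model.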

chart/crds/service-crd.yaml

Lines changed: 0 additions & 21 deletions
@@ -43,27 +43,6 @@ spec:
 Service controller will maintain multi-flavor of workloads with
 different accelerators for cost or performance considerations.
 properties:
-elasticConfig:
-description: |-
-ElasticConfig defines the configuration for elastic usage,
-e.g. the max/min replicas. Default to 0 ~ Inf+.
-This requires to install the HPA first or will not work.
-properties:
-maxReplicas:
-description: |-
-MaxReplicas indicates the maximum number of inference workloads based on the traffic.
-Default to nil means there's no limit for the instance number.
-format: int32
-type: integer
-minReplicas:
-default: 1
-description: |-
-MinReplicas indicates the minimum number of inference workloads based on the traffic.
-Default to nil means we can scale down the instances to 1.
-If minReplicas set to 0, it requires to install serverless component at first.
-format: int32
-type: integer
-type: object
 modelClaims:
 description: ModelClaims represents multiple claims for different
 models.

chart/templates/serviceaccount.yaml

Lines changed: 1 addition & 3 deletions
@@ -7,7 +7,5 @@ metadata:
 app.kubernetes.io/created-by: llmaz
 app.kubernetes.io/part-of: llmaz
 {{- include "chart.labels" . | nindent 4 }}
-{{- if .Values.controllerManager.serviceAccount.annotations }}
 annotations:
-{{- toYaml .Values.controllerManager.serviceAccount.annotations | nindent 4 }}
-{{- end }}
+{{- toYaml .Values.controllerManager.serviceAccount.annotations | nindent 4 }}

chart/values.yaml

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@ controllerManager:
 - ALL
 image:
 repository: inftyai/llmaz
-tag: v0.0.8
+tag: v0.0.9
 resources:
 limits:
 cpu: 500m

config/manager/kustomization.yaml

Lines changed: 1 addition & 1 deletion
@@ -5,4 +5,4 @@ kind: Kustomization
 images:
 - name: controller
 newName: inftyai/llmaz
-newTag: v0.0.8
+newTag: v0.0.9

docs/installation.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@
 ```cmd
 helm repo add inftyai https://inftyai.github.io/llmaz
 helm repo update
-helm install llmaz inftyai/llmaz --namespace llmaz-system --create-namespace --version 0.0.4
+helm install llmaz inftyai/llmaz --namespace llmaz-system --create-namespace --version 0.0.5
 ```
 
 ### Uninstall

index.yaml

Lines changed: 11 additions & 1 deletion
@@ -1,6 +1,16 @@
 apiVersion: v1
 entries:
 llmaz:
+- apiVersion: v2
+appVersion: 0.0.9
+created: "2025-01-06T19:30:25.471004+08:00"
+description: A Helm chart for llmaz
+digest: 4a36c5c0da481828e9682afb2932a96d74c7eb1dc9e4b9ceac42789520602d01
+name: llmaz
+type: application
+urls:
+- https://inftyai.github.io/llmaz/llmaz-0.0.5.tgz
+version: 0.0.5
 - apiVersion: v2
 appVersion: 0.0.8
 created: "2024-10-23T16:25:18.126844+08:00"
@@ -41,4 +51,4 @@ entries:
 urls:
 - https://inftyai.github.io/llmaz/llmaz-0.0.1.tgz
 version: 0.0.1
-generated: "2024-10-23T16:25:18.101337+08:00"
+generated: "2025-01-06T19:30:25.435128+08:00"
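
The `digest` field in the new `index.yaml` entry is the SHA-256 checksum that Helm records for the packaged chart archive. A reviewer who has downloaded `llmaz-0.0.5.tgz` could compare it against that value with a few lines of Python. This is a minimal sketch: the local file path is an assumption, and `chart_digest` and `digest_matches` are hypothetical helper names.

```python
import hashlib


def chart_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digest_matches(path: str, expected: str) -> bool:
    """Compare a local archive against the digest listed in index.yaml."""
    return chart_digest(path) == expected.strip().lower()


if __name__ == "__main__":
    # Assumed local path; the expected value is the digest from the hunk above.
    expected = "4a36c5c0da481828e9682afb2932a96d74c7eb1dc9e4b9ceac42789520602d01"
    print(digest_matches("llmaz-0.0.5.tgz", expected))
```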
