Commit 31609a1

guangy10 and Guang Yang authored
Add pointers to use supported optimizations for each model (#66)
Co-authored-by: Guang Yang <[email protected]>
1 parent efecfc5 commit 31609a1


README.md

Lines changed: 27 additions & 24 deletions
@@ -77,18 +77,19 @@ from optimum.executorch import ExecuTorchModelForCausalLM
 from transformers import AutoTokenizer
 
 # Load and export the model on-the-fly
-model_id = "HuggingFaceTB/SmolLM2-135M"
+model_id = "HuggingFaceTB/SmolLM2-135M-Instruct"
 model = ExecuTorchModelForCausalLM.from_pretrained(
     model_id,
     recipe="xnnpack",
     attn_implementation="custom_sdpa",  # Use custom SDPA implementation for better performance
+    **{"qlinear": True},  # Quantize linear layers with 8da4w
 )
 
 # Generate text right away
 tokenizer = AutoTokenizer.from_pretrained(model_id)
 generated_text = model.text_generation(
     tokenizer=tokenizer,
-    prompt="Simply put, the theory of relativity states that",
+    prompt="Once upon a time",
     max_seq_len=32,
 )
 print(generated_text)
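
For reference, `**{"qlinear": True}` in the new snippet is plain dict unpacking, so the option can equally be passed as a named argument; a minimal sketch of the same export:

```python
from optimum.executorch import ExecuTorchModelForCausalLM

# Equivalent to the **{"qlinear": True} form above: Python unpacks the
# dict into keyword arguments, so the export option is a plain kwarg.
model = ExecuTorchModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM2-135M-Instruct",
    recipe="xnnpack",
    attn_implementation="custom_sdpa",  # custom SDPA kernel, as above
    qlinear=True,                       # 8da4w PTQ on linear layers
)
```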
@@ -103,12 +104,14 @@ print(generated_text)
 Use the CLI tool to convert your model to ExecuTorch format:
 ```
 optimum-cli export executorch \
-    --model "HuggingFaceTB/SmolLM2-135M" \
+    --model "HuggingFaceTB/SmolLM2-135M-Instruct" \
     --task "text-generation" \
     --recipe "xnnpack" \
     --output_dir="hf_smollm2" \
-    --use_custom_sdpa
+    --use_custom_sdpa \
+    --qlinear
 ```
+Explore the available export options with `optimum-cli export executorch --help`.
 
 #### Step 2: Load and run inference
 Use the exported model for text generation:
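
Between the two steps it can be useful to confirm what the CLI export wrote; a minimal sketch, assuming only that the files land in `hf_smollm2` (the exact artifact names, e.g. `model.pte`, are an assumption and not specified above):

```python
from pathlib import Path

# List whatever the CLI export produced; the program file name
# ("model.pte") is an assumption here, not stated in the README.
out_dir = Path("hf_smollm2")
assert out_dir.is_dir(), "run the optimum-cli export step first"
print(sorted(p.name for p in out_dir.iterdir()))
```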
@@ -120,18 +123,34 @@ from transformers import AutoTokenizer
 model = ExecuTorchModelForCausalLM.from_pretrained("./hf_smollm2")
 
 # Initialize tokenizer and generate text
-tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M")
+tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct")
 generated_text = model.text_generation(
     tokenizer=tokenizer,
-    prompt="Simply put, the theory of relativity states that",
-    max_seq_len=32
+    prompt="Once upon a time",
+    max_seq_len=128
 )
 print(generated_text)
 ```
 
+## Supported Optimizations
+
+### Custom Operators
+[Custom SDPA](https://github.com/pytorch/executorch/blob/a4322c71c3a97e79e0454a8223db214b010f1193/extension/llm/README.md?plain=1#L40) is supported for Hugging Face Transformers models, boosting performance by about 3x over the default SDPA in tests with `HuggingFaceTB/SmolLM2-135M`.
+
+### Backend Delegation
+Currently, **Optimum-ExecuTorch** supports the [XNNPACK Backend](https://pytorch.org/executorch/main/backends-xnnpack.html) with [custom SDPA](https://github.com/pytorch/executorch/blob/a4322c71c3a97e79e0454a8223db214b010f1193/extension/llm/README.md?plain=1#L40) for efficient execution on mobile CPUs.
+
+For a comprehensive overview of all backends supported by ExecuTorch, see the [ExecuTorch Backend Overview](https://pytorch.org/executorch/main/backends-overview.html).
+
+### Quantization
+We currently support Post-Training Quantization (PTQ) for linear layers using int8 dynamic per-token activations and int4 grouped per-channel weights (a.k.a. `8da4w`), as well as int8 channelwise embedding quantization.
+
+🚀 Stay tuned as more optimizations and performance enhancements are coming soon!
+
+
 ## Supported Models
 
-**Optimum-ExecuTorch** currently supports the following transformer models:
+The following models have been successfully tested with ExecuTorch. For details on the optimizations supported by each model and how to use them, consult the model's test file in the [`tests/models/`](https://github.com/huggingface/optimum-executorch/tree/main/tests/models) directory.
 
 ### Text Models
 We currently support a wide range of popular transformer models, including encoder-only, decoder-only, and encoder-decoder architectures, as well as models specialized for tasks such as text generation, translation, summarization, and mask prediction. These models reflect current trends and popularity across the Hugging Face community:
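
The Quantization section above mentions int8 channelwise embedding quantization alongside `8da4w` linear quantization, but the snippets only enable the latter; a sketch of enabling both, where the `qembedding` kwarg name is a hypothetical stand-in and only `qlinear` is confirmed by this README:

```python
from optimum.executorch import ExecuTorchModelForCausalLM

# "qlinear" (8da4w PTQ on linear layers) is shown earlier in this README;
# "qembedding" (int8 channelwise embedding PTQ) is a hypothetical kwarg
# name used here only for illustration.
model = ExecuTorchModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM2-135M-Instruct",
    recipe="xnnpack",
    attn_implementation="custom_sdpa",
    **{"qlinear": True, "qembedding": True},
)
```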
@@ -173,22 +192,6 @@ We currently support a wide range of popular transformer models, including encod
 *📌 Note: This list is continuously expanding. As we continue to expand support, more models will be added.*
 
 
-## Supported Optimizations
-
-### Custom Operators
-Supported using [custom SDPA](https://github.com/pytorch/executorch/blob/a4322c71c3a97e79e0454a8223db214b010f1193/extension/llm/README.md?plain=1#L40) with Hugging Face Transformers, boosting performance by 3x compared to default SDPA, based on tests with `HuggingFaceTB/SmolLM2-135M`.
-
-### Backends Delegation
-Currently, **Optimum-ExecuTorch** supports the [XNNPACK Backend](https://pytorch.org/executorch/main/backends-xnnpack.html) with [custom SDPA](https://github.com/pytorch/executorch/blob/a4322c71c3a97e79e0454a8223db214b010f1193/extension/llm/README.md?plain=1#L40) for efficient execution on mobile CPUs.
-
-For a comprehensive overview of all backends supported by ExecuTorch, please refer to the [ExecuTorch Backend Overview](https://pytorch.org/executorch/main/backends-overview.html).
-
-### Quantization
-We currently support Post-Training Quantization (PTQ) for linear layers using int8 dynamic per-token activations and int4 grouped per-channel weights (aka `8da4w`), as well as int8 channelwise embedding quantization.
-
-🚀 Stay tuned as more optimizations and performance enhancements are coming soon!
-
-
 ## 🛠️ Advanced Usage
 
 Check our [ExecuTorch GitHub repo](https://github.com/pytorch/executorch) directly for:
