prompt="Simply put, the theory of relativity states that",
127
-
max_seq_len=32
129
+
prompt="Once upon a time",
130
+
max_seq_len=128
128
131
)
129
132
print(generated_text)
130
133
```
131
134
135
+
## Supported Optimizations

### Custom Operators

Custom operators are supported via ExecuTorch's [custom SDPA](https://github.com/pytorch/executorch/blob/a4322c71c3a97e79e0454a8223db214b010f1193/extension/llm/README.md?plain=1#L40) for Hugging Face Transformers models, boosting performance by 3x compared to the default SDPA, based on tests with `HuggingFaceTB/SmolLM2-135M`.
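
As an illustrative way to see the effect end to end (a sketch, not the project's benchmark), the snippet below times eager `transformers` generation against the exported ExecuTorch program. The `recipe="xnnpack"` argument is an assumption about how the export is requested; per the Backends Delegation section below, the XNNPACK path is the one that uses custom SDPA.

```python
# Illustrative end-to-end timing sketch, not the project's benchmark: it measures
# more than the attention kernel alone. The 3x figure above comes from the
# project's own tests with HuggingFaceTB/SmolLM2-135M.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.executorch import ExecuTorchModelForCausalLM

model_id = "HuggingFaceTB/SmolLM2-135M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
prompt = "Once upon a time"

# Baseline: eager PyTorch generation with the default SDPA implementation.
hf_model = AutoModelForCausalLM.from_pretrained(model_id)
inputs = tokenizer(prompt, return_tensors="pt")
start = time.perf_counter()
with torch.no_grad():
    hf_model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(f"transformers (default SDPA): {time.perf_counter() - start:.2f}s")

# Exported ExecuTorch program; `recipe="xnnpack"` is an assumption and, per the
# Backends Delegation section, that path applies the custom SDPA operator.
et_model = ExecuTorchModelForCausalLM.from_pretrained(model_id, recipe="xnnpack")
start = time.perf_counter()
et_model.text_generation(tokenizer=tokenizer, prompt=prompt, max_seq_len=128)
print(f"ExecuTorch (custom SDPA): {time.perf_counter() - start:.2f}s")
```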

### Backends Delegation
Currently, **Optimum-ExecuTorch** supports the [XNNPACK Backend](https://pytorch.org/executorch/main/backends-xnnpack.html) with [custom SDPA](https://github.com/pytorch/executorch/blob/a4322c71c3a97e79e0454a8223db214b010f1193/extension/llm/README.md?plain=1#L40) for efficient execution on mobile CPUs.
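
The sketch below shows what delegation looks like from the Python API, assuming export is driven by a `recipe` argument on `from_pretrained` (an assumption; consult the project documentation for the exact export entry points).

```python
# Minimal sketch: request the XNNPACK recipe at load/export time so that supported
# subgraphs are delegated to the XNNPACK backend (with custom SDPA) for mobile CPUs.
# The `recipe="xnnpack"` argument is an assumption, not the only way to export.
from transformers import AutoTokenizer
from optimum.executorch import ExecuTorchModelForCausalLM

model_id = "HuggingFaceTB/SmolLM2-135M"
model = ExecuTorchModelForCausalLM.from_pretrained(model_id, recipe="xnnpack")

# The resulting ExecuTorch program targets the ExecuTorch runtime, e.g. a phone's CPU.
tokenizer = AutoTokenizer.from_pretrained(model_id)
print(model.text_generation(tokenizer=tokenizer, prompt="Once upon a time", max_seq_len=128))
```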
For a comprehensive overview of all backends supported by ExecuTorch, please refer to the [ExecuTorch Backend Overview](https://pytorch.org/executorch/main/backends-overview.html).

### Quantization
We currently support Post-Training Quantization (PTQ) for linear layers using int8 dynamic per-token activations and int4 grouped per-channel weights (aka `8da4w`), as well as int8 channelwise embedding quantization.
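
To make the `8da4w` scheme concrete, here is a small torchao sketch of what it does to an eager model's linear layers. It illustrates the quantization scheme itself rather than Optimum-ExecuTorch's own entry points, and the group size is an assumption; see the per-model test files referenced in the Supported Models section for the supported integration.

```python
# Illustrative torchao sketch of 8da4w: int8 dynamic per-token activation quantization
# combined with int4 grouped per-channel weight quantization on nn.Linear layers.
# The group_size value is an assumption for illustration only.
from transformers import AutoModelForCausalLM
from torchao.quantization.quant_api import quantize_, int8_dynamic_activation_int4_weight

model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-135M")
quantize_(model, int8_dynamic_activation_int4_weight(group_size=32))

# Linear weights are now stored as int4 groups; activations are quantized to int8
# per token at runtime, which is what the "8da4w" shorthand refers to.
```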
🚀 Stay tuned: more optimizations and performance enhancements are coming soon!
## Supported Models

The following models have been successfully tested with ExecuTorch. For details on the specific optimizations supported and how to use them for each model, please consult their respective test files in the [`tests/models/`](https://github.com/huggingface/optimum-executorch/tree/main/tests/models) directory.
### Text Models
We currently support a wide range of popular transformer models, including encoder-only, decoder-only, and encoder-decoder architectures, as well as models specialized for tasks such as text generation, translation, summarization, and mask prediction. These models reflect the current trends and popularity across the Hugging Face community:
*📌 Note: This list is continuously expanding; more models will be added as support grows.*