Optimum ExecuTorch enables efficient deployment of transformer models using Meta's ExecuTorch framework.

## ⚡ Quick Installation
### 1. Create a virtual environment
Install [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html) on your machine. Then, create a virtual environment to manage our dependencies.
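
For example, a minimal sketch (the environment name and Python version below are illustrative choices, not requirements):

```bash
# Create and activate a fresh environment for Optimum ExecuTorch
conda create -n optimum-executorch python=3.11
conda activate optimum-executorch
```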
prompt="Simply put, the theory of relativity states that",
88
+
max_seq_len=128
89
+
)
90
+
print(generated_text)
91
+
```
92
+
93
+
> **Note:** If an ExecuTorch model is already cached on the Hugging Face Hub, the API will automatically skip the export step and load the cached `.pte` file. To test this, replace the `model_id` in the example above with `"executorch-community/SmolLM2-135M"`, where the `.pte` file is pre-cached. Additionally, the `.pte` file can be directly associated with the eager model, as demonstrated in this [example](https://huggingface.co/optimum-internal-testing/tiny-random-llama/tree/executorch).
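
For instance, following the note above, the same call can point at the pre-cached repo (a sketch; the repo id comes from the note, and the rest mirrors the Option 1 example):

```python
from optimum.executorch import ExecuTorchModelForCausalLM

# The .pte for this repo is already published on the Hub, so the export step is skipped
model = ExecuTorchModelForCausalLM.from_pretrained(
    "executorch-community/SmolLM2-135M", recipe="xnnpack"
)
```
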
### Option 2: Export and Load Separately
#### Step 1: Export your model
Use the CLI tool to convert your model to ExecuTorch format:
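
A sketch of what the export invocation can look like (the flag names, task, and output directory below are assumptions; check `optimum-cli export executorch --help` for the options available in your installed version):

```bash
# Illustrative export command; verify flag names with --help for your version
optimum-cli export executorch \
  --model meta-llama/Llama-3.2-1B \
  --task text-generation \
  --recipe xnnpack \
  --output_dir ./llama3_2_1b_executorch
```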
prompt="Simply put, the theory of relativity states that",
62
122
max_seq_len=128
63
123
)
124
+
print(generated_text)
64
125
```
## Supported Models and Backend
**Optimum-ExecuTorch** currently supports the following transformer models:

- [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) and its variants
- [HuggingFaceTB/SmolLM2-135M](https://huggingface.co/HuggingFaceTB/SmolLM2-135M) and its variants
- [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) and its variants
- [deepseek-ai/DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) and its variants
- [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) and its variants
- [allenai/OLMo-1B-hf](https://huggingface.co/allenai/OLMo-1B-hf) and its variants

*Note: This list is continuously expanding; as support grows, more models and variants will be added.*

**Supported Backend:**
Currently, **Optimum-ExecuTorch** supports only the [XNNPACK Backend](https://pytorch.org/executorch/main/backends-xnnpack.html) for efficient CPU execution on mobile devices. Quantization support for XNNPACK will be added soon.

For a comprehensive overview of all backends supported by ExecuTorch, please refer to the [ExecuTorch Backend Overview](https://pytorch.org/executorch/main/backends-overview.html).