@@ -20,7 +20,14 @@ Optimum ExecuTorch enables efficient deployment of transformer models using Meta
## ⚡ Quick Installation
- Install from source:
+ ### 1. Create a virtual environment:
+ Install [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html) on your machine. Then, create a virtual environment to manage the dependencies.
+ ```
+ conda create -n optimum-executorch python=3.11
+ conda activate optimum-executorch
+ ```
+
+ ### 2. Install optimum-executorch from source:
```
git clone https://github.com/huggingface/optimum-executorch.git
cd optimum-executorch
@@ -29,11 +36,67 @@ pip install .
- 🔜 Install from pypi coming soon...
+ ### [Optional] 3. Install dependencies in dev mode
+ You can install `executorch` and `transformers` from source, which gives you access to new
+ ExecuTorch-compatible models from `transformers` and new features from `executorch`, as both
+ repos are under rapid development.
+
+ Follow these steps manually:
+
+ #### 3.1. Clone and Install ExecuTorch from Source:
+ From the root directory where `optimum-executorch` is cloned:
+ ```
+ # Clone the ExecuTorch repository
+ git clone https://github.com/pytorch/executorch.git
+ cd executorch
+ # Check out the stable branch to ensure stability
+ git checkout viable/strict
+ # Install ExecuTorch
+ bash ./install_executorch.sh
+ cd ..
+ ```
+
+ #### 3.2. Clone and Install Transformers from Source
+ From the root directory where `optimum-executorch` is cloned:
+ ```
+ # Clone the Transformers repository
+ git clone https://github.com/huggingface/transformers.git
+ cd transformers
+ # Install Transformers in editable mode
+ pip install -e .
+ cd ..
+ ```
+
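+ To double-check that the from-source builds are the ones active in your environment, you can run a
+ quick version check. This is an optional sanity check rather than part of the official setup; it only
+ assumes both packages are installed in the active `optimum-executorch` environment:
+
+ ``` python
+ # Optional sanity check: print the versions resolved by the current environment
+ import importlib.metadata
+
+ import transformers
+
+ print("executorch:", importlib.metadata.version("executorch"))
+ print("transformers:", transformers.__version__)
+ ```
+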
## 🎯 Quick Start
There are two ways to use Optimum ExecuTorch:
- ### Option 1: Export and Load Separately
+ ### Option 1: Export and Load in One Python API
+ ``` python
+ from optimum.executorch import ExecuTorchModelForCausalLM
+ from transformers import AutoTokenizer
+
+ # Load and export the model on-the-fly
+ model_id = "meta-llama/Llama-3.2-1B"
+ model = ExecuTorchModelForCausalLM.from_pretrained(model_id, recipe="xnnpack")
+
+ # Generate text right away
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ generated_text = model.text_generation(
+     tokenizer=tokenizer,
+     prompt="Simply put, the theory of relativity states that",
+     max_seq_len=128
+ )
+ print(generated_text)
+ ```
+
+ If an ExecuTorch model is already cached on the Hugging Face Hub, the API will
+ automatically skip the export step and load the cached `.pte` file. To try this out,
+ replace the `model_id` in the example above with "executorch-community/SmolLM2-135M",
+ where the `.pte` file is already cached. Note that the `.pte` file can be directly
+ linked to the eager model, as shown in this [example](https://huggingface.co/optimum-internal-testing/tiny-random-llama/tree/executorch).
+
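+ For example, the snippet below is the Quick Start example with the cached model id swapped in (a
+ minimal sketch; the prompt and `max_seq_len` values are arbitrary):
+
+ ``` python
+ from optimum.executorch import ExecuTorchModelForCausalLM
+ from transformers import AutoTokenizer
+
+ model_id = "executorch-community/SmolLM2-135M"
+ # The .pte file is already cached on the Hub, so the export step is skipped
+ model = ExecuTorchModelForCausalLM.from_pretrained(model_id, recipe="xnnpack")
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ print(model.text_generation(tokenizer=tokenizer, prompt="Hello, my name is", max_seq_len=32))
+ ```
+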
+ ### Option 2: Export and Load Separately
#### Step 1: Export your model
Use the CLI tool to convert your model to ExecuTorch format:
@@ -61,33 +124,34 @@ generated_text = model.text_generation(
    prompt="Simply put, the theory of relativity states that",
    max_seq_len=128
)
+ print(generated_text)
```
- ### Option 2: Python API
- ``` python
- from optimum.executorch import ExecuTorchModelForCausalLM
- from transformers import AutoTokenizer
+ ## Supported Models
- # Load and export the model on-the-fly
- model_id = "meta-llama/Llama-3.2-1B"
- model = ExecuTorchModelForCausalLM.from_pretrained(model_id, recipe="xnnpack")
+ Optimum ExecuTorch currently supports the following transformer models:
+
+ - [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) (and its variants)
+ - [HuggingFaceTB/SmolLM2-135M](https://huggingface.co/HuggingFaceTB/SmolLM2-135M) (and its variants)
+ - [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) (and its variants)
+ - [deepseek-ai/DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) (and its variants)
+ - [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) (and its variants)
+ - [allenai/OLMo-1B-hf](https://huggingface.co/allenai/OLMo-1B-hf) (and its variants)
+
+ *Note: This list is continuously expanding; more models and variants will be added over time.*
+
143
+
144
+ ## Supported Recipes
+
+ Optimum ExecuTorch currently supports only the [`XNNPACK` Backend](https://pytorch.org/executorch/main/backends-xnnpack.html).
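+
+ The recipe is selected via the same `recipe` argument shown in the Quick Start examples, for
+ example (a minimal sketch reusing the Llama model id from Option 1):
+
+ ``` python
+ from optimum.executorch import ExecuTorchModelForCausalLM
+
+ # "xnnpack" is currently the only supported recipe
+ model = ExecuTorchModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B", recipe="xnnpack")
+ ```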
- # Generate text right away
- tokenizer = AutoTokenizer.from_pretrained(model_id)
- generated_text = model.text_generation(
-     tokenizer=tokenizer,
-     prompt="Simply put, the theory of relativity states that",
-     max_seq_len=128
- )
- ```
## 🛠️ Advanced Usage
Check our [ExecuTorch GitHub repo](https://github.com/pytorch/executorch) directly for:
- - Custom model export configurations
- - Performance optimization guides
+ - More backends and performance optimization options
- Deployment guides for Android, iOS, and embedded devices
- - Additional examples
+ - Additional examples and benchmarks
## 🤝 Contributing