Improve setup guide #31

Merged 2 commits on Mar 7, 2025
102 changes: 82 additions & 20 deletions README.md

## ⚡ Quick Installation

### 1. Create a virtual environment
Install [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html) on your machine. Then create a virtual environment to manage the dependencies.
```
conda create -n optimum-executorch python=3.11
conda activate optimum-executorch
```

### 2. Install optimum-executorch from source
```
git clone https://github.com/huggingface/optimum-executorch.git
cd optimum-executorch
pip install .
```

- 🔜 Installation from PyPI coming soon...
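
As a quick, optional sanity check (a minimal sketch; it only verifies that the package imports), you can try loading the main class used in the Quick Start below:

```python
# Optional sanity check: this import should succeed after `pip install .`
from optimum.executorch import ExecuTorchModelForCausalLM

print("optimum-executorch is installed:", ExecuTorchModelForCausalLM.__name__)
```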

### [Optional] 3. Install dependencies in dev mode
You can install `executorch` and `transformers` from source. This gives you access to newly added ExecuTorch-compatible models in `transformers` and the latest features in `executorch`, as both repos are under rapid development.

Follow these steps manually:

#### 3.1. Clone and Install ExecuTorch from Source
From the root directory where `optimum-executorch` is cloned:
```
# Clone the ExecuTorch repository
git clone https://github.com/pytorch/executorch.git
cd executorch
# Check out the stable branch to ensure stability
git checkout viable/strict
# Install ExecuTorch
bash ./install_executorch.sh
cd ..
```

#### 3.2. Clone and Install Transformers from Source
From the root directory where `optimum-executorch` is cloned:
```
# Clone the Transformers repository
git clone https://github.com/huggingface/transformers.git
cd transformers
# Install Transformers in editable mode
pip install -e .
cd ..
```
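
After both source installs, a quick way to confirm which versions were actually picked up (a sketch; the exact version strings will vary):

```python
# Check the installed versions of the source builds
from importlib.metadata import version

print("executorch:  ", version("executorch"))
print("transformers:", version("transformers"))  # a source install from main typically reports a ".dev0" version
```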

## 🎯 Quick Start

There are two ways to use Optimum ExecuTorch:

### Option 1: Export and Load in One Python API
```python
from optimum.executorch import ExecuTorchModelForCausalLM
from transformers import AutoTokenizer

# Load and export the model on-the-fly
model_id = "meta-llama/Llama-3.2-1B"
model = ExecuTorchModelForCausalLM.from_pretrained(model_id, recipe="xnnpack")

# Generate text right away
tokenizer = AutoTokenizer.from_pretrained(model_id)
generated_text = model.text_generation(
    tokenizer=tokenizer,
    prompt="Simply put, the theory of relativity states that",
    max_seq_len=128
)
print(generated_text)
```

> **Note:** If an ExecuTorch model is already cached on the Hugging Face Hub, the API will automatically skip the export step and load the cached `.pte` file. To test this, replace the `model_id` in the example above with `"executorch-community/SmolLM2-135M"`, where the `.pte` file is pre-cached. Additionally, the `.pte` file can be directly associated with the eager model, as demonstrated in this [example](https://huggingface.co/optimum-internal-testing/tiny-random-llama/tree/executorch).
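
For instance, the following sketch (using the same API as above) loads the pre-exported checkpoint without re-running the export; the tokenizer is assumed to be hosted in the same repo:

```python
from optimum.executorch import ExecuTorchModelForCausalLM
from transformers import AutoTokenizer

# The cached .pte is downloaded from the Hub, so no export step runs here
model_id = "executorch-community/SmolLM2-135M"
model = ExecuTorchModelForCausalLM.from_pretrained(model_id, recipe="xnnpack")
tokenizer = AutoTokenizer.from_pretrained(model_id)

print(model.text_generation(tokenizer=tokenizer, prompt="Once upon a time,", max_seq_len=64))
```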


### Option 2: Export and Load Separately

#### Step 1: Export your model
Use the CLI tool to convert your model to ExecuTorch format:
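
The exact invocation is collapsed in this diff; below is a minimal sketch of the export step, assuming the `optimum-cli export executorch` subcommand and a hypothetical output directory:

```
optimum-cli export executorch \
  --model meta-llama/Llama-3.2-1B \
  --task text-generation \
  --recipe xnnpack \
  --output_dir ./llama3_2_1b_executorch
```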

The model-loading lines for the next step are collapsed in the diff view; the final generation call matches Option 1:

```python
generated_text = model.text_generation(
    tokenizer=tokenizer,
    prompt="Simply put, the theory of relativity states that",
    max_seq_len=128
)
print(generated_text)
```

## Supported Models and Backend

**Optimum-ExecuTorch** currently supports the following transformer models:

- [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) and its variants
- [HuggingFaceTB/SmolLM2-135M](https://huggingface.co/HuggingFaceTB/SmolLM2-135M) and its variants
- [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) and its variants
- [deepseek-ai/DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) and its variants
- [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) and its variants
- [allenai/OLMo-1B-hf](https://huggingface.co/allenai/OLMo-1B-hf) and its variants

*Note: This list is continuously expanding; more models and variants will be added as support grows.*

**Supported Backend:**

Currently, **Optimum-ExecuTorch** supports only the [XNNPACK Backend](https://pytorch.org/executorch/main/backends-xnnpack.html) for efficient CPU execution on mobile devices. Quantization support for XNNPACK is planned to be added shortly.

For a comprehensive overview of all backends supported by ExecuTorch, please refer to the [ExecuTorch Backend Overview](https://pytorch.org/executorch/main/backends-overview.html).


## 🛠️ Advanced Usage

Check our [ExecuTorch GitHub repo](https://github.com/pytorch/executorch) directly for:
- Custom model export configurations
- More backends and performance optimization options
- Deployment guides for Android, iOS, and embedded devices
- Additional examples and benchmarks

## 🤝 Contributing
