
Improve setup guide #31


Merged
merged 2 commits into from Mar 7, 2025
102 changes: 82 additions & 20 deletions README.md
@@ -20,7 +20,14 @@ Optimum ExecuTorch enables efficient deployment of transformer models using Meta

## ⚡ Quick Installation

### 1. Create a virtual environment
Install [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html) on your machine, then create a virtual environment to manage the project's dependencies.
```
conda create -n optimum-executorch python=3.11
conda activate optimum-executorch
```

### 2. Install optimum-executorch from source
```
git clone https://github.com/huggingface/optimum-executorch.git
cd optimum-executorch
pip install .
```

- 🔜 Install from PyPI coming soon...

### [Optional] 3. Install dependencies in dev mode
You can install `executorch` and `transformers` from source, which gives you access to new ExecuTorch-compatible
models in `transformers` and to new features in `executorch`, as both repos are under rapid development.

Follow these steps manually:

#### 3.1. Clone and Install ExecuTorch from Source
From the root directory where `optimum-executorch` is cloned:
```
# Clone the ExecuTorch repository
git clone https://github.com/pytorch/executorch.git
cd executorch
# Checkout the stable branch to ensure stability
git checkout viable/strict
# Install ExecuTorch
bash ./install_executorch.sh
cd ..
```

#### 3.2. Clone and Install Transformers from Source
From the root directory where `optimum-executorch` is cloned:
```
# Clone the Transformers repository
git clone https://github.com/huggingface/transformers.git
cd transformers
# Install Transformers in editable mode
pip install -e .
cd ..
```
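If you want to confirm that the editable installs above are the ones actually being picked up, a quick optional sanity check (just a suggestion, not part of the official setup) is to print the installed package versions:

```python
# Optional sanity check after the source installs above.
# importlib.metadata ships with Python 3.8+, so no extra dependency is needed.
from importlib.metadata import version

print("executorch:", version("executorch"))
print("transformers:", version("transformers"))  # a .dev suffix indicates the editable source install
```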

## 🎯 Quick Start

There are two ways to use Optimum ExecuTorch:

### Option 1: Export and Load in One Python API
```python
from optimum.executorch import ExecuTorchModelForCausalLM
from transformers import AutoTokenizer

# Load and export the model on-the-fly
model_id = "meta-llama/Llama-3.2-1B"
model = ExecuTorchModelForCausalLM.from_pretrained(model_id, recipe="xnnpack")

# Generate text right away
tokenizer = AutoTokenizer.from_pretrained(model_id)
generated_text = model.text_generation(
tokenizer=tokenizer,
prompt="Simply put, the theory of relativity states that",
max_seq_len=128
)
print(generated_text)
```

> **Note:** If an ExecuTorch model is already cached on the Hugging Face Hub, the API will automatically skip the export step and load the cached `.pte` file. To test this, replace the `model_id` in the example above with `"executorch-community/SmolLM2-135M"`, where the `.pte` file is pre-cached. Additionally, the `.pte` file can be directly associated with the eager model, as demonstrated in this [example](https://huggingface.co/optimum-internal-testing/tiny-random-llama/tree/executorch).
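As a minimal variant of the example above (using the pre-cached repo mentioned in the note; the prompt and `max_seq_len` here are purely illustrative), loading a model whose `.pte` file is already on the Hub looks the same — only the `model_id` changes:

```python
from optimum.executorch import ExecuTorchModelForCausalLM
from transformers import AutoTokenizer

# The .pte artifact for this repo is pre-cached on the Hub, so the export step is skipped.
model_id = "executorch-community/SmolLM2-135M"
model = ExecuTorchModelForCausalLM.from_pretrained(model_id, recipe="xnnpack")

tokenizer = AutoTokenizer.from_pretrained(model_id)
generated_text = model.text_generation(
    tokenizer=tokenizer,
    prompt="Simply put, the theory of relativity states that",
    max_seq_len=32,  # illustrative value
)
print(generated_text)
```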


### Option 2: Export and Load Separately

#### Step 1: Export your model
Use the CLI tool to convert your model to ExecuTorch format:
@@ -61,33 +121,35 @@ generated_text = model.text_generation(
prompt="Simply put, the theory of relativity states that",
max_seq_len=128
)
print(generated_text)
```

## Supported Models and Backend

**Optimum-ExecuTorch** currently supports the following transformer models:

- [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) and its variants
- [HuggingFaceTB/SmolLM2-135M](https://huggingface.co/HuggingFaceTB/SmolLM2-135M) and its variants
- [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) and its variants
- [deepseek-ai/DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) and its variants
- [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) and its variants
- [allenai/OLMo-1B-hf](https://huggingface.co/allenai/OLMo-1B-hf) and its variants

*Note: This list is continuously expanding; more models and variants will be added as support grows.*

**Supported Backend:**

Currently, **Optimum-ExecuTorch** supports only the [XNNPACK Backend](https://pytorch.org/executorch/main/backends-xnnpack.html) for efficient CPU execution on mobile devices. Quantization support for XNNPACK is planned to be added shortly.

For a comprehensive overview of all backends supported by ExecuTorch, please refer to the [ExecuTorch Backend Overview](https://pytorch.org/executorch/main/backends-overview.html).
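To illustrate (not an exhaustive test), any model from the list above can in principle be exported and run through the same one-call API shown in the Quick Start, with `recipe="xnnpack"` selecting the only backend currently supported; `Qwen/Qwen2.5-0.5B` is used here purely as an example:

```python
from optimum.executorch import ExecuTorchModelForCausalLM

# Export a supported model with the XNNPACK recipe, the only backend available today.
model = ExecuTorchModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B",
    recipe="xnnpack",
)
```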


## 🛠️ Advanced Usage

Check our [ExecuTorch GitHub repo](https://github.com/pytorch/executorch) directly for:
- Custom model export configurations
- More backends and performance optimization options
- Deployment guides for Android, iOS, and embedded devices
- Additional examples and benchmarks

## 🤝 Contributing

13 changes: 7 additions & 6 deletions setup.py
@@ -12,18 +12,19 @@
assert False, "Error: Could not open '%s' due %s\n" % (filepath, error)

INSTALL_REQUIRE = [
"optimum~=1.24",
"accelerate>=0.26.0",
"datasets",
Collaborator:
why do we need accelerate and datasets as required dependencies?

Collaborator (Author):
@echarlaix Yeah, required when running certain models, as reported by users here #29 (comment).

I want to simplify the UX so that users can have the most common deps installed by default.

Collaborator:
I'd be in favor of keeping the number of required dependencies as low as possible (only keeping what's necessary). Could you expand on why datasets needs to be added? Also, I don't think accelerate is mandatory (we could, for example, check whether accelerate is available and use that to decide how to set low_cpu_mem_usage when loading the model, see https://github.com/huggingface/transformers/blob/v4.49.0/src/transformers/modeling_utils.py#L3612). wdyt?

Collaborator:
I don't want to block this PR, so reverted in 7768d35. Would you mind opening a new PR for this so that we can merge this one asap and continue the discussion there?

Collaborator:
also cc @michaelbenayoun who will likely review the second PR
"executorch>=0.4.0",
"optimum~=1.24",
"safetensors",
"sentencepiece",
"tiktoken",
"transformers>=4.46",
]

TESTS_REQUIRE = [
"accelerate>=0.26.0",
"pytest",
"parameterized",
"sentencepiece",
"datasets",
"safetensors",
"pytest",
]
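A minimal sketch of the optional-`accelerate` approach suggested in the review thread above, shown for illustration only (this is not the actual optimum-executorch or transformers implementation):

```python
from transformers import AutoModelForCausalLM
from transformers.utils import is_accelerate_available

# Only enable low_cpu_mem_usage when accelerate is installed, instead of
# making accelerate a hard requirement.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",  # any supported checkpoint; used here as an example
    low_cpu_mem_usage=is_accelerate_available(),
)
```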

