Description
The bos_token_id doesn't match between the model config and its tokenizer. This happens with the distills that use Qwen as the base model. Opened a discussion here: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/discussions/25
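A minimal sketch to reproduce the mismatch with `transformers` (standard `AutoConfig`/`AutoTokenizer` API; the exact ids reported may vary by model revision):

```python
from transformers import AutoConfig, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

# Load the model config and the tokenizer independently and compare BOS ids.
config = AutoConfig.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

print("config.bos_token_id:   ", config.bos_token_id)
print("tokenizer.bos_token_id:", tokenizer.bos_token_id)

# Fails on the Qwen-based distills because the two ids disagree.
assert config.bos_token_id == tokenizer.bos_token_id, "bos_token_id mismatch"
```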
It may not fit on device without quantization, but exporting the Llama-based DeepSeek-R1 to ExecuTorch works just fine, e.g. setting model_id to deepseek-ai/DeepSeek-R1-Distill-Llama-8B.