Commit fd3f355 (1 parent: fa854cf)

update readme and setup script to support official BitNet b1.58 model (microsoft#171)

Authored by sd983527 and Yan Xia.

* update readme and setup file for new model.
* update model file name

Co-authored-by: Yan Xia <[email protected]>
File tree: 3 files changed (+48, -9 lines)

- README.md
- assets/header_model_release.png
- setup_env.py

README.md: 40 additions, 8 deletions
@@ -2,6 +2,8 @@
 [![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT)
 ![version](https://img.shields.io/badge/version-1.0-blue)
 
+<img src="./assets/header_model_release.png" alt="BitNet Model on Hugging Face" width="800"/>
+
 bitnet.cpp is the official inference framework for 1-bit LLMs (e.g., BitNet b1.58). It offers a suite of optimized kernels that support **fast** and **lossless** inference of 1.58-bit models on CPU (with NPU and GPU support coming next).
 
 The first release of bitnet.cpp supports inference on CPUs. bitnet.cpp achieves speedups of **1.37x** to **5.07x** on ARM CPUs, with larger models experiencing greater performance gains. Additionally, it reduces energy consumption by **55.4%** to **70.0%**, further boosting overall efficiency. On x86 CPUs, speedups range from **2.37x** to **6.17x** with energy reductions between **71.9%** and **82.2%**. Furthermore, bitnet.cpp can run a 100B BitNet b1.58 model on a single CPU, achieving speeds comparable to human reading (5-7 tokens per second), significantly enhancing the potential for running LLMs on local devices. Please refer to the [technical report](https://arxiv.org/abs/2410.16144) for more details.
@@ -18,7 +20,8 @@ A demo of bitnet.cpp running a BitNet b1.58 3B model on Apple M2:
 https://github.com/user-attachments/assets/7f46b736-edec-4828-b809-4be780a3e5b1
 
 ## What's New:
-- 02/18/2025 [Bitnet.cpp: Efficient Edge Inference for Ternary LLMs](https://arxiv.org/abs/2502.11880) ![NEW](https://img.shields.io/badge/NEW-red)
+- 04/14/2025 [BitNet Official 2B Parameter Model on Hugging Face](https://huggingface.co/microsoft/BitNet-b1.58-2B-4T) ![NEW](https://img.shields.io/badge/NEW-red)
+- 02/18/2025 [Bitnet.cpp: Efficient Edge Inference for Ternary LLMs](https://arxiv.org/abs/2502.11880)
 - 11/08/2024 [BitNet a4.8: 4-bit Activations for 1-bit LLMs](https://arxiv.org/abs/2411.04965)
 - 10/21/2024 [1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs](https://arxiv.org/abs/2410.16144)
 - 10/17/2024 bitnet.cpp 1.0 released.
@@ -29,9 +32,38 @@ https://github.com/user-attachments/assets/7f46b736-edec-4828-b809-4be780a3e5b1
 ## Acknowledgements
 
 This project is based on the [llama.cpp](https://github.com/ggerganov/llama.cpp) framework. We would like to thank all the authors for their contributions to the open-source community. Also, bitnet.cpp's kernels are built on top of the Lookup Table methodologies pioneered in [T-MAC](https://github.com/microsoft/T-MAC/). For inference of general low-bit LLMs beyond ternary models, we recommend using T-MAC.
+## Official Models
+<table>
+    <tr>
+        <th rowspan="2">Model</th>
+        <th rowspan="2">Parameters</th>
+        <th rowspan="2">CPU</th>
+        <th colspan="3">Kernel</th>
+    </tr>
+    <tr>
+        <th>I2_S</th>
+        <th>TL1</th>
+        <th>TL2</th>
+    </tr>
+    <tr>
+        <td rowspan="2"><a href="https://huggingface.co/microsoft/BitNet-b1.58-2B-4T">BitNet-b1.58-2B-4T</a></td>
+        <td rowspan="2">2.4B</td>
+        <td>x86</td>
+        <td>&#9989;</td>
+        <td>&#10060;</td>
+        <td>&#9989;</td>
+    </tr>
+    <tr>
+        <td>ARM</td>
+        <td>&#9989;</td>
+        <td>&#9989;</td>
+        <td>&#10060;</td>
+    </tr>
+</table>
 
 ## Supported Models
-❗️**We use existing 1-bit LLMs available on [Hugging Face](https://huggingface.co/) to demonstrate the inference capabilities of bitnet.cpp. These models are neither trained nor released by Microsoft. We hope the release of bitnet.cpp will inspire the development of 1-bit LLMs in large-scale settings in terms of model size and training tokens.**
+❗️**We use existing 1-bit LLMs available on [Hugging Face](https://huggingface.co/) to demonstrate the inference capabilities of bitnet.cpp. We hope the release of bitnet.cpp will inspire the development of 1-bit LLMs in large-scale settings in terms of model size and training tokens.**
 
 <table>
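The Kernel columns in the added table encode per-architecture support for the new model. As a quick, hedged illustration of what that means in practice (the -md and -q flags come from the setup_env.py usage string later in this diff; tl2 is not among its listed quant choices, so x86 users would use i2_s):

```bash
# Sketch: pick the quant type the table marks as supported for your CPU.
python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s   # I2_S: x86 and ARM
python setup_env.py -md models/BitNet-b1.58-2B-4T -q tl1    # TL1: ARM only, per the table
```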
@@ -143,12 +175,13 @@ pip install -r requirements.txt
 ```
 3. Build the project
 ```bash
-# Download the model from Hugging Face, convert it to quantized gguf format, and build the project
+# Manually download the model and run with local path
+huggingface-cli download microsoft/BitNet-b1.58-2B-4T --local-dir models/BitNet-b1.58-2B-4T
+python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s
+
+# Or you can download a model from Hugging Face, convert it to quantized gguf format, and build the project
 python setup_env.py --hf-repo tiiuae/Falcon3-7B-Instruct-1.58bit -q i2_s
 
-# Or you can manually download the model and run with local path
-huggingface-cli download tiiuae/Falcon3-7B-Instruct-1.58bit --local-dir models/Falcon3-7B-Instruct-1.58bit
-python setup_env.py -md models/Falcon3-7B-Instruct-1.58bit -q i2_s
 ```
 <pre>
 usage: setup_env.py [-h] [--hf-repo {1bitLLM/bitnet_b1_58-large,1bitLLM/bitnet_b1_58-3B,HF1BitLLM/Llama3-8B-1.58-100B-tokens,tiiuae/Falcon3-1B-Instruct-1.58bit,tiiuae/Falcon3-3B-Instruct-1.58bit,tiiuae/Falcon3-7B-Instruct-1.58bit,tiiuae/Falcon3-10B-Instruct-1.58bit}] [--model-dir MODEL_DIR] [--log-dir LOG_DIR] [--quant-type {i2_s,tl1}] [--quant-embd]
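Worth noting: the usage string above is unchanged context, and its --hf-repo choices do not list microsoft/BitNet-b1.58-2B-4T, which is why the README now shows the manual-download path (-md) first for the official model. A minimal end-to-end sketch, using only commands from the snippet above:

```bash
# Download the official 2B model, then quantize and build via setup_env.py.
huggingface-cli download microsoft/BitNet-b1.58-2B-4T --local-dir models/BitNet-b1.58-2B-4T
python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s
```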
@@ -173,7 +206,7 @@ optional arguments:
 ### Basic usage
 ```bash
 # Run inference with the quantized model
-python run_inference.py -m models/Falcon3-7B-Instruct-1.58bit/ggml-model-i2_s.gguf -p "You are a helpful assistant" -cnv
+python run_inference.py -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf -p "You are a helpful assistant" -cnv
 ```
 <pre>
 usage: run_inference.py [-h] [-m MODEL] [-n N_PREDICT] -p PROMPT [-t THREADS] [-c CTX_SIZE] [-temp TEMPERATURE] [-cnv]
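A hedged example combining the documented flags above; every flag appears in the usage string, but the values here are illustrative, not from the commit:

```bash
# 128 new tokens, 4 threads, 2048-token context, temperature 0.7, chat mode.
python run_inference.py \
    -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf \
    -p "You are a helpful assistant" \
    -n 128 -t 4 -c 2048 -temp 0.7 -cnv
```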
@@ -246,4 +279,3 @@ python utils/generate-dummy-bitnet-model.py models/bitnet_b1_58-large --outfile
 python utils/e2e_benchmark.py -m models/dummy-bitnet-125m.tl1.gguf -p 512 -n 128
 ```
 
-
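For reference, in the e2e_benchmark.py call above, -p 512 -n 128 presumably sets the prompt length and the number of generated tokens for the dummy 125M TL1 model; that reading is an assumption here, not something this diff confirms.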
assets/header_model_release.png: binary image added, 14.5 KB

setup_env.py: 8 additions, 1 deletion
@@ -41,6 +41,9 @@
     "tiiuae/Falcon3-1B-Instruct-1.58bit": {
         "model_name": "Falcon3-1B-Instruct-1.58bit",
     },
+    "microsoft/BitNet-b1.58-2B-4T": {
+        "model_name": "BitNet-b1.58-2B-4T",
+    },
 }
 
 SUPPORTED_QUANT_TYPES = {
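Entries in this mapping pair a Hugging Face repo id with the local directory name used under models/. A hypothetical sketch of how such a registry is typically consumed; the helper name and error handling here are illustrative, not setup_env.py's actual API:

```python
# Illustrative only: resolve a repo id to its local model directory.
SUPPORTED_MODELS = {
    "tiiuae/Falcon3-1B-Instruct-1.58bit": {"model_name": "Falcon3-1B-Instruct-1.58bit"},
    "microsoft/BitNet-b1.58-2B-4T": {"model_name": "BitNet-b1.58-2B-4T"},  # added by this commit
}

def local_model_dir(hf_repo: str, base: str = "models") -> str:
    # A KeyError here would mean the repo is not in the supported list.
    return f"{base}/{SUPPORTED_MODELS[hf_repo]['model_name']}"
```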
@@ -161,6 +164,8 @@ def gen_code():
             run_command([sys.executable, "utils/codegen_tl1.py", "--model", "Llama3-8B-1.58-100B-tokens", "--BM", "256,128,256,128", "--BK", "128,64,128,64", "--bm", "32,64,32,64"], log_step="codegen")
         elif get_model_name() == "bitnet_b1_58-3B":
             run_command([sys.executable, "utils/codegen_tl1.py", "--model", "bitnet_b1_58-3B", "--BM", "160,320,320", "--BK", "64,128,64", "--bm", "32,64,32"], log_step="codegen")
+        elif get_model_name() == "BitNet-b1.58-2B-4T":
+            run_command([sys.executable, "utils/codegen_tl1.py", "--model", "bitnet_b1_58-3B", "--BM", "160,320,320", "--BK", "64,128,64", "--bm", "32,64,32"], log_step="codegen")
         else:
             raise NotImplementedError()
     else:
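Note the design choice visible in the added branch: BitNet-b1.58-2B-4T passes bitnet_b1_58-3B as the --model argument, reusing that model's tuned kernel shapes rather than introducing new ones. A hypothetical table-driven restatement (setup_env.py itself uses explicit elif branches; the tuples are the exact --BM/--BK/--bm values from the diff):

```python
# Illustrative only: model name -> (BM, BK, bm) shapes for codegen_tl1.py.
TL1_KERNEL_SHAPES = {
    "Llama3-8B-1.58-100B-tokens": ("256,128,256,128", "128,64,128,64", "32,64,32,64"),
    "bitnet_b1_58-3B":            ("160,320,320", "64,128,64", "32,64,32"),
    "BitNet-b1.58-2B-4T":         ("160,320,320", "64,128,64", "32,64,32"),  # reuses the 3B shapes
}
```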
@@ -177,6 +182,8 @@ def gen_code():
             run_command([sys.executable, "utils/codegen_tl2.py", "--model", "Llama3-8B-1.58-100B-tokens", "--BM", "256,128,256,128", "--BK", "96,96,96,96", "--bm", "32,32,32,32"], log_step="codegen")
         elif get_model_name() == "bitnet_b1_58-3B":
             run_command([sys.executable, "utils/codegen_tl2.py", "--model", "bitnet_b1_58-3B", "--BM", "160,320,320", "--BK", "96,96,96", "--bm", "32,32,32"], log_step="codegen")
+        elif get_model_name() == "BitNet-b1.58-2B-4T":
+            run_command([sys.executable, "utils/codegen_tl2.py", "--model", "bitnet_b1_58-3B", "--BM", "160,320,320", "--BK", "96,96,96", "--bm", "32,32,32"], log_step="codegen")
         else:
             raise NotImplementedError()
 
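The same shape reuse appears on this codegen_tl2.py path, with the TL2-specific shapes (BK values of 96, bm values of 32). This is consistent with the Official Models table in the README diff above, which marks TL1 as supported on ARM and TL2 on x86 for the new model.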
@@ -222,4 +229,4 @@ def signal_handler(sig, frame):
     args = parse_args()
     Path(args.log_dir).mkdir(parents=True, exist_ok=True)
     logging.basicConfig(level=logging.INFO)
-    main()
+    main()

(The final main() line differs only in whitespace; an identical-looking -/+ pair like this typically reflects adding a newline at the end of the file.)
