bitseek - Flash Multi Head Latent Attention inspired by deepseek. #234

chetanreddyv · 2025-04-27T16:41:30Z

chetanreddyv
Apr 27, 2025

Features

1.58-bit weight quantization (ternary values)
8-bit activation quantization
Flash Multi-head Latent Attention with causal masking
Rotary Position Embeddings (RoPE)
SwiGLU activation in FFN layers
SubLN normalization
No bias terms in linear or normalization layers
Llama 2 tokenizer integration

Architecture Overview

Sequence Length: 2048
Model Size: ~300M parameters
Key Components:
- BitLinear layers for efficient weight and activation quantization
- Rotary Position Embeddings (RoPE) for positional encoding
- SwiGLU activation in Feed-Forward Network (FFN) layers
- SubLN (Sub-layer Normalization) normalization
- No bias terms in linear or normalization layers
- Flash Multi-head latent attention inspired by DeepSeek

I have the code for this implementation but no compute, I want to contribute this code here as open source.

BradKML · 2025-06-08T08:51:37Z

BradKML
Jun 8, 2025

Would this also apply to models like Qwen/Dots with open weights? Also, why not do "smart" PTQ with bitnet? Then again there needs to be a sharable library for this https://huggingface.co/nisten/Biggie-SmoLlm-0.15B-Base

0 replies

sprappcom · 2025-06-17T17:12:01Z

sprappcom
Jun 17, 2025

8b version when out? i think it'll be great with 8b active parameters too. any etas on expectation?
i really hope for an bitnet 1.58 8b with 4096 context and i'll be really happy to use it for conversation ai and general filtering of spam content

1 reply

BradKML Jun 18, 2025

MS BitNet is likely just proof-of-concept, not something they will just remake, since its likely too valuable

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

bitseek - Flash Multi Head Latent Attention inspired by deepseek. #234

Uh oh!

{{title}}

Uh oh!

Replies: 2 comments 1 reply

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

bitseek - Flash Multi Head Latent Attention inspired by deepseek. #234

Uh oh!

chetanreddyv Apr 27, 2025

Features

Architecture Overview

Replies: 2 comments · 1 reply

Uh oh!

Uh oh!

BradKML Jun 8, 2025

Uh oh!

sprappcom Jun 17, 2025

Uh oh!

BradKML Jun 18, 2025

chetanreddyv
Apr 27, 2025

Replies: 2 comments 1 reply

BradKML
Jun 8, 2025

sprappcom
Jun 17, 2025