## Breaking Changes
- TruncIntQuant, TruncAvgPool, Trunc QONNX Op changes #1042
## Highlights
- New PTQ algorithms:
  - AWQ #1213
  - AutoRound #1064
  - MagR #1214
  - SVDQuant #1210
- New datatype support:
  - Hierarchical scales #1038
- Initial `torch.compile` support #1206 - User guide here (minimal usage sketch below)
- YAML-based experiments #1116
- Benchmarking scripts for LLM example #1166
- New operator support:
  - Better SDPA quantization support #1090
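The initial `torch.compile` support (#1206) targets inference on already-quantized models. The snippet below is a minimal sketch, not taken from the release: the model, shapes, and settings are illustrative, and it only assumes the long-standing `brevitas.nn.QuantLinear` API plus stock `torch.compile`.

```python
# Minimal sketch: compiling a Brevitas-quantized model for inference.
# The model and shapes are illustrative; only brevitas.nn.QuantLinear and
# torch.compile are assumed here.
import torch
import torch.nn as nn
from brevitas.nn import QuantLinear

model = nn.Sequential(
    QuantLinear(64, 64, bias=True),  # weight-quantized linear (default 8-bit quantizer)
    nn.ReLU(),
)
model.eval()

# torch.compile support is initial (#1206); coverage may vary by backend and model.
compiled_model = torch.compile(model)

with torch.no_grad():
    out = compiled_model(torch.randn(1, 64))
```

For the supported configurations, refer to the user guide linked above and the `brevitas_examples` entrypoints.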
## What's Changed
- Feat (examples/generative): block-based optimization for GPTQ by @Giuseppe5 in #1046
- Fix (learned_round): disable return QuantTensor during float inference by @pablomlago in #1059
- Bump onnx from 1.15 to 1.17.0 in /requirements by @dependabot in #1069
- Fix (minifloat): correct minifloat computation and tests by @Giuseppe5 in #1067
- Feat (ptq): adding accumulator-aware extensions to GPxQ by @i-colbert in #1060
- Feat: add contributing guidelines by @Giuseppe5 in #1075
- Feat (float): adding new attributes to proxy and quant tensor by @i-colbert in #1072
- Feat (accelerate): improved accelerate compatibility by @Giuseppe5 in #1065
- Fix Transformers tests by @Giuseppe5 in #1081
- Fix (data): updating wikitext2 data utility by @i-colbert in #1080
- Fix (groupwise): correct log, groupdim, and scale computation by @Giuseppe5 in #1071
- Test (mx): add reference impl for MXFloat by @Giuseppe5 in #1068
- Fix (examples/generative): Fixed argument order for `quantize_model` by @nickfraser in #1084
- Feat (export): qonnx minifloat export by @Giuseppe5 in #1070
- Feat (core): use runtime parameter for scale by @Giuseppe5 in #1037
- Fix (per_group): fixing the per_group sym quantizer by @i-colbert in #1089
- Rotation based equalization by @Giuseppe5 in #1061
- Fix (examples/llm): fix for main and README by @Giuseppe5 in #1092
- Fix: correct output scale compute by @Giuseppe5 in #1077
- Fix (ptq/rotation): fix for rotation implementation by @Giuseppe5 in #1095
- Fix (scaling)!: clamp to avoid inf/nan in forward/backward by @Giuseppe5 in #1097
- Setup: bump python & torch version by @Giuseppe5 in #1098
- Feat: Per-Row po2 float ocp by @Giuseppe5 in #1102
- Fix LLM tests by @pablomlago in #1088
- Feat (brevitas_examples/llm): remove dependencies from optimum-amd by @Giuseppe5 in #1094
- Feat auto round by @pablomlago in #1064
- Fix (hadamard): remove hadamard loading warning by @Giuseppe5 in #1108
- Hierarchical scales by @Giuseppe5 in #1038
- Improvements to learned round by @Giuseppe5 in #1107
- Feat (brevitas_examples/llm): update README by @Giuseppe5 in #1109
- Fix (gpxq): tensor unpacking and Cholesky stabilization by @i-colbert in #1111
- Feat (llm): adding more quantizers by @i-colbert in #1113
- Feat (llm/learned_round): fast block update by @Giuseppe5 in #1110
- Fix SignSGD docstring by @pablomlago in #1115
- Feat (nn/sdpa): quantization of scaled dot-product attention by @nickfraser in #1090
- Fix (brevitas_examples/llm): scaling_min_val for fp32 by @Giuseppe5 in #1117
- Feat (scaling): no tracked_parameter_list with individual quantizer by @Giuseppe5 in #1112
- Feat (brevitas_examples/llm): select act_eq alpha by @Giuseppe5 in #1121
- Fix llm tests transformers by @pablomlago in #1118
- Fix (float/clamp): Bugfix when unsigned by @nickfraser in #1132
- Feat (brevitas_examples/llm): inference_mode support by @Giuseppe5 in #1129
- Feat (brevitas_examples/llm): correct scale init with CPU offloading by @Giuseppe5 in #1124
- Feat (brevitas_examples/sdxl): inference_mode + compile by @Giuseppe5 in #1133
- Feat (proxy): flag to enable/disable QT return by @Giuseppe5 in #1083
- Feat (examples/llm): Specify experiments via YAML files by @nickfraser in #1116
- test (core/float): Enhanced testing of minifloat formats by @nickfraser in #1136
- Eval harness by @Giuseppe5 in #1131
- Fix: pytree warning by @i-colbert in #1144
- Fix LLM entry point by @i-colbert in #1145
- Fix (scaling/standalone): better switch from runtime stats to param by @Giuseppe5 in #1099
- Fix (proxy): fix groupwise scale/zp caching by @Giuseppe5 in #1137
- Fix (export/inference_mode): correct rounding function by @Giuseppe5 in #1146
- Setup: pin transformers version by @Giuseppe5 in #1150
- Feat (mx): unpadding during dequantization by @Giuseppe5 in #1134
- Feat (brevitas_examples/llm): load from checkpoint by @Giuseppe5 in #1151
- Feat (rotation): equalize across SDPA by @Giuseppe5 in #1149
- Feat (quantization): torch_function based quantization by @Giuseppe5 in #1147
- Setup: bump torch version for LLM tests by @Giuseppe5 in #1154
- Feat (equalize): enable parametrized rotations by @pablomlago in #1148
- Feat (optim): add Cailey SGD optimizer by @pablomlago in #1153
- Setup: update pre-commit python version by @Giuseppe5 in #1158
- Fix (brevitas_examples/llm): remove unnecessary checkpointing by @Giuseppe5 in #1161
- Feat (zero_point): dynamic groupwise zero point by @Giuseppe5 in #1160
- New rotation by @Giuseppe5 in #1159
- Fix (brevitas_examples/llm): equalized module + fx compatibility by @Giuseppe5 in #1164
- Fix (runtime_act): fix negative group_dim handling by @Giuseppe5 in #1157
- Fix (a2q): missing restrict_pre_scaling_impl definition by @Giuseppe5 in #1167
- Feat (equalize): enable rotation matrix optimization by @pablomlago in #1155
- Add FP16 support to ptq_evaluate.py and update README argument list by @hkayann in #1174
- Feat (brevitas_examples/llm): separate KV Cache quantization by @Giuseppe5 in #1165
- Feat (hadamard): support region expansion by @Giuseppe5 in #1178
- Feat (llm): benchmark for llm entrypoint by @pablomlago in #1166
- fix (docs/faq): remove reference to gitter, switch affine quantization to be an example by @nickfraser in #1183
- Fix (brevitas_examples/sdxl): correct import for inference_mode by @Giuseppe5 in #1185
- Feat (gpfq): optimizing with lower diagonal matrix formulation by @i-colbert in #1172
- Feat (brevitas_examples/llm): better dtype selection by @Giuseppe5 in #1186
- Fix (brevitas_examples/sdxl): faster sdxl inference by @Giuseppe5 in #1188
- fix (examples/benchmark): Fix when `run_results.yaml` does not exist by @nickfraser in #1189
- Feat (example/common): Added groupwise, float scaled OCP option by @nickfraser in #1190
- Fix (examples/llm): default dtype from None to float16 by @pablomlago in #1191
- Fix (utils/torch_utils): ensure gradient propagation through pad_to_dim by @pablomlago in #1194
- Fix (examples/llm): prevent layernorm_to_rmsnorm option when fused_no_fx by @pablomlago in #1192
- Feat (brevitas_examples/sdxl): update mlperf by @Giuseppe5 in #1195
- Feat (brevitas_examples/llm): support for lighteval by @Giuseppe5 in #1162
- Fix (optim/cailey_sgd): fix cailey sgd in float16/bfloat16 by @pablomlago in #1193
- Feat (brevitas_examples/stable_diffusion): VAE quantization support by @Giuseppe5 in #1197
- Fix (quant_tensors): remove duplication by @Giuseppe5 in #1204
- Fix (brevitas_examples/llm): support MSE with offloaded models by @Giuseppe5 in #1196
- Fix (quant): improvements to quantization by @Giuseppe5 in #1207
- Fix (export/inference_mode): correct handler for dynamic float quant by @Giuseppe5 in #1208
- Feat: Initial SVDQuant support by @nickfraser in #1210
- Feat (equalize): enable parametrized scales by @pablomlago in #1175
- Fix (llm/equalize): remove call to _update_weights by @pablomlago in #1216
- Local compile support by @Giuseppe5 in #1206
- Setup: pin onnxruntime by @Giuseppe5 in #1218
- Fix (quant): clean-up to quantization code by @Giuseppe5 in #1219
- Fix (equalize): dtype fix in activation equalization by @pablomlago in #1217
- Feat (example/benchmark): Added script to convert YAML cfgs to "benchmark" configs by @nickfraser in #1184
- Support for transformer-based diffusion network by @Giuseppe5 in #1211
- Fix (brevitas_examples/llm): remove deprecated flag by @Giuseppe5 in #1225
- Fix (ex/llm): add missing copyright header by @nickfraser in #1227
- Feat (compile): limit activation recompiles by @Giuseppe5 in #1222
- Fix (ex/llm): Added defaults for several arguments. by @nickfraser in #1238
- Feat (compile): limit memory utilization with groupwise quantization by @Giuseppe5 in #1232
- Feat (brevitas_examples/diffusion): flux attention quantization by @Giuseppe5 in #1221
- Feat (brevitas_examples/llm): BOS preprocessing for calibration data by @Giuseppe5 in #1240
- Fix (test/ste_ops): fix mock tests by @nickfraser in #1242
- Fix (calibrate): correct zero_point init by @Giuseppe5 in #1243
- Feat (examples/generative): add fnuz quantizers by @Giuseppe5 in #1244
- Docs (readme): Update citation by @nickfraser in #1247
- Feat (ex/benchmark): Add optional start/end indices by @nickfraser in #1248
- Fix (ex/llm): Regenerate template configs by @nickfraser in #1249
- Fix (gptq): Fix several edge cases by @nickfraser in #1252
- Fix (brevitas_examples/diffusion): workaround for svdquant with SDXL by @Giuseppe5 in #1256
- Setup: fix pre_commit CI by @Giuseppe5 in #1264
- Feat (magr): initial implementation of MagR by @i-colbert in #1214
- Fix/Feat (trunc avg pool): Update truncation and average pool behaviour by @nickfraser in #1042
- Fix (ex/llm): Fix per-row quant_sdpa broadcastable shape by @nickfraser in #1254
- feat (ex/benchmark): Added option to shuffle order of benchmark processes by @nickfraser in #1268
- Fix (examples/llm): Fix PPLs by @pablomlago in #1271
- Fix (data): bos_processing in pile dataset by @i-colbert in #1259
- Feat (llm/eval): remove BOS token by @pablomlago in #1258
- Fix (graph/hadamard): `.view` can fail with functional QuantSDPA by @nickfraser in #1270
- Fix (scaling/float): correct dtype for threshold by @Giuseppe5 in #1265
- Fix (runtime_quant): correct priority for act quant by @Giuseppe5 in #1255
- Fix (quant_sdpa): remove print by @Giuseppe5 in #1273
- Feat (graph/calibrate): refactor DisableEnableQuantization by @pablomlago in #1257
- Fix (quant/float): input_view_impl for float_no_scale by @Giuseppe5 in #1260
- Fix (ci): Don't update PyTorch version by @nickfraser in #1275
- Feat (brevitas_examples/sdxl): better GPTQ by @Giuseppe5 in #1250
- Feat (ex/llm): bos preprocessing by @pablomlago in #1277
- test (ex/llm): Minor fixes to tests. Add rotation tests. by @nickfraser in #1253
- Fix (graph/equalize): fix value-output region in SDPA by @Giuseppe5 in #1278
- Feat (graph/calibrate): change quant_status_manager defaults to no-op by @pablomlago in #1274
- Fix (core/function): Fix learned round when padding is applied to weights by @nickfraser in #1235
- Fix (export/onnx): Improved ONNX export performance by @nickfraser in #1279
- Feat (llm/awq): activation-aware weight scaling by @pablomlago in #1213
- Docs: update / generate docs for 0.12.0 release by @nickfraser in #1284
- Docs: regen notebooks and docs by @nickfraser in #1285
## New Contributors
- @dependabot made their first contribution in #1069
- @hkayann made their first contribution in #1174
Full Changelog: v0.11.0...v0.12.0