## Breaking Changes
- TruncIntQuant, TruncAvgPool, Trunc QONNX Op changes #1042
## Highlights
- New PTQ algorithms:
  - AWQ #1213
  - AutoRound #1064
  - MagR #1214
  - SVDQuant #1210
- New datatype support:
  - Hierarchical scales #1038
- Initial `torch.compile` support #1206 - User guide here (minimal usage sketch below)
- YAML-based experiments #1116
- Benchmarking scripts for LLM example #1166
- New operator support:
  - Better SDPA quantization support #1090
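The initial `torch.compile` support (#1206) targets inference on already-quantized models. The snippet below is a minimal sketch, not taken from the release: the model, shapes, and settings are illustrative, and it only assumes the long-standing `brevitas.nn.QuantLinear` API plus stock `torch.compile`.

```python
# Minimal sketch: compiling a Brevitas-quantized model for inference.
# The model and shapes are illustrative; only brevitas.nn.QuantLinear and
# torch.compile are assumed here.
import torch
import torch.nn as nn
from brevitas.nn import QuantLinear

model = nn.Sequential(
    QuantLinear(64, 64, bias=True),  # weight-quantized linear (default 8-bit quantizer)
    nn.ReLU(),
)
model.eval()

# torch.compile support is initial (#1206); coverage may vary by backend and model.
compiled_model = torch.compile(model)

with torch.no_grad():
    out = compiled_model(torch.randn(1, 64))
```

For the supported configurations, refer to the user guide linked above and the `brevitas_examples` entrypoints.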
## What's Changed
- Feat (examples/generative): block-based optimization for GPTQ by @Giuseppe5 in #1046
- Fix (learned_round): disable return QuantTensor during float inference by @pablomlago in #1059
- Bump onnx from 1.15 to 1.17.0 in /requirements by @dependabot in #1069
- Fix (minifloat): correct minifloat computation and tests by @Giuseppe5 in #1067
- Feat (ptq): adding accumulator-aware extensions to GPxQ by @i-colbert in #1060
- Feat: add contributing guidelines by @Giuseppe5 in #1075
- Feat (float): adding new attributes to proxy and quant tensor by @i-colbert in #1072
- Feat (accelerate): improved accelerate compatibility by @Giuseppe5 in #1065
- Fix Transformers tests by @Giuseppe5 in #1081
- Fix (data): updating wikitext2 data utility by @i-colbert in #1080
- Fix (groupwise): correct log, groupdim, and scale computation by @Giuseppe5 in #1071
- Test (mx): add reference impl for MXFloat by @Giuseppe5 in #1068
- Fix (examples/generative): Fixed argument order for `quantize_model` by @nickfraser in #1084
- Feat (export): qonnx minifloat export by @Giuseppe5 in #1070
- Feat (core): use runtime parameter for scale by @Giuseppe5 in #1037
- Fix (per_group): fixing the per_group sym quantizer by @i-colbert in #1089
- Rotation based equalization by @Giuseppe5 in #1061
- Fix (examples/llm): fix for main and README by @Giuseppe5 in #1092
- Fix: correct output scale compute by @Giuseppe5 in #1077
- Fix (ptq/rotation): fix for rotation implementation by @Giuseppe5 in #1095
- Fix (scaling)!: clamp to avoid inf/nan in forward/backward by @Giuseppe5 in #1097
- Setup: bump python & torch version by @Giuseppe5 in #1098
- Feat: Per-Row po2 float ocp by @Giuseppe5 in #1102
- Fix LLM tests by @pablomlago in #1088
- Feat (brevitas_examples/llm): remove dependencies from optimum-amd by @Giuseppe5 in #1094
- Feat auto round by @pablomlago in #1064
- Fix (hadamard): remove hadamard loading warning by @Giuseppe5 in #1108
- Hierarchical scales by @Giuseppe5 in #1038
- Improvements to learned round by @Giuseppe5 in #1107
- Feat (brevitas_examples/llm): update README by @Giuseppe5 in #1109
- Fix (gpxq): tensor unpacking and Cholesky stabilization by @i-colbert in #1111
- Feat (llm): adding more quantizers by @i-colbert in #1113
- Feat (llm/learned_round): fast block update by @Giuseppe5 in #1110
- Fix SignSGD docstring by @pablomlago in #1115
- Feat (nn/sdpa): quantization of scaled dot-product attention by @nickfraser in #1090
- Fix (brevitas_examples/llm): scaling_min_val for fp32 by @Giuseppe5 in #1117
- Feat (scaling): no tracked_parameter_list with individual quantizer by @Giuseppe5 in #1112
- Feat (brevitas_examples/llm): select act_eq alpha by @Giuseppe5 in #1121
- Fix llm tests transformers by @pablomlago in #1118
- Fix (float/clamp): Bugfix when unsigned by @nickfraser in #1132
- Feat (brevitas_examples/llm): inference_mode support by @Giuseppe5 in #1129
- Feat (brevitas_examples/llm): correct scale init with CPU offloading by @Giuseppe5 in #1124
- Feat (brevitas_examples/sdxl): inference_mode + compile by @Giuseppe5 in #1133
- Feat (proxy): flag to enable/disable QT return by @Giuseppe5 in #1083
- Feat (examples/llm): Specify experiments via YAML files by @nickfraser in #1116
- test (core/float): Enhanced testing of minifloat formats by @nickfraser in #1136
- Eval harness by @Giuseppe5 in #1131
- Fix: pytree warning by @i-colbert in #1144
- Fix LLM entry point by @i-colbert in #1145
- Fix (scaling/standalone): better switch from runtime stats to param by @Giuseppe5 in #1099
- Fix (proxy): fix groupwise scale/zp caching by @Giuseppe5 in #1137
- Fix (export/inference_mode): correct rounding function by @Giuseppe5 in #1146
- Setup: pin transformers version by @Giuseppe5 in #1150
- Feat (mx): unpadding during dequantization by @Giuseppe5 in #1134
- Feat (brevitas_examples/llm): load from checkpoint by @Giuseppe5 in #1151
- Feat (rotation): equalize across SDPA by @Giuseppe5 in #1149
- Feat (quantization): torch_function based quantization by @Giuseppe5 in #1147
- Setup: bump torch version for LLM tests by @Giuseppe5 in #1154
- Feat (equalize): enable parametrized rotations by @pablomlago in #1148
- Feat (optim): add Cailey SGD optimizer by @pablomlago in #1153
- Setup: update pre-commit python version by @Giuseppe5 in #1158
- Fix (brevitas_examples/llm): remove unnecessary checkpointing by @Giuseppe5 in #1161
- Feat (zero_point): dynamic groupwise zero point by @Giuseppe5 in #1160
- New rotation by @Giuseppe5 in #1159
- Fix (brevitas_examples/llm): equalized module + fx compatibility by @Giuseppe5 in #1164
- Fix (runtime_act): fix negative group_dim handling by @Giuseppe5 in #1157
- Fix (a2q): missing restrict_pre_scaling_impl definition by @Giuseppe5 in #1167
- Feat (equalize): enable rotation matrix optimization by @pablomlago in #1155
- Add FP16 support to ptq_evaluate.py and update README argument list by @hkayann in #1174
- Feat (brevitas_examples/llm): separate KV Cache quantization by @Giuseppe5 in #1165
- Feat (hadamard): support region expansion by @Giuseppe5 in #1178
- Feat (llm): benchmark for llm entrypoint by @pablomlago in #1166
- fix (docs/faq): remove reference to gitter, switch affine quantization to be an example by @nickfraser in #1183
- Fix (brevitas_examples/sdxl): correct import for inference_mode by @Giuseppe5 in #1185
- Feat (gpfq): optimizing with lower diagonal matrix formulation by @i-colbert in #1172
- Feat (brevitas_examples/llm): better dtype selection by @Giuseppe5 in #1186
- Fix (brevitas_examples/sdxl): faster sdxl inference by @Giuseppe5 in #1188
- fix (examples/benchmark): Fix when `run_results.yaml` does not exist by @nickfraser in #1189
- Feat (example/common): Added groupwise, float scaled OCP option by @nickfraser in #1190
- Fix (examples/llm): default dtype from None to float16 by @pablomlago in #1191
- Fix (utils/torch_utils): ensure gradient propagation through pad_to_dim by @pablomlago in #1194
- Fix (examples/llm): prevent layernorm_to_rmsnorm option when fused_no_fx by @pablomlago in #1192
- Feat (brevitas_examples/sdxl): update mlperf by @Giuseppe5 in #1195
- Feat (brevitas_examples/llm): support for lighteval by @Giuseppe5 in #1162
- Fix (optim/cailey_sgd): fix cailey sgd in float16/bfloat16 by @pablomlago in #1193
- Feat (brevitas_examples/stable_diffusion): VAE quantization support by @Giuseppe5 in #1197
- Fix (quant_tensors): remove duplication by @Giuseppe5 in #1204
- Fix (brevitas_examples/llm): support MSE with offloaded models by @Giuseppe5 in #1196
- Fix (quant): improvements to quantization by @Giuseppe5 in #1207
- Fix (export/inference_mode): correct handler for dynamic float quant by @Giuseppe5 in #1208
- Feat: Initial SVDQuant support by @nickfraser in #1210
- Feat (equalize): enable parametrized scales by @pablomlago in #1175
- Fix (llm/equalize): remove call to _update_weights by @pablomlago in #1216
- Local compile support by @Giuseppe5 in #1206
- Setup: pin onnxruntime by @Giuseppe5 in #1218
- Fix (quant): clean-up to quantization code by @Giuseppe5 in #1219
- Fix (equalize): dtype fix in activation equalization by @pablomlago in #1217
- Feat (example/benchmark): Added script to convert YAML cfgs to "benchmark" configs by @nickfraser in #1184
- Support for transformer-based diffusion network by @Giuseppe5 in #1211
- Fix (brevitas_examples/llm): remove deprecated flag by @Giuseppe5 in #1225
- Fix (ex/llm): add missing copyright header by @nickfraser in #1227
- Feat (compile): limit activation recompiles by @Giuseppe5 in #1222
- Fix (ex/llm): Added defaults for several arguments. by @nickfraser in #1238
- Feat (compile): limit memory utilization with groupwise quantization by @Giuseppe5 in #1232
- Feat (brevitas_examples/diffusion): flux attention quantization by @Giuseppe5 in #1221
- Feat (brevitas_examples/llm): BOS preprocessing for calibration data by @Giuseppe5 in #1240
- Fix (test/ste_ops): fix mock tests by @nickfraser in #1242
- Fix (calibrate): correct zero_point init by @Giuseppe5 in #1243
- Feat (examples/generative): add fnuz quantizers by @Giuseppe5 in #1244
- Docs (readme): Update citation by @nickfraser in #1247
- Feat (ex/benchmark): Add optional start/end indices by @nickfraser in #1248
- Fix (ex/llm): Regenerate template configs by @nickfraser in #1249
- Fix (gptq): Fix several edge cases by @nickfraser in #1252
- Fix (brevitas_examples/diffusion): workaround for svdquant with SDXL by @Giuseppe5 in #1256
- Setup: fix pre_commit CI by @Giuseppe5 in #1264
- Feat (magr): initial implementation of MagR by @i-colbert in #1214
- Fix/Feat (trunc avg pool): Update truncation and average pool behaviour by @nickfraser in #1042
- Fix (ex/llm): Fix per-row quant_sdpa broadcastable shape by @nickfraser in #1254
- feat (ex/benchmark): Added option to shuffle order of benchmark processes by @nickfraser in #1268
- Fix (examples/llm): Fix PPLs by @pablomlago in #1271
- Fix (data): bos_processing in pile dataset by @i-colbert in #1259
- Feat (llm/eval): remove BOS token by @pablomlago in #1258
- Fix (graph/hadamard): `.view` can fail with functional QuantSDPA by @nickfraser in #1270
- Fix (scaling/float): correct dtype for threshold by @Giuseppe5 in #1265
- Fix (runtime_quant): correct priority for act quant by @Giuseppe5 in #1255
- Fix (quant_sdpa): remove print by @Giuseppe5 in #1273
- Feat (graph/calibrate): refactor DisableEnableQuantization by @pablomlago in #1257
- Fix (quant/float): input_view_impl for float_no_scale by @Giuseppe5 in #1260
- Fix (ci): Don't update PyTorch version by @nickfraser in #1275
- Feat (brevitas_examples/sdxl): better GPTQ by @Giuseppe5 in #1250
- Feat (ex/llm): bos preprocessing by @pablomlago in #1277
- test (ex/llm): Minor fixes to tests. Add rotation tests. by @nickfraser in #1253
- Fix (graph/equalize): fix value-output region in SDPA by @Giuseppe5 in #1278
- Feat (graph/calibrate): change quant_status_manager defaults to no-op by @pablomlago in #1274
- Fix (core/function): Fix learned round when padding is applied to weights by @nickfraser in #1235
- Fix (export/onnx): Improved ONNX export performance by @nickfraser in #1279
- Feat (llm/awq): activation-aware weight scaling by @pablomlago in #1213
- Docs: update / generate docs for 0.12.0 release by @nickfraser in #1284
- Docs: regen notebooks and docs by @nickfraser in #1285
## New Contributors
- @dependabot made their first contribution in #1069
- @hkayann made their first contribution in #1174
Full Changelog: v0.11.0...v0.12.0