Adaptive gradient clipping for PyTorch, TensorFlow, and JAX.
SmartClip keeps training stable with adaptive, per-step clipping you can enable in one line of code.
See the full documentation for details of the algorithms, framework usage examples, and logging metrics.
## Algorithms

- AutoClip — Seetharaman et al., 2020 (MLSP). Adaptive percentile-based clipping of gradient norms.
- Adaptive Gradient Clipping (AGC, NFNets-style) — Brock, De, and Smith, 2021. The threshold scales with the parameter norm.
- Z-score clipping (EMA mean/std) — standard z-score thresholding using a streaming mean and variance of recent gradient norms.
For the z-score clipper, `zmax` controls how aggressive clipping is: the threshold is `mean + zmax * std` over recent gradient norms. A higher `zmax` clips less (more tolerant); a lower one clips more (more aggressive). Start at `zmax=3.0`; try `2.0`–`2.5` if you see instability from spikes, or `3.5`–`4.0` if training seems over-clipped. The sketch below shows how each threshold is computed.
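Here is a minimal NumPy sketch of the three rules. Function names, signatures, and default values are illustrative assumptions for exposition, not SmartClip's internals:

```python
import numpy as np

# Illustrative only: names and defaults are assumptions, not SmartClip's
# actual internals.

def autoclip_threshold(norm_history, percentile=90.0):
    # AutoClip: clip at a chosen percentile of gradient norms seen so far.
    return np.percentile(norm_history, percentile)

def agc_threshold(param_norm, clipping=0.01, eps=1e-3):
    # AGC: threshold proportional to the parameter norm (NFNets-style).
    return clipping * max(param_norm, eps)

def ema_update(mean, var, norm, beta=0.99):
    # Z-score bookkeeping: streaming mean/variance of recent norms via EMA.
    mean = beta * mean + (1 - beta) * norm
    var = beta * var + (1 - beta) * (norm - mean) ** 2
    return mean, var

def zscore_threshold(mean, var, zmax=3.0):
    # Z-score: allow norms up to zmax standard deviations above the mean.
    return mean + zmax * np.sqrt(var)

def apply_threshold(grad, threshold, eps=1e-6):
    # Common final step for every rule: rescale the gradient so its norm
    # never exceeds the threshold.
    norm = np.linalg.norm(grad)
    return grad * min(1.0, threshold / (norm + eps))
```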
## Installation

```
pip install smartclip
```

Optional extras provide helpers for specific frameworks (install the framework wheels first, per vendor docs):

```
pip install "smartclip[torch]" # PyTorch + Lightning/Transformers helpers
pip install "smartclip[tf]" # TensorFlow/Keras helpers
pip install "smartclip[jax]" # JAX/Flax/Optax helpersimport torch
import smartclip as sc
model = MyModel().to("cpu")  # your torch.nn.Module (placeholder)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
with sc.clip_context(model, opt):  # Defaults to AutoClip
    for x, y in loader:
        opt.zero_grad(set_to_none=True)
        loss = loss_fn(model(x), y)  # loss_fn: your loss criterion (placeholder)
        loss.backward()
        opt.step()  # clipped automatically
```
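To select an algorithm explicitly instead of the AutoClip default, pass a clipper to the context manager. A minimal sketch, assuming the `clipper` argument from the TensorFlow example below behaves the same way in PyTorch:

```python
# Assumption: clipper selection works identically across backends.
with sc.clip_context(model, opt, clipper=sc.ZScoreClip(zmax=3.0)):
    ...  # same training loop as above
```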
### TensorFlow / Keras

```python
import tensorflow as tf
import smartclip as sc
model = MyModel()
opt = tf.keras.optimizers.Adam(3e-4)
with sc.clip_context(model, opt, clipper=sc.ZScoreClip(zmax=3.0)): # Use the zscore algorithm
    model.fit(ds, epochs=5)
```
### JAX / Optax

```python
import jax
import optax
from flax import linen as nn
import smartclip as sc
model = MyModel()  # your Flax Module
tx = optax.adam(3e-4)
params = model.init(jax.random.PRNGKey(0), batch)  # batch: an example input (placeholder)
opt_state = tx.init(params)
with sc.clip_context(model, tx):  # wraps tx.update
    grads = jax.grad(loss_fn)(params, batch)
    updates, opt_state = tx.update(grads, opt_state, params)  # clipped automatically
    params = optax.apply_updates(params, updates)
```
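For contrast, fixed-threshold clipping in plain Optax looks like the chain below (standard Optax APIs only); SmartClip's clippers differ in that the threshold adapts per step rather than staying constant:

```python
import optax

# Static global-norm clipping: the 1.0 threshold never changes during training.
tx_static = optax.chain(optax.clip_by_global_norm(1.0), optax.adam(3e-4))
```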
See the documentation for full guides for TensorFlow, JAX, Lightning, Keras, and the HF Trainer.

## Contributing

We welcome issues and pull requests. See contribute.md for developer setup, testing, docs, and release workflows.
## License

MIT