Panda: Patched Attention for Nonlinear Dynamics
This repository contains the code to reproduce the experiments presented in our arXiv preprint, arXiv:2505.13755.
We have released model weights on Hugging Face at https://huggingface.co/GilpinLab/panda, and MLM weights at https://huggingface.co/GilpinLab/panda_mlm
We are also in the process of scaling up our training and model size, so stay tuned!
Paper abstract:
"Chaotic systems are intrinsically sensitive to small errors, challenging efforts to construct predictive data-driven models of real-world dynamical systems such as fluid flows or neuronal activity. Prior efforts comprise either specialized models trained separately on individual time series, or foundation models trained on vast time series databases with little underlying dynamical structure. Motivated by dynamical systems theory, we present Panda, Patched Attention for Nonlinear DynAmics. We train Panda on a novel synthetic, extensible dataset of $2 \times 10^4$ chaotic dynamical systems that we discover using an evolutionary algorithm. Trained purely on simulated data, Panda exhibits emergent properties: zero-shot forecasting of unseen real world chaotic systems, and nonlinear resonance patterns in cross-channel attention heads. Despite having been trained only on low-dimensional ordinary differential equations, Panda spontaneously develops the ability to predict partial differential equations without retraining. We demonstrate a neural scaling law for differential equations, underscoring the potential of pretrained models for probing abstract mathematical domains like nonlinear dynamics."
Install the most up-to-date version of dysts for dynamical systems:
$ pip install --no-deps git+https://github.com/williamgilpin/dysts
Consider also installing numba for faster numerical integration.
NOTE: When cloning this repo, we recommend a shallow clone to avoid downloading the large commit history (~60 MB):
git clone --depth=1 [email protected]:abao1999/panda.git
To set up, run:
$ pip install -e .
If training on AMD GPUs, install with the ROCm extras:
$ pip install -e .[rocm] --extra-index-url https://download.pytorch.org/whl/rocm5.7
Our dataset consists of parameter perturbations of base and skew systems. Each trajectory is a numerically integrated system of coupled ODEs that we filter according to the methodology outlined in our preprint. To run the data generation, see our scripts for making trajectories from saved params, parameter perturbations of skew systems, and parameter perturbations of base systems. For ease of use we have also provided an example data generation bash script that calls these scripts.
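The perturbation-and-filter idea can be illustrated with a minimal, self-contained sketch in plain NumPy. This is not the repo's actual pipeline: the Lorenz system, the ±10% perturbation magnitude, and the crude boundedness filter below are illustrative stand-ins for the filtering methodology described in the preprint.

```python
import numpy as np

def lorenz_rhs(state, sigma, rho, beta):
    """Right-hand side of the Lorenz system (a stand-in base system)."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def integrate_rk4(rhs, x0, params, dt=0.01, n_steps=2000):
    """Fixed-step RK4 integration of a 3D ODE."""
    traj = np.empty((n_steps, 3))
    x = np.asarray(x0, dtype=float)
    for i in range(n_steps):
        k1 = rhs(x, *params)
        k2 = rhs(x + 0.5 * dt * k1, *params)
        k3 = rhs(x + 0.5 * dt * k2, *params)
        k4 = rhs(x + dt * k3, *params)
        x = x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        traj[i] = x
    return traj

rng = np.random.default_rng(0)
base = np.array([10.0, 28.0, 8.0 / 3.0])  # classic Lorenz parameters
# Perturb each parameter by up to +/-10% (illustrative magnitude only).
perturbed = base * (1.0 + 0.1 * rng.uniform(-1, 1, size=3))
traj = integrate_rk4(lorenz_rhs, [1.0, 1.0, 1.0], perturbed)
# Crude filter: discard trajectories that blow up or collapse to a point.
is_valid = np.all(np.isfinite(traj)) and traj.std() > 1e-3
```

The repo's scripts apply the same loop at scale, over base and skew systems, with the acceptance criteria from the preprint in place of the toy filter above.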
We provide example bash scripts to train our model, both for forecasting and for MLM (completions). Note that you can train an MLM checkpoint and then use its encoder for supervised fine-tuning (SFT) on the forecasting task. See our training script for more details.
In notebooks/load_model_from_hf.ipynb we provide a minimal working example of loading our trained checkpoint from HuggingFace and running inference (generating forecasts). For reproducibility, we also provide a serialized json file (~ 10 MB) containing the parameters for some of our held-out skew systems. These parameters can then be loaded and used to generate trajectories from the corresponding systems.
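The load step for the serialized parameter file can be sketched with the standard-library json module. The schema below is hypothetical, purely for illustration; consult the file shipped with the repo for the actual field names.

```python
import json

# Hypothetical schema -- the repo's actual file may organize its
# held-out skew-system parameters differently.
example = {
    "skew_systems": [
        {"name": "Lorenz_coupled_Rossler", "params": {"sigma": 10.0, "rho": 28.0}},
    ]
}
with open("skew_params_demo.json", "w") as f:
    json.dump(example, f, indent=2)

# Reload and iterate, as one would before re-integrating each system.
with open("skew_params_demo.json") as f:
    systems = json.load(f)["skew_systems"]
names = [s["name"] for s in systems]
```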
For a more thorough evaluation, see our evaluation script, which we used to present the results in our preprint. A corresponding script exists for each of the baselines we evaluate on, within scripts.
In notebooks/load_mlm_from_hf.ipynb we provide a minimal working example of loading our trained MLM checkpoint from HuggingFace and generating completions.
If you use this codebase or otherwise find our work valuable, please cite us:
@misc{lai2025panda,
      title={Panda: A pretrained forecast model for universal representation of chaotic dynamics},
      author={Jeffrey Lai and Anthony Bao and William Gilpin},
      year={2025},
      eprint={2505.13755},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2505.13755},
}