Seungwoo Kim*1 · Khai Loong Aw*1 · Klemen Kotar*1
Cristobal Eyzaguirre1 · Wanhee Lee1 · Yunong Liu1 · Jared Watrous1
Stefan Stojanov1 · Juan Carlos Niebles1 · Jiajun Wu1 · Daniel L. K. Yamins1
1Stanford
(* equal contribution)
We introduce KL-tracing, a novel test-time inference procedure that uses the Kullback-Leibler (KL) divergence of prediction logits to extract optical flow zero-shot from a generative video model, without any task-specific fine-tuning. Combined with the Local Random Access Sequence (LRAS) model, KL-tracing achieves state-of-the-art point tracking and optical flow results.
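At a high level (a sketch based on our description above, not the repo's actual API): inject a small perturbation around the query point, re-run the frozen video model, and compute the per-location KL divergence between the perturbed and clean next-frame prediction logits; the KL map peaks where the point lands in the next frame. A minimal, hypothetical PyTorch sketch, where the model interface, tensor shapes, and perturbation scheme are all illustrative assumptions:

```python
# Illustrative sketch of the KL-tracing idea; NOT the repo's API.
# The model interface, shapes, and perturbation are assumptions.
import torch
import torch.nn.functional as F

def kl_trace(model, frames: torch.Tensor, query_xy: tuple[int, int],
             patch: int = 8) -> tuple[int, int]:
    """Locate a frame-0 query point in frame 1 via a logit-space KL map.

    model:  callable mapping a clip (T, C, H, W) to per-location
            next-frame token logits of shape (H, W, V)
    frames: video clip with values in [0, 1]
    """
    clean_logits = model(frames)  # (H, W, V): clean next-frame prediction

    # Perturb a small patch around the query point in frame 0.
    x, y = query_xy
    y0, y1 = max(0, y - patch), y + patch
    x0, x1 = max(0, x - patch), x + patch
    perturbed = frames.clone()
    perturbed[0, :, y0:y1, x0:x1] += \
        0.1 * torch.randn_like(perturbed[0, :, y0:y1, x0:x1])
    pert_logits = model(perturbed)  # (H, W, V): perturbed prediction

    # Per-location KL(perturbed || clean): large wherever the perturbation
    # shows up in the predicted next frame, i.e. where the point moved.
    kl = (F.softmax(pert_logits, -1)
          * (F.log_softmax(pert_logits, -1)
             - F.log_softmax(clean_logits, -1))).sum(-1)  # (H, W)

    flat = kl.flatten().argmax().item()
    return flat % kl.shape[1], flat // kl.shape[1]  # (x, y) of the KL peak
```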
conda create -n kl_tracing python=3.10
conda activate kl_tracing
pip install uv
uv pip install -e .[dev]
# [Optional] for linting
pip install pre-commit
pre-commit install
pre-commit run --all-files
The evaluation script expects a JSON file that contains all evaluation points for TAP-Vid DAVIS.
- Download the TAP-Vid DAVIS pickle file here.
- Run the following:
python preproc_tapvid.py \
--pkl_path <pkl_path> \
--img_root_dir data/davis_frames \
--json_path data/davis_dataset.json
This will save all the frames to `img_root_dir` if they do not already exist (TAP-Vid DAVIS consists of 30 videos, each with a varying number of frames), and create a JSON dataset at `json_path` containing all evaluation points. This JSON dataset can be used to run `eval.py`.
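To sanity-check the output, you can load the JSON and inspect it; the exact schema below is an assumption (adjust once you see what `preproc_tapvid.py` actually writes):

```python
# Peek at the generated dataset; the top-level structure is an
# assumption here, not a documented schema.
import json

with open("data/davis_dataset.json") as f:
    dataset = json.load(f)

print(type(dataset).__name__, len(dataset))
```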
- First, run the step above to get the full dataset.
- Run the following:
python preproc_tapvid.py \
--json_path data/davis_dataset.json \
--sample
This will sample from the full dataset (as saved in `json_path`) and output a smaller eval JSON in `data/mini_davis_dataset.json`. The evaluation points will be chosen based on `data/mini_davis_dataset_template.json` for reproducibility.
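As a quick reproducibility check, you can confirm the sampled set lines up with the template (comparing sizes only; the schema is an assumption):

```python
# Compare the sampled eval set against the shipped template.
import json

with open("data/mini_davis_dataset.json") as f:
    sampled = json.load(f)
with open("data/mini_davis_dataset_template.json") as f:
    template = json.load(f)

print(len(sampled), "sampled entries vs", len(template), "template entries")
```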
Run the following:
DEVICE=0 ./run_tapvid_davis.sh <start_idx> <num_points>
This will generate results in `eval_out/tapvid_davis_results` for the subset `dataset[start_idx:start_idx+num_points]`. Launch the script in parallel on different subsets of the data, e.g. one chunk per GPU as sketched below.
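A minimal parallel-launch sketch, assuming 4 GPUs and 250-point chunks (both numbers are placeholders for your setup; the total point count depends on the dataset):

```bash
# Launch one worker per GPU over disjoint index ranges (sizes are assumptions).
for GPU in 0 1 2 3; do
  DEVICE=$GPU ./run_tapvid_davis.sh $((GPU * 250)) 250 &
done
wait  # block until all workers finish
```

Once all results are generated, you can run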
python offline_eval.py \
--pkl_path <path_to_davis_dataset_pkl> \
--json_path <path_to_davis_dataset_json> \
--root_dirs <path_to_results_dir>
which will aggregate all the results and print out the final TAP-Vid metric.
Alternatively, when using the Mini TAP-Vid DAVIS evaluation, you can repeat the above with the script `./run_sampled_davis.sh`, and then run the following:
python offline_eval.py \
--root_dirs <path_to_results_dir> \
--sampled
Using this script, we achieve an end-point error (EPE) of 1.23 on the Mini TAP-Vid DAVIS dataset.
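For reference, EPE here is the standard end-point error: the mean Euclidean distance between predicted and ground-truth point locations, e.g.:

```python
# End-point error: mean L2 distance between predicted and ground-truth
# point locations. pred and gt are (N, 2) arrays of (x, y) coordinates.
import numpy as np

def epe(pred: np.ndarray, gt: np.ndarray) -> float:
    return float(np.linalg.norm(pred - gt, axis=-1).mean())
```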
If you find this project useful, please consider citing:
@misc{kim2025taminggenerativevideomodels,
title={Taming generative video models for zero-shot optical flow extraction},
author={Seungwoo Kim and Khai Loong Aw and Klemen Kotar and Cristobal Eyzaguirre and Wanhee Lee and Yunong Liu and Jared Watrous and Stefan Stojanov and Juan Carlos Niebles and Jiajun Wu and Daniel L. K. Yamins},
year={2025},
eprint={2507.09082},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2507.09082},
}