Official PyTorch Implementation of KL-Tracing: Taming generative video models for zero-shot optical flow extraction.

Taming generative video models for zero-shot optical flow extraction

Seungwoo Kim* · Khai Loong Aw* · Klemen Kotar*

Cristobal Eyzaguirre · Wanhee Lee · Yunong Liu · Jared Watrous

Stefan Stojanov · Juan Carlos Niebles · Jiajun Wu · Daniel L. K. Yamins

Stanford

(* equal contribution)

Paper PDF · Project Page

We introduce KL-tracing, a novel test-time inference procedure that uses the Kullback-Leibler (KL) divergence of prediction logits for zero-shot extraction of optical flow from a generative video model without any additional task-specific fine-tuning. We obtain state-of-the-art point tracking / optical flow results when combined with the Local Random Access Sequence (LRAS) model.
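
To make the idea concrete, here is a minimal sketch of the KL-tracing step, with hypothetical tensor shapes and function names (the real pipeline operates on the LRAS model's prediction logits; this is only an illustration of the divergence-and-argmax idea): run the model on a clean frame pair and on a pair where the query pixel has been perturbed, compute the per-location KL divergence between the two predictive distributions for the next frame, and take the peak of the divergence map as the flow endpoint.

import torch
import torch.nn.functional as F

def kl_trace(logits_clean, logits_perturbed):
    # logits_clean / logits_perturbed: (H, W, V) next-frame prediction logits from a
    # clean pass and from a pass where the query pixel was perturbed (hypothetical shapes).
    log_p = F.log_softmax(logits_perturbed, dim=-1)
    log_q = F.log_softmax(logits_clean, dim=-1)
    # Per-location KL(perturbed || clean), summed over the codebook/vocabulary dimension.
    kl_map = (log_p.exp() * (log_p - log_q)).sum(dim=-1)  # (H, W)
    # The location whose predictive distribution changed the most is read out as the
    # traced endpoint of the query point in the next frame.
    flat_idx = kl_map.flatten().argmax().item()
    return divmod(flat_idx, kl_map.shape[1])  # (row, col) of the divergence peak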

🔨 Installation

conda create -n kl_tracing python=3.10
conda activate kl_tracing
pip install uv
uv pip install -e .[dev]

# [Optional] for linting
pip install pre-commit 
pre-commit install 
pre-commit run --all-files

📀 Data

The evaluation script expects a JSON file that contains all evaluation points for TAP-Vid DAVIS.

a. Full TAP-Vid DAVIS (First)

  1. Download the TAP-Vid DAVIS pickle file here.
  2. Run the following:
python preproc_tapvid.py \
 --pkl_path <pkl_path> \
 --img_root_dir data/davis_frames \
 --json_path data/davis_dataset.json

This will save all the frames to img_root_dir if they do not already exist (TAP-Vid DAVIS consists of 30 videos, each with a varying number of frames), and create a JSON dataset at json_path containing all evaluation points. This JSON dataset can be used to run eval.py.
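
If you want to sanity-check the downloaded pickle before preprocessing, a rough inspection sketch is below. The field names ('video', 'points', 'occluded') and the local path are assumptions based on the public TAP-Vid release, not on this repository's code:

import pickle

with open("tapvid_davis.pkl", "rb") as f:  # path to the downloaded pickle (assumed)
    davis = pickle.load(f)

print(len(davis), "videos")  # TAP-Vid DAVIS contains 30 videos
name, example = next(iter(davis.items()))
# Expected TAP-Vid fields: 'video' (T, H, W, 3 uint8 frames), 'points' (N, T, 2 query
# tracks), 'occluded' (N, T boolean flags) -- assumed from the public TAP-Vid format.
for key, value in example.items():
    print(name, key, getattr(value, "shape", type(value)))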

b. Sampled Mini TAP-Vid DAVIS

  1. First, run the step above to get the full dataset.
  2. Run the following:
python preproc_tapvid.py \
 --json_path data/davis_dataset.json \
 --sample

This will sample from the full dataset (as saved in json_path) and output a smaller eval JSON in data/mini_davis_dataset.json. The evaluation points will be chosen based on data/mini_davis_dataset_template.json for reproducibility.

TAP-Vid DAVIS Evaluation

Run the following:

DEVICE=0 ./run_tapvid_davis.sh <start_idx> <num_points>

This will generate results in eval_out/tapvid_davis_results for a subset of the dataset, dataset[start_idx:start_idx+num_points]. Launch the script in parallel on different subsets of the data (a sketch of one way to shard the runs follows below). Once all results are generated, you can run

python offline_eval.py \
 --pkl_path <path_to_davis_dataset_pkl> \
 --json_path <path_to_davis_dataset_json> \
 --root_dirs <path_to_results_dir>

which will aggregate all the results and print out the final TAP-Vid metric.
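
For sharding the runs across GPUs, a rough Python launcher is sketched below; the GPU list, the total number of evaluation points, and the shard sizes are assumptions you should adjust to your setup:

import os
import subprocess

TOTAL_POINTS = 1000   # total number of evaluation points in your JSON (assumption)
GPUS = [0, 1, 2, 3]   # available GPU ids (assumption)
per_shard = -(-TOTAL_POINTS // len(GPUS))  # ceil division: points handled per GPU

procs = []
for i, gpu in enumerate(GPUS):
    env = {**os.environ, "DEVICE": str(gpu)}
    # Equivalent to: DEVICE=<gpu> ./run_tapvid_davis.sh <start_idx> <num_points>
    procs.append(subprocess.Popen(
        ["./run_tapvid_davis.sh", str(i * per_shard), str(per_shard)], env=env))

for p in procs:
    p.wait()  # once all shards finish, aggregate with offline_eval.py as above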

Sampled TAP-Vid DAVIS Evaluation

Alternatively, when using the Mini TAP-Vid DAVIS evaluation, you can repeat the above with the script ./run_sampled_davis.sh, and then run the following:

python offline_eval.py \
 --root_dirs <path_to_results_dir> \
 --sampled

Using this script, we are able to achieve an end-point error (EPE) of 1.23 on the mini TAP-Vid DAVIS dataset.
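
For reference, EPE here is the mean Euclidean (endpoint) distance in pixels between predicted and ground-truth point locations; a minimal illustration with hypothetical arrays:

import numpy as np

pred = np.array([[10.0, 12.0], [40.5, 33.0]])  # predicted (x, y) locations, hypothetical
gt = np.array([[11.0, 12.0], [42.0, 31.0]])    # ground-truth (x, y) locations, hypothetical

epe = np.linalg.norm(pred - gt, axis=-1).mean()
print(f"EPE: {epe:.2f} px")  # mean endpoint distance over all points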

Citation

If you find this project useful, please consider citing:

@misc{kim2025taminggenerativevideomodels,
      title={Taming generative video models for zero-shot optical flow extraction}, 
      author={Seungwoo Kim and Khai Loong Aw and Klemen Kotar and Cristobal Eyzaguirre and Wanhee Lee and Yunong Liu and Jared Watrous and Stefan Stojanov and Juan Carlos Niebles and Jiajun Wu and Daniel L. K. Yamins},
      year={2025},
      eprint={2507.09082},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2507.09082}, 
}
