Ziwei Shan1,*,
Yaoyu He1,*,
Chengfeng Zhao1,*,†,
Jiashen Du1,
Jingyan Zhang1,
Qixuan Zhang1,2,
Jingyi Yu1,‡,
Lan Xu1,‡
1ShanghaiTech University
2Deemos Technology
*Equal contribution
†Project lead ‡Corresponding author
We tested our environment on Ubuntu 20.04 LTS and Windows 11 with CUDA 12.1.
conda create python=3.10 --name mojito
conda activate mojito
conda install pytorch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt
# skip the DeepSpeed installation on Windows 11
DS_BUILD_OPS=1 DS_BUILD_CUTLASS_OPS=0 DS_BUILD_RAGGED_DEVICE_OPS=0 DS_BUILD_EVOFORMER_ATTN=0 pip install deepspeed
conda install -c fvcore -c iopath -c conda-forge fvcore iopath
pip install "git+https://github.com/facebookresearch/pytorch3d.git@stable"
pip install "fastapi[standard]"
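After installation, a quick importability check can catch a broken environment early. The sketch below uses only the Python standard library. The helper name `find_missing` is hypothetical, and the listed import names are assumed to match the packages installed above (`deepspeed` is left out because it is skipped on Windows 11).

```python
from importlib.util import find_spec

# Import names assumed to correspond to the packages installed above.
REQUIRED = ["torch", "torchvision", "torchaudio", "pytorch3d", "fastapi"]


def find_missing(names=REQUIRED):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    print("missing packages:", ", ".join(missing) if missing else "none")
```

Run it inside the activated `mojito` environment; anything it lists as missing should be reinstalled before continuing.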
Download SMPL-H (the extended SMPL+H model) and put the models under the body_model/ folder. The structure of the body_model/ folder should be:
body_model/
|--body_model.py
|--utils.py
|--smplh/
|----info.txt
|----LICENSE.txt
|----female/
|------model.npz
|----male/
|------model.npz
|----neutral/
|------model.npz
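The layout above can be verified before running anything that loads the body model. This is a minimal sketch that only checks for the files shown in the tree; the function name `missing_body_model_files` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Relative paths taken from the directory tree above.
EXPECTED_FILES = [
    "body_model.py",
    "utils.py",
    "smplh/info.txt",
    "smplh/LICENSE.txt",
    "smplh/female/model.npz",
    "smplh/male/model.npz",
    "smplh/neutral/model.npz",
]


def missing_body_model_files(root="body_model"):
    """Return the expected files that are absent under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_body_model_files()
    if missing:
        print("missing files:")
        for rel in missing:
            print("  body_model/" + rel)
    else:
        print("body_model/ layout looks complete")
```

The check only confirms that the files exist; it does not validate the contents of the `model.npz` files.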
We are releasing the IMU tokenizer model mojito_imu_tokenizer.pth. To set up:
- Download the model checkpoint.
- Create a checkpoints/ directory in your project if it doesn't exist.
- Place the downloaded file in checkpoints/mojito_imu_tokenizer.pth.
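The last two setup steps can also be scripted. The sketch below assumes the checkpoint has already been downloaded to some local path; the helper name `install_checkpoint` and its parameters are hypothetical, and only the target location `checkpoints/mojito_imu_tokenizer.pth` comes from the steps above.

```python
import shutil
from pathlib import Path

# Target file name taken from the setup steps above.
CHECKPOINT_NAME = "mojito_imu_tokenizer.pth"


def install_checkpoint(downloaded, project_root="."):
    """Copy `downloaded` to <project_root>/checkpoints/mojito_imu_tokenizer.pth."""
    src = Path(downloaded)
    if not src.is_file():
        raise FileNotFoundError(f"checkpoint not found: {src}")
    target_dir = Path(project_root) / "checkpoints"
    # Create checkpoints/ if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / CHECKPOINT_NAME
    shutil.copy2(src, target)
    return target
```

For example, `install_checkpoint("path/to/downloaded_file.pth")` run from the project root copies the file into place and returns the destination path.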
Run the processing script:
python -m example --cfg configs/config_imu_tokenizer.yaml --nodebug
- Ziwei Shan - koyui
- Yaoyu He - TropinoneH
- Chengfeng Zhao - AfterJourney00
- Jiashen Du - ALT-JS
If you find our code or paper helpful, please consider citing:
@article{shan2025mojito,
title = {Mojito: LLM-Aided Motion Instructor with Jitter-Reduced Inertial Tokens},
author = {Shan, Ziwei and He, Yaoyu and Du, Jiashen and Zhao, Chengfeng and Zhang, Jingyan and
Zhang, Qixuan and Yu, Jingyi and Xu, Lan},
journal = {arXiv preprint arXiv:},
year = {2025}
}
Thanks to the following work that we refer to and benefit from:
- MotionGPT: the overall framework;
- Qwen2: the causal language model;
- EgoEgo: the SMPL-H body model script;
- TransPose: the data pre-processing of TotalCapture dataset;
- SmoothNet: the SMPL pose smoother.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.