Commit 5e7dc21

Release Training Code
1 parent 6d38d12 commit 5e7dc21

61 files changed, with 4117 additions and 68 deletions

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -6,4 +6,5 @@ model_checkpoints
 **/*.bin
 **/*.log
 **/output*
-**/eval_results*
+**/eval_results*
+**/berkeley_fanuc_manipulation

README.md

Lines changed: 137 additions & 3 deletions
@@ -21,7 +21,7 @@ To this end, we introduce <b>Moto</b>, which converts video content into latent
 We pre-train Moto-GPT through motion token autoregression, enabling it to capture diverse visual motion knowledge. After pre-training, Moto-GPT demonstrates the promising ability to produce semantically interpretable motion tokens, predict plausible motion trajectories, and assess trajectory rationality through output likelihood.
 To transfer learned motion priors to real robot actions, we implement a co-fine-tuning strategy that seamlessly bridges latent motion token prediction and real robot control. Extensive experiments show that the fine-tuned Moto-GPT exhibits superior robustness and efficiency on robot manipulation benchmarks, underscoring its effectiveness in transferring knowledge from video data to downstream visual manipulations.

-## ️Quick Start
+## 🛠️Quick Start

 ### Installation
 Clone this repo:
@@ -97,7 +97,7 @@ cd ..
 ### Model Weights
 We release the Latent Motion Tokenizer, the pre-trained Moto-GPT and the fine-tuned Moto-GPT in [Moto Hugging Face](https://huggingface.co/TencentARC/Moto). You can download them separately and save them in corresponding directories ([`latent_motion_tokenizer/checkpoints/`](latent_motion_tokenizer/checkpoints) and [`moto_gpt/checkpoints/`](moto_gpt/checkpoints)).

-## 💻Inference
+## 🤖Inference

 ### Latent trajectory inference with the pre-trained Moto-GPT and the Latent Motion Tokenizer
 ```bash
@@ -129,11 +129,145 @@ nohup bash evaluate_moto_gpt_in_simpler.sh > evaluate_moto_gpt_in_simpler.log 2>
 tail -f evaluate_moto_gpt_in_simpler.log
 ```

+## 🔥Training
+### Prepare Datasets
+#### 1. CALVIN dataset
+- Download and preprocess Split ABC->D dataset from [CALVIN](https://github.com/mees/calvin/tree/main/dataset):
+```bash
+conda activate moto
+export PROJECT_ROOT=[your path to Moto project]
+export OUTPUT_ROOT=[your path to save datasets]
+cd ${PROJECT_ROOT}/scripts/
+nohup bash download_and_preprocess_calvin_data.sh > download_and_preprocess_calvin_data.log 2>&1 &
+tail -f download_and_preprocess_calvin_data.log
+```
+
+#### 2. Open X-Embodiment datasets
+- Install [gsutil](https://cloud.google.com/storage/docs/gsutil_install)
+
+- Download and preprocess datasets from [Open X-Embodiment](https://github.com/google-deepmind/open_x_embodiment):
+```bash
+conda activate moto
+pip install tensorflow-datasets
+export PROJECT_ROOT=[your path to Moto project]
+export OUTPUT_ROOT=[your path to save datasets]
+cd ${PROJECT_ROOT}/scripts/
+nohup bash download_and_preprocess_oxe_data.sh > download_and_preprocess_oxe_data.log 2>&1 &
+tail -f download_and_preprocess_oxe_data.log
+```
+
+<!-- - Modify the `video_dir` and `lmdb_dir` fields in data configs from [latent_motion_tokenizer/configs/data/](latent_motion_tokenizer/configs/data/) and [moto_gpt/configs/data/](moto_gpt/configs/data/) -->
+
+### Training Latent Motion Tokenizer
+#### 1. Training on CALVIN dataset
+- Modify the `npz_dir` field in [latent_motion_tokenizer/configs/data/calvin.yaml](latent_motion_tokenizer/configs/data/calvin.yaml)
+
+- Config the paths in [latent_motion_tokenizer/configs/train/data_calvin-vq_size128_dim32_num8_legacyTrue-vision_MaeLarge-decoder_queryFusionModeAdd_Patch196_useMaskFalse-mformer_legacyTrue-train_lr0.0001_bs256-aug_shiftTrue_resizedCropFalse.yaml](latent_motion_tokenizer/configs/train/data_calvin-vq_size128_dim32_num8_legacyTrue-vision_MaeLarge-decoder_queryFusionModeAdd_Patch196_useMaskFalse-mformer_legacyTrue-train_lr0.0001_bs256-aug_shiftTrue_resizedCropFalse.yaml)
+
+- Run the following commands:
+
+```bash
+conda activate moto
+export PROJECT_ROOT=[your path to Moto project]
+export CONFIG_NAME="data_calvin-vq_size128_dim32_num8_legacyTrue-vision_MaeLarge-decoder_queryFusionModeAdd_Patch196_useMaskFalse-mformer_legacyTrue-train_lr0.0001_bs256-aug_shiftTrue_resizedCropFalse"
+cd ${PROJECT_ROOT}/scripts/
+nohup bash train_latent_motion_tokenizer_on_calvin.sh > train_latent_motion_tokenizer_on_calvin.log 2>&1 &
+tail -f train_latent_motion_tokenizer_on_calvin.log
+```
+
+#### 2. Training on Open X-Embodiment datasets
+- Modify the `video_dir` field in [latent_motion_tokenizer/configs/data/rtx.yaml](latent_motion_tokenizer/configs/data/rtx.yaml)
+
+- Config the paths in [latent_motion_tokenizer/configs/train/data_rtx-vq_size128_dim32_num8_legacyTrue-vision_MaeLarge-decoder_queryFusionModeAdd_Patch196_useMaskFalse-mformer_legacyTrue-train_lr0.001_bs256-aug_shiftTrue_resizedCropFalse.yaml](latent_motion_tokenizer/configs/train/data_rtx-vq_size128_dim32_num8_legacyTrue-vision_MaeLarge-decoder_queryFusionModeAdd_Patch196_useMaskFalse-mformer_legacyTrue-train_lr0.001_bs256-aug_shiftTrue_resizedCropFalse.yaml)
+
+- Run the following commands:
+
+```bash
+conda activate moto
+export PROJECT_ROOT=[your path to Moto project]
+export CONFIG_NAME="data_rtx-vq_size128_dim32_num8_legacyTrue-vision_MaeLarge-decoder_queryFusionModeAdd_Patch196_useMaskFalse-mformer_legacyTrue-train_lr0.001_bs256-aug_shiftTrue_resizedCropFalse"
+cd ${PROJECT_ROOT}/scripts/
+nohup bash train_latent_motion_tokenizer_on_oxe.sh > train_latent_motion_tokenizer_on_oxe.log 2>&1 &
+tail -f train_latent_motion_tokenizer_on_oxe.log
+```
+
+
+
+### Pre-training Moto-GPT
+#### 1. Pre-training on CALVIN dataset
+- Modify the `lmdb_dir` field in [moto_gpt/configs/data/calvin.yaml](moto_gpt/configs/data/calvin.yaml)
+
+- Config the paths in [moto_gpt/configs/train/data_calvin-model_actPredFalse_motionPredTrue_visionMaeLarge_seq2_chunk5_maskProb0.5-train_lr0.0001_bs512-aug_shiftTrue_resizedCropFalse.yaml](moto_gpt/configs/train/data_calvin-model_actPredFalse_motionPredTrue_visionMaeLarge_seq2_chunk5_maskProb0.5-train_lr0.0001_bs512-aug_shiftTrue_resizedCropFalse.yaml)
+
+- Run the following commands:
+
+```bash
+conda activate moto
+export PROJECT_ROOT=[your path to Moto project]
+export CONFIG_NAME="data_calvin-model_actPredFalse_motionPredTrue_visionMaeLarge_seq2_chunk5_maskProb0.5-train_lr0.0001_bs512-aug_shiftTrue_resizedCropFalse"
+cd ${PROJECT_ROOT}/scripts/
+nohup bash pretrain_moto_gpt_on_calvin.sh > pretrain_moto_gpt_on_calvin.log 2>&1 &
+tail -f pretrain_moto_gpt_on_calvin.log
+```
+
+
+
+#### 2. Pre-training on Open X-Embodiment datasets
+- Modify the `video_dir` and `lmdb_dir` fields in [moto_gpt/configs/data/rtx.yaml](moto_gpt/configs/data/rtx.yaml)
+
+- Config the paths in [moto_gpt/configs/train/data_rtx-model_actPredFalse_motionPredTrue_visionMaeLarge_seq2_chunk3_maskProb0.5-train_lr0.001_bs512-aug_shiftTrue_resizedCropFalse.yaml](moto_gpt/configs/train/data_rtx-model_actPredFalse_motionPredTrue_visionMaeLarge_seq2_chunk3_maskProb0.5-train_lr0.001_bs512-aug_shiftTrue_resizedCropFalse.yaml)
+
+- Run the following commands:
+
+```bash
+conda activate moto
+export PROJECT_ROOT=[your path to Moto project]
+export CONFIG_NAME="data_rtx-model_actPredFalse_motionPredTrue_visionMaeLarge_seq2_chunk3_maskProb0.5-train_lr0.001_bs512-aug_shiftTrue_resizedCropFalse"
+ps aux | grep ${CONFIG_NAME} | awk '{print $2}' | xargs kill -9
+cd ${PROJECT_ROOT}/scripts/
+nohup bash pretrain_moto_gpt_on_oxe.sh > pretrain_moto_gpt_on_oxe.log 2>&1 &
+tail -f pretrain_moto_gpt_on_oxe.log
+```
+
+
+### Fine-tuning Moto-GPT
+#### 1. Fine-tuning on CALVIN dataset
+- Modify the `lmdb_dir` fields in [moto_gpt/configs/data/calvin.yaml](moto_gpt/configs/data/calvin.yaml)
+
+- Config the paths in [moto_gpt/configs/train/data_calvin-model_actPredTrue_motionPredTrue_visionMaeLarge_seq2_chunk5_maskProb0.5-train_lr0.0002_bs512-aug_shiftTrue_resizedCropFalse-resume_from_predLatentOnly_calvin_Epoch10.yaml](moto_gpt/configs/train/data_calvin-model_actPredTrue_motionPredTrue_visionMaeLarge_seq2_chunk5_maskProb0.5-train_lr0.0002_bs512-aug_shiftTrue_resizedCropFalse-resume_from_predLatentOnly_calvin_Epoch10.yaml)
+
+- Run the following commands:
+
+```bash
+conda activate moto
+export PROJECT_ROOT=[your path to Moto project]
+export CONFIG_NAME="data_calvin-model_actPredTrue_motionPredTrue_visionMaeLarge_seq2_chunk5_maskProb0.5-train_lr0.0002_bs512-aug_shiftTrue_resizedCropFalse-resume_from_predLatentOnly_calvin_Epoch10"
+cd ${PROJECT_ROOT}/scripts/
+nohup bash finetune_moto_gpt_on_calvin.sh > finetune_moto_gpt_on_calvin.log 2>&1 &
+tail -f finetune_moto_gpt_on_calvin.log
+```
+
+#### 2. Fine-tuning on RT-1 dataset
+- Modify the `video_dir` and `lmdb_dir` fields in [moto_gpt/configs/data/rt1.yaml](moto_gpt/configs/data/rt1.yaml)
+
+- Config the paths in [moto_gpt/configs/train/data_rt1-model_actPredTrue_motionPredTrue_visionMaeLarge_seq2_chunk3_maskProb0.5-train_lr0.001_bs512-aug_shiftTrue_resizedCropFalse-resume_from_predLatentOnly_oxe_Epoch10.yaml](moto_gpt/configs/train/data_rt1-model_actPredTrue_motionPredTrue_visionMaeLarge_seq2_chunk3_maskProb0.5-train_lr0.001_bs512-aug_shiftTrue_resizedCropFalse-resume_from_predLatentOnly_oxe_Epoch10.yaml)
+
+- Run the following commands:
+
+```bash
+conda activate moto
+export PROJECT_ROOT=[your path to Moto project]
+export CONFIG_NAME="data_rt1-model_actPredTrue_motionPredTrue_visionMaeLarge_seq2_chunk3_maskProb0.5-train_lr0.001_bs512-aug_shiftTrue_resizedCropFalse-resume_from_predLatentOnly_oxe_Epoch10"
+cd ${PROJECT_ROOT}/scripts/
+nohup bash finetune_moto_gpt_on_rt1.sh > finetune_moto_gpt_on_rt1.log 2>&1 &
+tail -f finetune_moto_gpt_on_rt1.log
+```
+
 ## 📝To Do
 - [x] Release the Latent Motion Tokenizer
 - [x] Release the pre-trained and fine-tuned Moto-GPT
 - [x] Release the inference code
-- [ ] Release the trainig code
+- [x] Release the training code


 ## 📚Citation

common/data/data_utils.py

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
+import omegaconf
+import hydra
+import pyrootutils
+import os
+import sys
+import torch
+pyrootutils.setup_root(__file__, indicator='.project-root', pythonpath=True, dotenv=True)
+from transformers import AutoTokenizer
+from transformers.utils import FEATURE_EXTRACTOR_NAME, get_file_from_repo
+import json
+from common.data.datasets import LMDBDataset_for_MotoGPT_RT1, LMDBDataset_for_MotoGPT_OXE, LMDBDataset_for_MotoGPT_Video, LMDBDataset_Mix, JsonDataset_for_MotoGPT_Video, NpzDataset_for_MotoGPT_Video, LMDBDataset_for_MotoGPT_CALVIN
+from common.data.mix_utils import BASE_STEPSIZE, DISPLAY_KEY
+from torchvision.transforms.v2 import Resize, InterpolationMode
+from torch.utils.data import ConcatDataset, WeightedRandomSampler
+
+data_type2dataset_cls = {
+    'rt1': LMDBDataset_for_MotoGPT_RT1,
+    'video': LMDBDataset_for_MotoGPT_Video,
+    'oxe': LMDBDataset_for_MotoGPT_OXE,
+    'video_json': JsonDataset_for_MotoGPT_Video,
+    'video_npz': NpzDataset_for_MotoGPT_Video,
+    'calvin': LMDBDataset_for_MotoGPT_CALVIN,
+}
+
+def load_dataset(data_config, extra_data_config):
+    if type(data_config) is str:
+        data_config = omegaconf.OmegaConf.load(data_config)
+    data_config = dict(data_config)
+
+    data_type = data_config.pop('data_type')
+
+    key_map = {
+        'latent_motion_pred': 'do_extract_future_frames',
+        'act_pred': 'do_extract_action'
+    }
+    for k, v in extra_data_config.items():
+        mapped_k = key_map.get(k, k)
+        data_config[mapped_k] = v
+
+    if data_type == 'mix':
+        sub_data_configs = data_config.pop('sub_data_configs')
+        rgb_preprocessor = Resize(data_config['rgb_shape'], interpolation=InterpolationMode.BICUBIC, antialias=True)
+        train_datasets = []
+        eval_datasets = []
+        train_sample_weights = []
+        eval_sample_weights = []
+
+        for sub_data_config in sub_data_configs:
+            sub_data_config = dict(sub_data_config)
+            data_name = sub_data_config.pop('data_name')
+            weight = sub_data_config.pop('weight')
+            if ('lmdb_dir' not in sub_data_config) and ('lmdb_dir' in data_config):
+                sub_data_config['lmdb_dir'] = os.path.join(data_config['lmdb_dir'], data_name)
+            if ('video_dir' not in sub_data_config) and ('video_dir' in data_config):
+                sub_data_config['video_dir'] = os.path.join(data_config['video_dir'], data_name, DISPLAY_KEY.get(data_name, 'image'))
+            step_size = max(round(BASE_STEPSIZE.get(data_name, 1) / BASE_STEPSIZE['fractal20220817_data']), 1)
+            sub_data_config['skip_frame'] = data_config['skip_frame'] * step_size
+
+            if 'max_skip_frame' in data_config:
+                sub_data_config['max_skip_frame'] = data_config['max_skip_frame'] * step_size
+
+            sub_data_config['rgb_shape'] = data_config['rgb_shape']
+            sub_data_config['rgb_preprocessor'] = rgb_preprocessor
+
+            train_dataset, eval_dataset = load_dataset(sub_data_config, extra_data_config)
+            train_datasets.append(train_dataset)
+            eval_datasets.append(eval_dataset)
+            train_sample_weights.append(weight)
+            eval_sample_weights.append(weight)
+
+
+        if data_config['weighted']:
+            train_dataset = LMDBDataset_Mix(datasets=train_datasets, sample_weights=train_sample_weights)
+            eval_dataset = LMDBDataset_Mix(datasets=eval_datasets, sample_weights=eval_sample_weights)
+        else:
+            train_dataset = ConcatDataset(train_datasets)
+            eval_dataset = ConcatDataset(eval_datasets)
+
+    else:
+        dataset_cls = data_type2dataset_cls[data_type]
+        train_dataset = dataset_cls(split='train', **data_config)
+        eval_dataset = dataset_cls(split='val', **data_config)
+
+    return train_dataset, eval_dataset
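
Two small pieces of `load_dataset` can be tried in isolation: the renaming of extra options via `key_map`, and the per-dataset scaling of `skip_frame` relative to `fractal20220817_data`. The following is a minimal sketch, not the repository's code. The helper names `merge_extra_config` and `scaled_skip_frame` are hypothetical, and the `BASE_STEPSIZE` values are placeholders chosen for illustration (the real table lives in `common/data/mix_utils.py`, which this diff does not show).

```python
# Option names as they appear in load_dataset's key_map.
KEY_MAP = {
    "latent_motion_pred": "do_extract_future_frames",
    "act_pred": "do_extract_action",
}

# ASSUMPTION: placeholder values, not the ones from common/data/mix_utils.py.
BASE_STEPSIZE = {
    "fractal20220817_data": 3,
    "hypothetical_fast_dataset": 6,
}

REFERENCE_DATASET = "fractal20220817_data"


def merge_extra_config(data_config, extra_data_config):
    """Return a copy of data_config with extra options merged in.

    Keys listed in KEY_MAP are renamed; all other keys pass through unchanged.
    """
    merged = dict(data_config)
    for key, value in extra_data_config.items():
        merged[KEY_MAP.get(key, key)] = value
    return merged


def scaled_skip_frame(data_name, skip_frame, base_stepsize=BASE_STEPSIZE):
    """Scale skip_frame by the dataset's step size relative to the reference.

    Datasets missing from the table default to 1, and the ratio is clamped
    to at least 1, mirroring the expression used in load_dataset.
    """
    ratio = base_stepsize.get(data_name, 1) / base_stepsize[REFERENCE_DATASET]
    step_size = max(round(ratio), 1)
    return skip_frame * step_size


if __name__ == "__main__":
    print(merge_extra_config({"data_type": "calvin"}, {"act_pred": True, "sequence_length": 2}))
    print(scaled_skip_frame("hypothetical_fast_dataset", 3))
    print(scaled_skip_frame("unlisted_dataset", 3))
```

Under these placeholder values, a dataset with twice the reference step size gets twice the frame skip, while an unlisted dataset keeps the base `skip_frame` because the ratio is clamped to 1.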
