
Commit 9ccab83

Update scripts and ScienceQA instructions

1 parent 053c284

9 files changed: +55 −97 lines

docs/ScienceQA.md

Lines changed: 12 additions & 87 deletions
@@ -5,115 +5,40 @@
 2. Generate ScienceQA dataset for LLaVA conversation-style format.

 ```Shell
-python scripts/convert_sqa_to_llava \
+python scripts/convert_sqa_to_llava.py \
     convert_to_llava \
     --base-dir /path/to/ScienceQA/data/scienceqa \
+    --prompt-format "QCM-LEA" \
     --split {train,val,minival,test,minitest}
 ```

 #### Training
-**NOTE**: Due to that ScienceQA experiments were done earlier, the current checkpoints are trained *without* `<im_start>` and `<im_end>` tokens. Here we provide our training scripts for the current checkpoints.

-<details>
-<summary>1. Pretraining</summary>
+1. Pretraining

-```Shell
-torchrun --nnodes=1 --nproc_per_node=8 --master_port=25001 \
-    llava/train/train_mem.py \
-    --model_name_or_path ./checkpoints/llama-vicuna-13b \
-    --data_path /path/to/cc3m_595k.json \
-    --image_folder /path/to/cc3m_595k \
-    --vision_tower openai/clip-vit-large-patch14 \
-    --tune_mm_mlp_adapter True \
-    --mm_vision_select_layer -2 \
-    --bf16 True \
-    --output_dir ./checkpoints/llava-13b-pretrain-no_im_start_end_token \
-    --num_train_epochs 1 \
-    --per_device_train_batch_size 16 \
-    --per_device_eval_batch_size 4 \
-    --gradient_accumulation_steps 1 \
-    --evaluation_strategy "no" \
-    --save_strategy "steps" \
-    --save_steps 2400 \
-    --save_total_limit 1 \
-    --learning_rate 2e-3 \
-    --weight_decay 0. \
-    --warmup_ratio 0.03 \
-    --lr_scheduler_type "cosine" \
-    --logging_steps 1 \
-    --tf32 True \
-    --model_max_length 2048 \
-    --gradient_checkpointing True \
-    --lazy_preprocess True \
-    --report_to wandb
-```
-</details>
+You can download our pretrained projector weights from our [Model Zoo](), or train your own projector weights using [`pretrain.sh`](https://github.com/haotian-liu/LLaVA/blob/main/scripts/pretrain.sh).

-<details>
-<summary>2. Finetuning</summary>
+2. Finetuning

-You may download our pretrained `llava-13b-v0-pretrain-no_im_start_end_token.bin` [here](https://huggingface.co/liuhaotian/LLaVA-13b-pretrain-projector-v0/blob/main/LLaVA-13b-pretrain-projector-v0-CC3M-595K-original_caption-no_im_token.bin).
-
-```Shell
-torchrun --nnodes=1 --nproc_per_node=8 --master_port=25001 \
-    llava/train/train_mem.py \
-    --model_name_or_path /path/to/llama-vicuna-13b \
-    --data_path /path/to/scienceqa/llava_train_QCM-LEPA.json \
-    --image_folder /path/to/scienceqa/images/train \
-    --vision_tower openai/clip-vit-large-patch14 \
-    --pretrain_mm_mlp_adapter ./checkpoints/llava-13b-pretrain-no_im_start_end_token/mm_projector.bin \
-    --mm_vision_select_layer -2 \
-    --bf16 True \
-    --output_dir ./checkpoints/llava-13b-pretrain-no_im_start_end_token-finetune_scienceqa \
-    --num_train_epochs 12 \
-    --per_device_train_batch_size 4 \
-    --per_device_eval_batch_size 4 \
-    --gradient_accumulation_steps 1 \
-    --evaluation_strategy "no" \
-    --save_strategy "steps" \
-    --save_steps 5000 \
-    --save_total_limit 3 \
-    --learning_rate 2e-5 \
-    --weight_decay 0. \
-    --warmup_ratio 0.03 \
-    --lr_scheduler_type "cosine" \
-    --logging_steps 1 \
-    --tf32 True \
-    --fsdp "full_shard auto_wrap" \
-    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
-    --model_max_length 2048 \
-    --gradient_checkpointing True \
-    --lazy_preprocess True \
-    --report_to wandb
-```
-</details>
+See [`finetune_sqa.sh`](https://github.com/haotian-liu/LLaVA/blob/main/scripts/finetune_sqa.sh).

 #### Evaluation
-
-1. Download our pretrained LLaVA-13B (delta) weights for ScienceQA dataset [here](https://huggingface.co/liuhaotian/LLaVA-13b-delta-v0-science_qa). Convert the delta weights to actual weights.
-
-```Shell
-python -m llava.model.apply_delta \
-    --base /path/to/llama-13b \
-    --target /path/to/LLaVA-13b-v0-science_qa \
-    --delta liuhaotian/LLaVA-13b-delta-v0-science_qa
 ```

-2. [Option 1] Multiple-GPU inference
+1. Multiple-GPU inference
 You may evaluate this with multiple GPUs, and concatenate the generated jsonl files. Please refer to our script for [batch evaluation](https://github.com/haotian-liu/LLaVA/blob/main/scripts/sqa_eval_batch.sh) and [results gathering](https://github.com/haotian-liu/LLaVA/blob/main/scripts/sqa_eval_gather.sh).

-3. [Option 2] Single-GPU inference
+2. Single-GPU inference

 (a) Generate LLaVA responses on ScienceQA dataset

 ```Shell
 python -m llava.eval.model_vqa_science \
-    --model-path /path/to/LLaVA-13b-v0-science_qa \
-    --question-file /path/to/ScienceQA/data/scienceqa/llava_test.json \
+    --model-path liuhaotian/llava-lcs558k-scienceqa-vicuna-13b-v1.3 \
+    --question-file /path/to/ScienceQA/data/scienceqa/llava_test_QCM-LEA.json \
     --image-folder /path/to/ScienceQA/data/scienceqa/images/test \
     --answers-file vqa/results/ScienceQA/test_llava-13b.jsonl \
-    --answer-prompter \
-    --conv-mode llava_v0
+    --conv-mode llava_v1
 ```

 (b) Evaluate the generated responses

@@ -126,4 +51,4 @@ python eval_science_qa.py \
     --output-result vqa/results/ScienceQA/test_llava-13b_result.json \
 ```

-For reference, we attach our prediction file [`test_llava-13b_result.json`](https://github.com/haotian-liu/LLaVA/blob/main/llava/eval/table/results/test_sqa_llava_13b_v0.json) for comparison when reproducing our results, as well as for further analysis in detail.
+For reference, we attach our prediction file [`test_sqa_llava_13b_v0.json`](https://github.com/haotian-liu/LLaVA/blob/main/llava/eval/table/results/test_sqa_llava_13b_v0.json) for comparison when reproducing our results, as well as for further analysis in detail.
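
The hunk for step (b) shows only the tail of the `eval_science_qa.py` command. For orientation, a reconstructed full invocation consistent with the paths above might look like this; every flag except `--output-result` is our assumption, not text from the diff:

```Shell
# Reconstructed sketch of step (b). Only --output-result appears in the
# hunk above; the remaining flag names and paths are assumptions.
python eval_science_qa.py \
    --base-dir /path/to/ScienceQA/data/scienceqa \
    --result-file vqa/results/ScienceQA/test_llava-13b.jsonl \
    --output-file vqa/results/ScienceQA/test_llava-13b_output.json \
    --output-result vqa/results/ScienceQA/test_llava-13b_result.json
```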

scripts/convert_sqa_to_llava.py

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@
 from convert_sqa_to_llava_base_prompt import build_prompt_chatbot


-def convert_to_llava(base_dir, split, prompt_format="QCM-LEPA"):
+def convert_to_llava(base_dir, split, prompt_format="QCM-LEA"):
     split_indices = json.load(open(os.path.join(base_dir, "pid_splits.json")))[split]
     problems = json.load(open(os.path.join(base_dir, "problems.json")))
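
This change makes the script's default prompt format match the updated docs. A usage sketch, mirroring the conversion command above (we read `QCM-LEA` as question/context/choices in, lecture/explanation/answer out; the `train` split is just an example):

```Shell
# Regenerate the LLaVA-format training split with the new default
# QCM-LEA prompt format; other splits work the same way.
python scripts/convert_sqa_to_llava.py \
    convert_to_llava \
    --base-dir /path/to/ScienceQA/data/scienceqa \
    --prompt-format "QCM-LEA" \
    --split train
```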

scripts/finetune.sh

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@
 ################## LLaMA-2 ##################

 deepspeed llava/train/train_mem.py \
-    --deepspeed /path/to/deepspeed.json \
+    --deepspeed ./scripts/zero2.json \
     --model_name_or_path ./checkpoints/$MODEL_VERSION \
     --version $PROMPT_VERSION \
     --data_path ./playground/data/llava_instruct_80k.json \
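
This and the following scripts now point `--deepspeed` at the checked-in `./scripts/zero2.json` instead of a placeholder path. For readers without the repo at hand, a minimal ZeRO stage-2 config along these lines would be accepted by the HuggingFace Trainer's DeepSpeed integration; this is a sketch written to a hypothetical `my_zero2.json`, not the repo's actual file:

```Shell
# Sketch of a minimal ZeRO stage-2 DeepSpeed config (NOT the repo's
# zero2.json). "auto" lets the HuggingFace Trainer fill in each value
# from its own command-line arguments.
cat > my_zero2.json <<'EOF'
{
    "fp16": {"enabled": "auto"},
    "bf16": {"enabled": "auto"},
    "train_micro_batch_size_per_gpu": "auto",
    "train_batch_size": "auto",
    "gradient_accumulation_steps": "auto",
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": true,
        "contiguous_gradients": true
    }
}
EOF
```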

scripts/finetune_full_schedule.sh

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@
 ################## LLaMA-2 ##################

 deepspeed llava/train/train_mem.py \
-    --deepspeed /path/to/deepspeed.json \
+    --deepspeed ./scripts/zero2.json \
     --model_name_or_path ./checkpoints/$MODEL_VERSION \
     --version $PROMPT_VERSION \
     --data_path ./playground/data/llava_instruct_158k.json \

scripts/finetune_lora.sh

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@
 ################## LLaMA-2 ##################

 deepspeed llava/train/train_mem.py \
-    --deepspeed /path/to/deepspeed.json \
+    --deepspeed ./scripts/zero2.json \
     --lora_enable True \
     --model_name_or_path ./checkpoints/$MODEL_VERSION \
     --version $PROMPT_VERSION \

scripts/finetune_qlora.sh

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@
 ################## LLaMA-2 ##################

 deepspeed llava/train/train_mem.py \
-    --deepspeed /path/to/deepspeed_zero2.json \
+    --deepspeed ./scripts/zero2.json \
     --lora_enable True \
     --bits 4 \
     --model_name_or_path ./checkpoints/$MODEL_VERSION \

scripts/finetune_sqa.sh

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+#!/bin/bash
+
+deepspeed llava/train/train_mem.py \
+    --deepspeed ./scripts/zero2.json \
+    --model_name_or_path lmsys/vicuna-13b-v1.3 \
+    --version $PROMPT_VERSION \
+    --data_path /Data/ScienceQA/data/scienceqa/llava_train_QCM-LEA.json \
+    --image_folder /Data/ScienceQA/data/scienceqa/images/train \
+    --vision_tower openai/clip-vit-large-patch14 \
+    --pretrain_mm_mlp_adapter ./checkpoints/huggingface/liuhaotian/llava-pretrain-vicuna-13b-v1.3/mm_projector.bin \
+    --mm_vision_select_layer -2 \
+    --mm_use_im_start_end False \
+    --mm_use_im_patch_token False \
+    --bf16 True \
+    --output_dir ./checkpoints/llava-vicuna-13b-v1.3-pretrain_lcs558k_plain-ScienceQA_QCM_LEA-12e \
+    --num_train_epochs 12 \
+    --per_device_train_batch_size 16 \
+    --per_device_eval_batch_size 4 \
+    --gradient_accumulation_steps 1 \
+    --evaluation_strategy "no" \
+    --save_strategy "steps" \
+    --save_steps 50000 \
+    --save_total_limit 1 \
+    --learning_rate 2e-5 \
+    --weight_decay 0. \
+    --warmup_ratio 0.03 \
+    --lr_scheduler_type "cosine" \
+    --logging_steps 1 \
+    --tf32 True \
+    --model_max_length 2048 \
+    --gradient_checkpointing True \
+    --dataloader_num_workers 4 \
+    --lazy_preprocess True \
+    --report_to wandb
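
Note that the new script reads `$PROMPT_VERSION` without defining it, so the value has to come from the environment. A hypothetical launch (the `v1` value is our assumption, chosen to match the `llava_v1` conv-mode used at evaluation time):

```Shell
# Hypothetical launch of the new script; PROMPT_VERSION is not set
# inside finetune_sqa.sh, so it must be supplied by the caller.
PROMPT_VERSION=v1 bash scripts/finetune_sqa.sh
```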

scripts/pretrain.sh

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ PROMPT_VERSION=plain
 ########### DO NOT CHANGE ###########

 deepspeed llava/train/train_mem.py \
-    --deepspeed /path/to/deepspeed.json \
+    --deepspeed ./scripts/zero2.json \
     --model_name_or_path ./checkpoints/$MODEL_VERSION \
     --version $PROMPT_VERSION \
     --data_path /path/to/pretrain_data.json \

scripts/sqa_eval_batch.sh

Lines changed: 3 additions & 4 deletions
@@ -3,12 +3,11 @@
 CHUNKS=8
 for IDX in {0..7}; do
     CUDA_VISIBLE_DEVICES=$IDX python -m llava.eval.model_vqa_science \
-        --model-path ./checkpoints/LLaVA-13b-v0-science_qa \
-        --question-file ~/haotian/datasets/ScienceQA/data/scienceqa/llava_test_QCM-LEPA.json \
+        --model-path liuhaotian/llava-lcs558k-scienceqa-vicuna-13b-v1.3 \
+        --question-file ~/haotian/datasets/ScienceQA/data/scienceqa/llava_test_QCM-LEA.json \
         --image-folder ~/haotian/datasets/ScienceQA/data/scienceqa/images/test \
         --answers-file ./test_llava-13b-chunk${CHUNKS}_${IDX}.jsonl \
         --num-chunks $CHUNKS \
         --chunk-idx $IDX \
-        --answer-prompter \
-        --conv-mode llava_v0 &
+        --conv-mode llava_v1 &
 done
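
The ScienceQA docs say to concatenate the per-chunk jsonl files before scoring. A minimal gathering step consistent with the file names this loop produces might look like the following; it is a sketch, not the repo's `sqa_eval_gather.sh`:

```Shell
# Sketch of the gathering step (not the actual sqa_eval_gather.sh):
# concatenate the per-chunk answer files emitted by the loop above
# into one jsonl file for eval_science_qa.py.
CHUNKS=8
output=./test_llava-13b.jsonl
> "$output"
for IDX in $(seq 0 $((CHUNKS - 1))); do
    cat "./test_llava-13b-chunk${CHUNKS}_${IDX}.jsonl" >> "$output"
done
```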
