Your question
Hi, I am running a toy experiment training a model with Megatron-LM. I set `TRAIN_SAMPLES=100` in my `train.sh`, and my training dataset contains only 100 data points:
```bash
TRAIN_SAMPLES=100       # 300B tokens / 4096
LR_WARMUP_SAMPLES=0
LR_DECAY_SAMPLES=100    # TRAIN_SAMPLES - LR_WARMUP_SAMPLES

options=" \
    ...
    --train-samples ${TRAIN_SAMPLES} \
    --lr-warmup-samples ${LR_WARMUP_SAMPLES} \
    --lr-decay-samples ${LR_DECAY_SAMPLES} \
    ...
    --split 99,1,0"

torchrun --nproc_per_node 1 pretrain_model.py ${options}
```
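(If the sequence length is 4096, as the `# 300B tokens / 4096` comment in my script suggests, then `--train-samples 100` requests 100 × 4096 = 409,600 training tokens in total; that is just my reading of the flags, though.)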
But the log shows `total number of epochs: 165`, even though I set `TRAIN_SAMPLES=100`. Why does this happen when I am using the `--train-samples` flag instead of `--train-iters`?
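I am not sure of the exact internals, but my rough understanding is that a "sample" here means one sequence of `seq_length` tokens, not one document, and the dataset builder keeps adding epochs until enough tokens are available to cut the requested number of sequences. A minimal Python sketch of that logic (the function name and the ~2,500-token figure are my own guesses, not actual Megatron-LM code):

```python
def num_epochs_needed(tokens_per_epoch: int, seq_length: int, num_samples: int) -> int:
    """Count epochs until the dataset yields enough tokens for num_samples
    sequences of seq_length tokens each (a sketch of my understanding of
    how the GPT dataset builder decides the epoch count)."""
    epochs, total_tokens = 0, 0
    while (total_tokens - 1) // seq_length < num_samples:
        epochs += 1
        total_tokens += tokens_per_epoch
    return epochs

# Hypothetical numbers: with --train-samples 100 and seq_length 4096, about
# 409,600 tokens are needed. If my 100 documents only hold ~2,500 tokens in
# total, that works out to roughly 165 passes over the data:
print(num_epochs_needed(tokens_per_epoch=2483, seq_length=4096, num_samples=100))  # 165
```

Is that roughly what is happening, or am I misreading the config?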