Hi @willccbb,
I have a question regarding how the effective batch size and the number of global training steps per epoch are computed in the `GRPOTrainer`.
My setup:
- 6000 prompts for training
- `num_generations=8`
- `gradient_accumulation_steps=8`
- `per_device_train_batch_size=8`
- 2 GPUs for training
My current understanding is that the total number of global steps in one epoch can be calculated as:
```
total_global_steps = (#prompts) * num_generations / effective_batch_size
```
where:
```
effective_batch_size = per_device_train_batch_size * num_processes * gradient_accumulation_steps
```
Plugging in my settings:
```
total_global_steps = 6000 * 8 / (8 * 2 * 8) = 375
```
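For reference, here is a minimal sketch of the computation I have in mind (plain arithmetic mirroring my config, not actual trainer code):

```python
# Sketch of my expected computation (not actual GRPOTrainer internals).
num_prompts = 6000
num_generations = 8
per_device_train_batch_size = 8
gradient_accumulation_steps = 8
num_processes = 2  # GPUs

# Completions consumed per optimizer step, across devices and accumulation.
effective_batch_size = (
    per_device_train_batch_size * num_processes * gradient_accumulation_steps
)  # 8 * 2 * 8 = 128

total_samples = num_prompts * num_generations  # 6000 * 8 = 48000
total_global_steps = total_samples // effective_batch_size  # 48000 // 128 = 375
print(effective_batch_size, total_global_steps)  # 128 375
```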
However, I noticed that the logs (e.g., on wandb) show that one epoch actually corresponds to 750 global steps
in my case, which is double what I expected.
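While double-checking the arithmetic, I noticed that 750 is exactly what the same formula yields if the `num_processes` factor is dropped, though I have no idea whether that reflects what the trainer actually does internally:

```python
# Same formula without the num_processes factor (just an observation,
# not a claim about the implementation): 6000 * 8 / (8 * 8) = 750
print(6000 * 8 // (8 * 8))  # 750
```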
Could you clarify how the effective batch size and the total number of steps per epoch are computed? Am I misunderstanding how batches are constructed in `GRPOTrainer`?
Thanks for your help!