Hi @willccbb,
I have a question regarding how the effective batch size and the number of global training steps per epoch are computed in the `GRPOTrainer`.
My setup:
- 6000 prompts for training
- `num_generations=8`
- `gradient_accumulation_steps=8`
- `per_device_train_batch_size=8`
- 2 GPUs for training
My current understanding is that the total number of global steps in one epoch can be calculated as:
```
total_global_steps = (#prompts) * num_generations / effective_batch_size
```
where:
```
effective_batch_size = per_device_train_batch_size * num_processes * gradient_accumulation_steps
```
Plugging in my settings:
```
total_global_steps = 6000 * 8 / (8 * 2 * 8) = 375
```
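For reference, here is a minimal sketch of the computation I have in mind (plain arithmetic mirroring my config, not actual trainer code):

```python
# Sketch of my expected computation (not actual GRPOTrainer internals).
num_prompts = 6000
num_generations = 8
per_device_train_batch_size = 8
gradient_accumulation_steps = 8
num_processes = 2  # GPUs

# Completions consumed per optimizer step, across devices and accumulation.
effective_batch_size = (
    per_device_train_batch_size * num_processes * gradient_accumulation_steps
)  # 8 * 2 * 8 = 128

total_samples = num_prompts * num_generations  # 6000 * 8 = 48000
total_global_steps = total_samples // effective_batch_size  # 48000 // 128 = 375
print(effective_batch_size, total_global_steps)  # 128 375
```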
However, I noticed that the logs (e.g., on wandb) show that one epoch actually corresponds to 750 global steps
in my case, which is double what I expected.
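While double-checking the arithmetic, I noticed that 750 is exactly what the same formula yields if the `num_processes` factor is dropped, though I have no idea whether that reflects what the trainer actually does internally:

```python
# Same formula without the num_processes factor (just an observation,
# not a claim about the implementation): 6000 * 8 / (8 * 8) = 750
print(6000 * 8 // (8 * 8))  # 750
```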
Could you clarify how the effective batch size and the total number of steps per epoch are computed? Am I misunderstanding how batches are constructed in `GRPOTrainer`?
Thanks for your help!