Conversation

@edbeeching (Collaborator) commented Aug 28, 2025

What does this PR do?

This PR adds an option, when using vllm in colocated mode, to offload vllm's weights and KV cache to CPU memory during the optimization step, freeing up more GPU memory for model activations, etc.

The current implementation is as follows:
[image]

With vllm_sleep_enabled=True:
[image]

This option was mentioned in https://huggingface.co/blog/vllm-colocate#sleep-mode-in-vllm but has not been upstreamed to TRL yet.
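The sleep / wake pattern around the optimization step can be sketched as follows. This is a minimal illustration with placeholder `generate` / `optimize` callables and a standalone function name, not TRL's actual trainer code; only `sleep(level=...)` and `wake_up()` are vLLM's engine API:

```python
def train_step_with_vllm_sleep(llm, generate, optimize, sleep_enabled=True, level=1):
    """One colocated GRPO-style step: the vLLM engine sleeps through the
    optimization step so its GPU memory is free for model activations.

    Sleep level 1 offloads the engine's weights to CPU and discards the
    KV cache; wake_up() restores the engine before the next generation.
    """
    completions = generate(llm)   # rollout with the engine awake
    if sleep_enabled:
        llm.sleep(level=level)    # free GPU memory for the backward pass
    loss = optimize(completions)  # optimization step while the engine sleeps
    if sleep_enabled:
        llm.wake_up()             # bring the engine back for the next rollout
    return loss
```

The extra time per step shown in the curves below comes from exactly these two added calls.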

Comparison of training curves with / without this option:

[image]

Comparable loss / reward curves, but slower training due to the sleep / wake overhead.

In a future PR, we can potentially add CPU offloading for the model / optimizer as well, so that vllm has access to more memory during the generation step.

Memory benchmarks for max seq length in the two configurations:

(TODO)


```python
)
if self.args.vllm_sleep_enabled:
    self.llm.sleep(level=1)
```
Contributor commented:
Every time vllm wakes up, it wakes up to an updated model, right? So why not use level = 2 to further improve efficiency?

@edbeeching (Collaborator, Author) replied:

Ah good point, let me rerun the benchmark with level=2.

@toslali-ibm (Contributor) commented Aug 28, 2025:

For level=2, you will also need to wake up and then sleep in _move_model_to_vllm, _sync_fsdp2_params_to_vllm, and _sync_fsdp1_params_to_vllm.

Basically, any time you touch vllm (in generation or when loading/syncing the model), you wake up, do the work, and then go back to sleep.

Additional note: the reason we did not upstream sleep was a vllm bug, which was only recently fixed by this PR. That means you need a vllm version that incorporates the fix to be able to use sleep level 2 without a segmentation fault.
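The wake / work / sleep bracketing described above can be sketched as a small helper. The context-manager name is hypothetical; only `wake_up()` and `sleep(level=...)` are vLLM's engine API:

```python
from contextlib import contextmanager

@contextmanager
def engine_awake(llm, sleep_level=2):
    """Bracket any vLLM touch point (generation, _move_model_to_vllm,
    the FSDP sync helpers) with wake_up()/sleep(). With level=2 the
    weights are discarded on sleep, so the engine must be woken before
    every use."""
    llm.wake_up()                        # restore engine state
    try:
        yield llm                        # do the work (sync or generate)
    finally:
        llm.sleep(level=sleep_level)     # free GPU memory again
```

A trainer method would then use it as, e.g., `with engine_awake(self.llm): self._sync_fsdp2_params_to_vllm()`.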

@edbeeching (Collaborator, Author) replied:

Yes, I have already included a wake-up call before the weight sync. I don't re-sleep, as the generation step comes straight after.

It looks like the level 2 fix has not made it into their most recent release, so I will keep level=1 for now for better backward compatibility. Unless there are other reasons to go with level 2?
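A backward-compatible fallback could look like the sketch below. The version threshold is a placeholder assumption, not the actual release carrying the fix, and the function name is illustrative:

```python
def pick_sleep_level(vllm_version, requested=2, min_level2=(0, 10, 0)):
    """Fall back to sleep level 1 on vLLM versions that predate the
    level-2 segfault fix. min_level2 is a placeholder threshold, not
    the real release number that includes the fix."""
    parsed = tuple(int(p) for p in vllm_version.split(".")[:3])
    if requested == 2 and parsed < min_level2:
        return 1  # level 2 would segfault on this version
    return requested
```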

@lewtun (Member) left a comment:

LGTM once the comment from @toslali-ibm is tested

@edbeeching changed the title from "[GRPO] Adds an option to sleep vllm with running in colocated mode" to "[GRPO] Adds an option to sleep vllm when running in colocated mode" on Aug 28, 2025
@qgallouedec (Member) left a comment:

LGTM, I'll let you change the name as suggested by @lewtun if you want to.
