Conversation

@edbeeching (Collaborator) commented Aug 28, 2025

What does this PR do?

This PR adds an option, when using vllm in colocated mode, to offload vllm's weights and KV cache to CPU memory during the optimization step, freeing up more GPU memory for model activations, etc.

The current implementation is as follows:
[image]

With vllm_sleep_enabled=True:
[image]

This option was mentioned in https://huggingface.co/blog/vllm-colocate#sleep-mode-in-vllm but has not been upstreamed to TRL yet.
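The sleep / wake pattern around the optimization step can be sketched as follows. This is a minimal illustration with placeholder `generate` / `optimize` callables and a standalone function name, not TRL's actual trainer code; only `sleep(level=...)` and `wake_up()` are vLLM's engine API:

```python
def train_step_with_vllm_sleep(llm, generate, optimize, sleep_enabled=True, level=1):
    """One colocated GRPO-style step: the vLLM engine sleeps through the
    optimization step so its GPU memory is free for model activations.

    Sleep level 1 offloads the engine's weights to CPU and discards the
    KV cache; wake_up() restores the engine before the next generation.
    """
    completions = generate(llm)   # rollout with the engine awake
    if sleep_enabled:
        llm.sleep(level=level)    # free GPU memory for the backward pass
    loss = optimize(completions)  # optimization step while the engine sleeps
    if sleep_enabled:
        llm.wake_up()             # bring the engine back for the next rollout
    return loss
```

The extra time per step shown in the curves below comes from exactly these two added calls.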

Comparison of training curves with / without this option:

[image]

Comparable loss / reward curves, but slower training due to the sleep / wake overhead.

In a future PR, we can potentially add CPU offloading for the model / optimizer as well, so that vllm has access to more memory during the generation step.

Memory benchmarks for max seq length in the two configurations:

(TODO)


```python
)
if self.args.vllm_sleep_enabled:
    self.llm.sleep(level=1)
```
Contributor commented:
Every time vllm wakes up, it wakes up to an updated model, right? So why not use level = 2 to further improve efficiency?

@edbeeching (Collaborator, Author) replied:

Ah good point, let me rerun the benchmark with level=2.

@toslali-ibm (Contributor) commented Aug 28, 2025:

For level=2, you will also need to wake up and then sleep in _move_model_to_vllm, _sync_fsdp2_params_to_vllm, and _sync_fsdp1_params_to_vllm.

Basically, any time you touch vllm (in generation or when loading/syncing the model), you wake up, do the work, and then go back to sleep.

Additional note: the reason we did not upstream sleep was a vllm bug, which was only recently fixed by this PR. That means you need a vllm version that incorporates the fix to be able to use sleep level 2 without a segmentation fault.
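The wake / work / sleep bracketing described above can be sketched as a small helper. The context-manager name is hypothetical; only `wake_up()` and `sleep(level=...)` are vLLM's engine API:

```python
from contextlib import contextmanager

@contextmanager
def engine_awake(llm, sleep_level=2):
    """Bracket any vLLM touch point (generation, _move_model_to_vllm,
    the FSDP sync helpers) with wake_up()/sleep(). With level=2 the
    weights are discarded on sleep, so the engine must be woken before
    every use."""
    llm.wake_up()                        # restore engine state
    try:
        yield llm                        # do the work (sync or generate)
    finally:
        llm.sleep(level=sleep_level)     # free GPU memory again
```

A trainer method would then use it as, e.g., `with engine_awake(self.llm): self._sync_fsdp2_params_to_vllm()`.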

@edbeeching (Collaborator, Author) replied:

Yes, I have already included a wake-up call before the weight sync. I don't re-sleep, as the generation step comes straight after.

It looks like the level 2 fix has not made it into their most recent release, so I will keep level=1 for now for better backward compatibility. Unless there are other reasons to go with level 2?
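A backward-compatible fallback could look like the sketch below. The version threshold is a placeholder assumption, not the actual release carrying the fix, and the function name is illustrative:

```python
def pick_sleep_level(vllm_version, requested=2, min_level2=(0, 10, 0)):
    """Fall back to sleep level 1 on vLLM versions that predate the
    level-2 segfault fix. min_level2 is a placeholder threshold, not
    the real release number that includes the fix."""
    parsed = tuple(int(p) for p in vllm_version.split(".")[:3])
    if requested == 2 and parsed < min_level2:
        return 1  # level 2 would segfault on this version
    return requested
```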

@lewtun (Member) left a comment:

LGTM once the comment from @toslali-ibm is tested

@edbeeching changed the title from "[GRPO] Adds an option to sleep vllm with running in colocated mode" to "[GRPO] Adds an option to sleep vllm when running in colocated mode" on Aug 28, 2025
@qgallouedec (Member) left a comment:

LGTM, I'll let you change the name as suggested by @lewtun if you want to.
