[GRPO] Adds an option to sleep vllm when running in colocated mode #3968
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
if self.args.vllm_sleep_enabled:
    self.llm.sleep(level=1)
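The intended sleep/wake cycle around a training step can be sketched as follows. This is a minimal illustration only: StubEngine is a hypothetical stand-in for vLLM's LLM class (whose real API does expose sleep(level=...) and wake_up()), and training_step is not the trainer's actual method.

```python
class StubEngine:
    """Hypothetical stand-in for vLLM's LLM; tracks sleep state only."""

    def __init__(self):
        self.asleep = False

    def sleep(self, level=1):
        # In real vLLM, level=1 offloads weights to CPU and discards the
        # KV cache; level=2 discards the weights as well.
        self.asleep = True

    def wake_up(self):
        self.asleep = False

    def generate(self, prompts):
        assert not self.asleep, "engine must be awake to generate"
        return [p + " ..." for p in prompts]


def training_step(llm, prompts, sleep_enabled=True):
    """Sketch of one step: generate, then sleep so the optimizer step
    that follows has more GPU memory available."""
    llm.wake_up()  # ensure the engine is resident before generation
    completions = llm.generate(prompts)
    if sleep_enabled:
        llm.sleep(level=1)  # free GPU memory for the optimization step
    # ... backward pass / optimizer.step() would run here ...
    return completions
```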
Every time vLLM wakes up, it wakes up to an updated model, right? So why not use level=2 to further improve efficiency?
Ah good point, let me rerun the benchmark with level=2.
For level=2, you will also need to wake up and then sleep in _move_model_to_vllm, _sync_fsdp2_params_to_vllm, and _sync_fsdp1_params_to_vllm. Basically, any time you touch vLLM (in generation or when loading/syncing the model), you wake up, do the work, and then go back to sleep.
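The wake-up/work/sleep pattern described here could be factored into a context manager, sketched below. StubEngine is again a hypothetical stand-in for vLLM's LLM; the comment names the trainer methods from the discussion, but this is not the PR's actual implementation.

```python
from contextlib import contextmanager


class StubEngine:
    """Hypothetical stand-in for vLLM's LLM; tracks sleep state only."""

    def __init__(self):
        self.asleep = True

    def wake_up(self):
        self.asleep = False

    def sleep(self, level=1):
        self.asleep = True


@contextmanager
def awake(llm, level=2):
    """Wake the engine for the duration of the block, then re-sleep.

    This is the pattern _move_model_to_vllm / _sync_fsdp2_params_to_vllm /
    _sync_fsdp1_params_to_vllm would need for sleep level=2, where the
    weights are discarded on sleep and must be resident to be touched.
    """
    llm.wake_up()
    try:
        yield llm
    finally:
        llm.sleep(level=level)
```

Usage: wrap each weight-sync (or generation) call in `with awake(llm): ...` so the engine is guaranteed to go back to sleep even if the sync raises.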
Additional note: the reason we did not upstream sleep earlier was a vLLM bug, which was only recently fixed by this PR. That means you need a vLLM version that incorporates the fix to be able to use sleep level 2 without a segmentation fault.
Yes, I have already included a wake-up call before the weight sync. I don't re-sleep, as we have the generation step straight after.
It looks like the level 2 fix has not made it into their most recent release. I will leave level=1 for now for better backward compatibility. Unless there are other reasons to go with level 2?
LGTM once the comment from @toslali-ibm is tested
Co-authored-by: Sergio Paniego Blanco <[email protected]>
LGTM. I'll let you change the name as suggested by @lewtun if you want to.
What does this PR do?
This PR adds an option, when using vLLM in colocated mode, to offload vLLM's weights and cache to CPU memory during the optimization step, freeing up more GPU memory for model activations, etc.
The current implementation is as follows, with vllm_sleep_enabled=True:
This option was mentioned in https://huggingface.co/blog/vllm-colocate#sleep-mode-in-vllm but has not been upstreamed to TRL yet.
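Enabling the option might look like the sketch below. This assumes the flag lands on the trainer config under the name used in the diff (vllm_sleep_enabled); the surrounding arguments (use_vllm, vllm_mode) are existing TRL options shown for context, not part of this PR.

```python
from trl import GRPOConfig  # assumes a TRL build that includes this PR

# Hypothetical usage sketch: vllm_sleep_enabled only makes sense in
# colocated mode, where training and generation share the same GPUs.
args = GRPOConfig(
    output_dir="out",
    use_vllm=True,
    vllm_mode="colocate",     # existing TRL option for colocated vLLM
    vllm_sleep_enabled=True,  # new flag from this PR: sleep vLLM during the optimization step
)
```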
Comparison of training curves with / without this option:
Comparable loss / reward curves, but slower training due to the sleep / wake overhead.
In a future PR, we could potentially add CPU offloading for the model / optimizer as well, so that vLLM has access to more memory during the generation step.
Memory benchmarks for max seq length in the two configurations:
(TODO)