-
-
Notifications
You must be signed in to change notification settings - Fork 9.8k
[Bugfix] Fix for V1 priority scheduling crashes at preemption #23713
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request addresses a crash in priority scheduling during preemption by updating the scheduled_running_reqs
list. The fix is correct in principle, and the accompanying test case is a valuable addition for verifying this scenario. However, the implementation of the fix in vllm/v1/core/sched/scheduler.py
has a flaw that could lead to a ValueError
, as it attempts to remove an item from a list without first verifying its presence. This could cause a crash under certain conditions, which I've detailed in a specific comment.
vllm/v1/core/sched/scheduler.py
Outdated
@@ -253,6 +253,7 @@ def schedule(self) -> SchedulerOutput: | |||
key=lambda r: (r.priority, r.arrival_time), | |||
) | |||
self.running.remove(preempted_req) | |||
scheduled_running_reqs.remove(preempted_req) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line can cause a ValueError
if preempted_req
has not been scheduled yet in the current schedule()
call, and is therefore not present in scheduled_running_reqs
. This can happen if a request being processed triggers its own preemption, or preempts another request that hasn't been processed yet in the current scheduling iteration.
To prevent a crash, you should check if the request is in the list before attempting to remove it.
scheduled_running_reqs.remove(preempted_req) | |
if preempted_req in scheduled_running_reqs: | |
scheduled_running_reqs.remove(preempted_req) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Hanchenli I think this make sense. Can you double check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@heheda12345 Hi, I think this makes sense. I just updated.
Signed-off-by: Hanchenli <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for the fix.
…roject#23713) Signed-off-by: Hanchenli <[email protected]>
…roject#23713) Signed-off-by: Hanchenli <[email protected]>
…roject#23713) Signed-off-by: Hanchenli <[email protected]>
fix #23347
Purpose
This PR is to fix for #23347. The bug is due to that when requests are evicted due to low priority and not enough KV space, scheduled_running_reqs are not updated. This PR adds the line to update scheduled_running_reqs and an example test case for scheduler from #23346 .
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md
andexamples
for a new model.