Closed
Labels
bug: Something isn't working
Description
Your current environment
N/A
🐛 Describe the bug
While running a V1 server with --scheduling-policy priority, I sometimes run into this error, causing the server to crash:
(I've edited the code to give a better assertion error message)
# Since some requests in the RUNNING queue may not be scheduled in
# this step, the total number of scheduled requests can be smaller than
# len(self.running).
> assert (len(scheduled_new_reqs) + len(scheduled_resumed_reqs) +
          len(scheduled_running_reqs) <= len(self.running)), (
              f"scheduled_new_reqs: {len(scheduled_new_reqs)} + "
              f"scheduled_resumed_reqs: {len(scheduled_resumed_reqs)} + "
              f"scheduled_running_reqs: {len(scheduled_running_reqs)} > "
              f"running: {len(self.running)}")
E AssertionError: scheduled_new_reqs: 0 + scheduled_resumed_reqs: 0 + scheduled_running_reqs: 2 > running: 1
../vllm-internal/vllm/v1/core/sched/scheduler.py:558: AssertionError
I've provided a repro here: #23346
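The root cause seems to be that, while scheduling running requests, preemption decisions are made in priority order, so a request can be scheduled and then preempted by another request later in the same scheduling cycle. Judging by the assertion output above, the preempted request apparently stays counted in scheduled_running_reqs after it has been removed from self.running.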
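To make the suspected failure mode concrete, here is a toy model of a single scheduling pass. It is only a sketch and not vLLM's scheduler code: `Req`, `schedule_running`, the block counts, and the convention that a larger `priority` value means a less important request are all assumptions made up for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Req:
    name: str
    priority: int  # assumption: larger value = less important
    held: int      # blocks freed if this request is preempted
    need: int      # extra blocks the request needs this step


def schedule_running(running, free_blocks):
    """Toy pass over the running queue; returns (scheduled, running)."""
    running = list(running)
    scheduled = []
    i = 0
    while i < len(running):
        req = running[i]
        preempted_self = False
        while free_blocks < req.need:
            # Out of blocks: evict the least important running request.
            victim = max(running, key=lambda r: r.priority)
            pos = running.index(victim)
            running.pop(pos)
            free_blocks += victim.held
            # The suspected bug: the victim is NOT removed from `scheduled`,
            # even if it was already scheduled earlier in this same pass.
            if pos < i:
                i -= 1  # keep `i` pointing at the current request
            if victim is req:
                preempted_self = True
                break
        if preempted_self:
            continue  # running[i] is now the next request, if any
        free_blocks -= req.need
        req.held += req.need
        scheduled.append(req)
        i += 1
    return scheduled, running


# A is less important but comes first in the queue, so it is scheduled first.
a = Req("A", priority=5, held=2, need=1)
b = Req("B", priority=0, held=2, need=2)
scheduled, running = schedule_running([a, b], free_blocks=1)
print(len(scheduled), len(running))  # 2 1
```

In this toy run, A is scheduled, then B runs out of blocks and preempts A, so two requests are counted as scheduled while only one remains running, which matches the shape of the assertion failure above. One possible remedy in the toy model, not necessarily what the project adopted, would be to also drop the victim from `scheduled` at the point where it is preempted.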