[CB] fix scheduler constraint max tokens #332

sducouedic · 2025-07-24T12:25:36Z

The continuous batching scheduler checks that a request (i.e. prompt + requested tokens) fits within max_context_len, but it did not account for the fact that the last generated token is a "free" one that does not need to be stored in a block.

Signed-off-by: Sophie du Couédic <[email protected]>
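
For illustration, the constraint described above can be sketched as follows. This is a minimal sketch, not the code changed in this PR: the function name `fits_max_context` and its parameters are hypothetical, and it assumes that "free" means the final generated token takes no slot in a KV-cache block, so only the prompt plus the first `max_tokens - 1` generated tokens count against `max_context_len`.

```python
def fits_max_context(prompt_len: int, max_tokens: int, max_context_len: int) -> bool:
    """Return True if the request fits within the context limit.

    Hypothetical helper, not the actual scheduler code.
    Assumption: the last generated token is returned to the caller
    without being stored in a block, so it does not count.
    """
    # Tokens that need storage: the prompt plus every generated
    # token except the last one (never negative for max_tokens == 0).
    stored_tokens = prompt_len + max(max_tokens - 1, 0)
    return stored_tokens <= max_context_len


# A 100-token prompt requesting 29 new tokens totals 129 tokens,
# but only 128 of them need storage, so it fits a 128-token limit.
print(fits_max_context(100, 29, 128))
# Requesting 30 new tokens would need 129 stored tokens and is rejected.
print(fits_max_context(100, 30, 128))
```

Under this assumption, a check of the form `prompt_len + max_tokens <= max_context_len` would reject the first request above even though it fits.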

github-actions · 2025-07-24T12:34:41Z

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: Make sure that your code passes all the linting checks, otherwise your PR won't be able to be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

yannicks1 · 2025-07-24T12:35:47Z

related to #320

fix scheduler free prefix token

a1cd3d4

Signed-off-by: Sophie du Couédic <[email protected]>

sducouedic requested review from yannicks1, tdoublep and nikolaospapandreou as code owners July 24, 2025 12:25

yannicks1 marked this pull request as draft July 24, 2025 12:35

sducouedic changed the title ~~[CB] fix scheduler free last token~~ [CB] fix scheduler constraint max tokens Jul 25, 2025
