Mark prompt logprobs as incompatible with prompt embeds at API level #25077
Conversation
Mark prompt logprobs as incompatible with prompt embeds at API level
Signed-off-by: Andrew Sansom <[email protected]>
Code Review
This pull request introduces validation to prohibit the simultaneous use of prompt_logprobs and prompt_embeds in completion requests, addressing a known incompatibility between the two features. The validation is implemented at two levels: at the API entrypoint in serving_completion.py, to give the user a clear error, and at the engine level in llm_engine.py as a robust safeguard. A corresponding unit test verifies that the API correctly rejects such requests. The changes are logical, well-tested, and effectively prevent this unsupported combination of parameters.
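For illustration only, a minimal sketch of the kind of guard being described; the function signature and names here are assumptions, not the PR's actual code:

```python
# Minimal sketch (assumed names, not the PR's actual code): reject any
# completion request that sets both prompt_logprobs and prompt_embeds.
from typing import Optional

import torch


def validate_prompt_inputs(
    prompt_embeds: Optional[torch.Tensor],
    prompt_logprobs: Optional[int],
) -> None:
    """Fail fast so callers get a clear error at the API boundary."""
    if prompt_embeds is not None and prompt_logprobs is not None:
        raise ValueError(
            "prompt_logprobs is not compatible with prompt_embeds")
```

Mirroring the same check at the engine level keeps the invariant intact even for callers that bypass the HTTP entrypoint.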
Signed-off-by: Andrew Sansom <[email protected]>
@DarkLight1337 is it normal for Readthedocs to take over 10 hours to complete?

No, let me rebuild it
Purpose
The implementation of Prompt Logprobs is incompatible with Prompt Embeds in both the existing v0 engine and in #24278. In that PR, @DarkLight1337 requested that requests containing both prompt embeds and prompt logprobs be blocked at the API level.
This PR carries changes originally made in #24278, but since these fixes apply to both engines and aren't strictly related to enabling prompt embedding support in the v1 engine, I split them into a separate PR for easier review and to reduce the diff in #24278.
Test Plan
Updated the prompt embeds unit tests to explicitly assert that requests containing both prompt embeds and prompt logprobs are rejected.
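As a sketch of what such a test asserts (the fixture, model name, and payload encoding below are assumptions, not the exact test added in this PR):

```python
# Hypothetical pytest sketch: a server configured for prompt embeds should
# reject a completion request that also asks for prompt logprobs.
import openai  # the OpenAI client is commonly used against vLLM's server
import pytest


def test_prompt_logprobs_rejected_with_prompt_embeds(client: openai.OpenAI):
    # `client` is an assumed fixture pointing at a running vLLM server.
    with pytest.raises(openai.BadRequestError):
        client.completions.create(
            model="example-model",  # placeholder model name
            prompt="",  # placeholder; the real prompt arrives as embeddings
            extra_body={
                "prompt_embeds": "<base64-encoded tensor bytes>",
                "prompt_logprobs": 1,
            },
        )
```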
Test Result
New tests pass locally. Pending CI, this should be good to go; the full CI run on the original PR #24278 never had failures related to these changes.
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.