[Frontend] Pass API server count to each process #23717
Conversation
Signed-off-by: DarkLight1337 <[email protected]>
Code Review

This pull request passes the API server count and rank to each server process, a crucial step toward more sophisticated caching and resource management in a scaled-out API server environment. The changes are well implemented across the configuration, argument parsing, and process management layers. Key changes include adding `api_process_count` and `api_process_rank` to `ParallelConfig`, updating `APIServerProcessManager` to handle per-server arguments, and correctly disabling incompatible features such as the IPC cache when multiple API servers are active. The refactoring in `EngineArgs.from_cli_args` also improves robustness. Overall, this is a solid contribution that enhances the multi-process architecture of vLLM.
Signed-off-by: DarkLight1337 <[email protected]>
vllm/config/parallel.py
Outdated
```python
api_process_count: int = 1
"""[Internal] The number of API processes initialized."""
api_process_rank: int = 0
"""[Internal] The rank of this API process."""
```
Something like this to handle internal state which needs a public interface?
https://docs.python.org/3/library/dataclasses.html#dataclasses.InitVar
```diff
-api_process_count: int = 1
-"""[Internal] The number of API processes initialized."""
-api_process_rank: int = 0
-"""[Internal] The rank of this API process."""
+_api_process_count: int
+api_process_count: InitVar[int] = 1
+"""The number of API processes initialized."""
+_api_process_rank: int
+api_process_rank: InitVar[int] = 0
+"""The rank of this API process."""
+...
+def __post_init__(self, api_process_count, api_process_rank):
+    ...
+    self._api_process_count = api_process_count
+    self._api_process_rank = api_process_rank
+    ...
```
We still access those attributes as public attributes in our code. The "Internal" here refers to the fact that these are only supposed to be passed in CLI args via API server scale-out. Users should not set this flag themselves.
Something like this works (although mypy will complain about the redefinition):

```python
# example.py
from dataclasses import InitVar, dataclass

@dataclass
class ParallelConfig:
    api_process_count: InitVar[int] = 1

    def __post_init__(self, api_process_count: int):
        self._api_process_count = api_process_count

    @property
    def api_process_count(self) -> int:
        return self._api_process_count

parallel_config = ParallelConfig(api_process_count=4)
print(parallel_config.api_process_count)
parallel_config.api_process_count = 2
```

```
$ python example.py
4
Traceback (most recent call last):
  File "/home/harry/vllm/demo.py", line 18, in <module>
    parallel_config.api_process_count = 2
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: property 'api_process_count' of 'ParallelConfig' object has no setter
```
> Users should not set this flag

Users can't set this flag already; it's probably worth making it read-only as Harry suggested, or in a similar fashion.
I think this isn't worth the extra complexity, at least for now (especially since mypy doesn't even work with this).
Ok, I was just trying to think of robust ways to have config that can be set on init but not later
To avoid confusion, I have updated the docstring to clarify that "internal" refers to how the CLI arg is passed, rather than to its usage in the code.
I think most config attributes (not just this one) shouldn't be modified after construction tbh. We should try to fix their values at initialization time but it would take some refactoring.
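For reference, one standard way to fix dataclass values at initialization time is a frozen dataclass. This is only a sketch of the general technique being discussed, not code from this PR; `FrozenParallelConfig` is a simplified stand-in, not the actual vLLM class:

```python
from dataclasses import dataclass, FrozenInstanceError

# Simplified stand-in for illustration; not the actual vLLM ParallelConfig.
@dataclass(frozen=True)
class FrozenParallelConfig:
    api_process_count: int = 1
    api_process_rank: int = 0

cfg = FrozenParallelConfig(api_process_count=4)
print(cfg.api_process_count)  # set at init time

try:
    cfg.api_process_count = 2  # any later assignment is rejected
except FrozenInstanceError:
    print("read-only after construction")
```

Unlike the `InitVar`/property approach, this keeps a single attribute name and type-checks cleanly, at the cost of freezing every field of the config.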
Signed-off-by: DarkLight1337 <[email protected]>
This makes sense to me, thanks! Let's wait for tests
Signed-off-by: DarkLight1337 <[email protected]>
```python
server_info = {"vllm_config": str(raw_request.app.state.vllm_config)}
async def show_server_info(
    raw_request: Request,
    config_format: Annotated[Literal["text", "json"],
```
I much prefer full JSON format, but to avoid breaking BC, let's continue to use text format by default.
Signed-off-by: DarkLight1337 <[email protected]>
```python
with ExternalLBServerManager(MODEL_NAME, DP_SIZE, api_server_count,
                             default_server_args) as server_list:
    yield server_list
server_manager = ExternalLBServerManager(MODEL_NAME, DP_SIZE,
```
looks a bit weird to have the same variable name as method name here (any particular reason for introducing intermediate var here?)
It is because `__enter__` returns the server list rather than the manager itself.
I can change the return value of `__enter__` if you want.
```python
                             default_server_args, DP_SIZE_LOCAL,
                             TP_SIZE) as server_list:
    yield server_list
server_manager = HybridLBServerManager(MODEL_NAME, DP_SIZE,
```
same comment
Signed-off-by: DarkLight1337 <[email protected]>
Thanks @DarkLight1337. I guess I've lost track of why we need the new parameters in the config for this, given we are already passing the count/index to these places (AFAICT).
```diff
@@ -772,6 +772,7 @@ def __init__(self,
             client_addresses=client_addresses,
         )

+        self.client_count = client_count
```
Is this used anywhere? or it's added to be used in future?
No, I just added this for consistency since `client_index` is being assigned.
The new parameters are put in the config because the config is readily accessible in various parts of vLLM, which is needed for the next PR
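To illustrate why placing these values on the config helps (a hypothetical sketch, not this PR's actual code — the helper `ipc_cache_enabled` and the simplified `ParallelConfig` class are made up for the example): any component that already receives the config can read the new fields without extra plumbing, for instance to gate IPC caching as discussed above.

```python
from dataclasses import dataclass

# Simplified stand-in for vllm.config.parallel.ParallelConfig;
# the field names match the ones added in this PR.
@dataclass
class ParallelConfig:
    api_process_count: int = 1
    api_process_rank: int = 0

def ipc_cache_enabled(parallel_config: ParallelConfig) -> bool:
    # Hypothetical helper: IPC caching assumes a 1:1 mapping between
    # API server and EngineCore processes, so it is disabled whenever
    # more than one API server process is running.
    return parallel_config.api_process_count == 1

print(ipc_cache_enabled(ParallelConfig()))                     # single API server
print(ipc_cache_enabled(ParallelConfig(api_process_count=4)))  # scaled out
```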
Yeah, there are different instances of this behavior; I suppose at some point we could refactor this into a shared context (which isn't the forward one) to avoid abusing config.py changes.
Signed-off-by: DarkLight1337 <[email protected]>
Purpose
Follow-up to #23018.
By passing API server count and rank instead of setting cache size to 0, this PR enables processor caching when API server scale-out is enabled. IPC caching is still disabled for internal LB though since there is no 1:1 relationship between API server and Engine Core processes.
Also these changes are required for #22070.
Test Plan
~~Should we add an endpoint to query the API server count and rank just to test that these arguments are passed correctly?~~ Actually we already have `/server_info` for that; going to add a test.

cc @njhill
Test Result
Essential Elements of an Effective PR Description Checklist
- Update `supported_models.md` and `examples` for a new model.