Skip to content

Conversation

DarkLight1337
Copy link
Member

@DarkLight1337 DarkLight1337 commented Aug 27, 2025

Purpose

Follow-up to #23018.

By passing API server count and rank instead of setting cache size to 0, this PR enables processor caching when API server scale-out is enabled. IPC caching is still disabled for internal LB though since there is no 1:1 relationship between API server and Engine Core processes.

Also these changes are required for #22070.

Test Plan

Should we add an endpoint to query the API server count and rank just to test that these arguments are passed correctly? Actually we already have /server_info for that, going to add a test.

cc @njhill

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 27, 2025
@mergify mergify bot added documentation Improvements or additions to documentation frontend multi-modality Related to multi-modality (#4194) v1 labels Aug 27, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the necessary changes to pass the API server count and rank to each server process. This is a crucial step for enabling more sophisticated caching and resource management in a scaled-out API server environment. The changes are well-implemented across the configuration, argument parsing, and process management layers. Key changes include adding api_process_count and api_process_rank to ParallelConfig, updating APIServerProcessManager to handle per-server arguments, and correctly disabling incompatible features like IPC cache when multiple API servers are active. The refactoring in EngineArgs.from_cli_args also improves robustness. Overall, this is a solid contribution that enhances the multi-process architecture of vLLM.

Signed-off-by: DarkLight1337 <[email protected]>
@DarkLight1337 DarkLight1337 removed the ready ONLY add when PR is ready to merge/full CI is needed label Aug 27, 2025
Signed-off-by: DarkLight1337 <[email protected]>
Comment on lines 100 to 103
api_process_count: int = 1
"""[Internal] The number of API processes initialized."""
api_process_rank: int = 0
"""[Internal] The rank of this API process."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this to handle internal state which needs a public interface?

https://docs.python.org/3/library/dataclasses.html#dataclasses.InitVar

Suggested change
api_process_count: int = 1
"""[Internal] The number of API processes initialized."""
api_process_rank: int = 0
"""[Internal] The rank of this API process."""
_api_process_count: int
api_process_count: InitVar[int] = 1
"""The number of API processes initialized."""
_api_process_rank: int
api_process_rank: InitVar[int] = 0
"""The rank of this API process."""
...
def __post_init__(self, api_process_count, api_process_rank):
...
self._api_process_count = api_process_count
self._api_process_rank = api_process_rank
...

Copy link
Member Author

@DarkLight1337 DarkLight1337 Aug 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still access those attributes as public attributes in our code. The "Internal" here refers to the fact that these are only supposed to be passed in CLI args via API server scale-out. Users should not set this flag

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this works (although mypy will complain about redefinition):

# example.py
from dataclasses import InitVar, dataclass


@dataclass
class ParallelConfig:
    api_process_count: InitVar[int] = 1

    def __post_init__(self, api_process_count: int):
        self._api_process_count = api_process_count

    @property
    def api_process_count(self) -> int:
        return self._api_process_count

parallel_config = ParallelConfig(api_process_count=4)
print(parallel_config.api_process_count)
parallel_config.api_process_count = 2
$ python example.py
4
Traceback (most recent call last):
  File "/home/harry/vllm/demo.py", line 18, in <module>
    parallel_config.api_process_count = 2
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: property 'api_process_count' of 'ParallelConfig' object has no setter

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users should not set this flag

users can't set this flag already, probably worth it to make it read only as Harry suggested or in a similar fashion

Copy link
Member Author

@DarkLight1337 DarkLight1337 Aug 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this isn't worth the extra complexity, at least for now (especially since mypy doesn't even work with this)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I was just trying to think of robust ways to have config that can be set on init but not later

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid confusion, I have updated the docstring to be more clear that "internal" refers to how the CLI arg is passed, rather than its usage in the code

Copy link
Member Author

@DarkLight1337 DarkLight1337 Aug 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think most config attributes (not just this one) shouldn't be modified after construction tbh. We should try to fix their values at initialization time but it would take some refactoring.

Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Copy link
Contributor

@NickLucche NickLucche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me, thanks! Let's wait for tests

Comment on lines 100 to 103
api_process_count: int = 1
"""[Internal] The number of API processes initialized."""
api_process_rank: int = 0
"""[Internal] The rank of this API process."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users should not set this flag

users can't set this flag already, probably worth it to make it read only as Harry suggested or in a similar fashion

Signed-off-by: DarkLight1337 <[email protected]>
server_info = {"vllm_config": str(raw_request.app.state.vllm_config)}
async def show_server_info(
raw_request: Request,
config_format: Annotated[Literal["text", "json"],
Copy link
Member Author

@DarkLight1337 DarkLight1337 Aug 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I much prefer full JSON format, but to avoid breaking BC, let's continue to use text format by default.

Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 27, 2025
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
with ExternalLBServerManager(MODEL_NAME, DP_SIZE, api_server_count,
default_server_args) as server_list:
yield server_list
server_manager = ExternalLBServerManager(MODEL_NAME, DP_SIZE,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks a bit weird to have the same variable name as method name here (any particular reason for introducing intermediate var here?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is because __enter__ returns the server list rather than the manager itself

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change the return value of __enter__ if you want

default_server_args, DP_SIZE_LOCAL,
TP_SIZE) as server_list:
yield server_list
server_manager = HybridLBServerManager(MODEL_NAME, DP_SIZE,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment

Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Copy link
Member

@njhill njhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @DarkLight1337

I guess I've lost track of why we need the new parameters in the config for this? given we are already passing the count/index to these places (AFAICT)

@@ -772,6 +772,7 @@ def __init__(self,
client_addresses=client_addresses,
)

self.client_count = client_count
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used anywhere? or it's added to be used in future?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I just added this for consistency since client_index is being assigned

Copy link
Member Author

@DarkLight1337 DarkLight1337 Aug 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new parameters are put in the config because the config is readily accessible in various parts of vLLM, which is needed for the next PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah there's different instances of this behavior, I suppose at some point we could refactor this into a shared context (which isn't the forward one) to avoid abusing config.py changes.

Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation frontend multi-modality Related to multi-modality (#4194) ready ONLY add when PR is ready to merge/full CI is needed v1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants