[Frontend] Pass API server count to each process #23717

DarkLight1337 · 2025-08-27T06:55:50Z

Purpose

Follow-up to #23018.

By passing the API server count and rank instead of setting the cache size to 0, this PR enables processor caching when API server scale-out is enabled. IPC caching remains disabled for internal LB, however, because there is no 1:1 relationship between API server and Engine Core processes.

Also these changes are required for #22070.
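As a rough illustration of the idea above, the sketch below shows how a per-process count and rank might be carried and validated. `ApiProcessInfo` and `per_process_infos` are hypothetical names used only for this example; they are not vLLM's `ParallelConfig` or `APIServerProcessManager`, and the validation rules are an assumption.

```python
from dataclasses import dataclass


@dataclass
class ApiProcessInfo:
    """Hypothetical stand-in for the two fields this PR adds to the config."""

    api_process_count: int = 1
    api_process_rank: int = 0

    def __post_init__(self) -> None:
        # Assumed validation: the rank must index into the set of API processes.
        if self.api_process_count < 1:
            raise ValueError("api_process_count must be at least 1")
        if not 0 <= self.api_process_rank < self.api_process_count:
            raise ValueError(
                f"api_process_rank={self.api_process_rank} is out of range "
                f"for api_process_count={self.api_process_count}")


def per_process_infos(count: int) -> list[ApiProcessInfo]:
    """Build one info object per API server process, with ranks 0..count-1."""
    return [ApiProcessInfo(count, rank) for rank in range(count)]


infos = per_process_infos(3)
print([(info.api_process_rank, info.api_process_count) for info in infos])
```

Each process receiving its own rank alongside the shared count is what lets it tell itself apart from its siblings, rather than every process falling back to a cache size of 0.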

Test Plan

~~Should we add an endpoint to query the API server count and rank just to test that these arguments are passed correctly?~~ Actually we already have /server_info for that, going to add a test.

cc @njhill

Test Result


Signed-off-by: DarkLight1337 <[email protected]>

gemini-code-assist

Code Review

This pull request introduces the necessary changes to pass the API server count and rank to each server process. This is a crucial step for enabling more sophisticated caching and resource management in a scaled-out API server environment. The changes are well-implemented across the configuration, argument parsing, and process management layers. Key changes include adding api_process_count and api_process_rank to ParallelConfig, updating APIServerProcessManager to handle per-server arguments, and correctly disabling incompatible features like IPC cache when multiple API servers are active. The refactoring in EngineArgs.from_cli_args also improves robustness. Overall, this is a solid contribution that enhances the multi-process architecture of vLLM.


hmellor · 2025-08-27T07:51:07Z

vllm/config/parallel.py

+    api_process_count: int = 1
+    """[Internal] The number of API processes initialized."""
+    api_process_rank: int = 0
+    """[Internal] The rank of this API process."""


Something like this to handle internal state which needs a public interface?

https://docs.python.org/3/library/dataclasses.html#dataclasses.InitVar

Suggested change

-    api_process_count: int = 1
-    """[Internal] The number of API processes initialized."""
-    api_process_rank: int = 0
-    """[Internal] The rank of this API process."""
+    _api_process_count: int
+    api_process_count: InitVar[int] = 1
+    """The number of API processes initialized."""
+    _api_process_rank: int
+    api_process_rank: InitVar[int] = 0
+    """The rank of this API process."""
+    ...
+    def __post_init__(self, api_process_count, api_process_rank):
+        ...
+        self._api_process_count = api_process_count
+        self._api_process_rank = api_process_rank
+        ...

We still access those attributes as public attributes in our code. "Internal" here means that these are only supposed to be passed as CLI args by API server scale-out. Users should not set this flag.

Something like this works (although mypy will complain about redefinition):

    # example.py
    from dataclasses import InitVar, dataclass

    @dataclass
    class ParallelConfig:
        api_process_count: InitVar[int] = 1

        def __post_init__(self, api_process_count: int):
            self._api_process_count = api_process_count

        @property
        def api_process_count(self) -> int:
            return self._api_process_count


    parallel_config = ParallelConfig(api_process_count=4)
    print(parallel_config.api_process_count)
    parallel_config.api_process_count = 2

    $ python example.py
    4
    Traceback (most recent call last):
      File "/home/harry/vllm/demo.py", line 18, in <module>
        parallel_config.api_process_count = 2
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    AttributeError: property 'api_process_count' of 'ParallelConfig' object has no setter

Users should not set this flag

Users can't set this flag already; it's probably worth making it read-only, as Harry suggested or in a similar fashion.

I think this isn't worth the extra complexity, at least for now (especially since mypy doesn't even work with this)

Ok, I was just trying to think of robust ways to have config that can be set on init but not later

To avoid confusion, I have updated the docstring to be more clear that "internal" refers to how the CLI arg is passed, rather than its usage in the code

I think most config attributes (not just this one) shouldn't be modified after construction tbh. We should try to fix their values at initialization time but it would take some refactoring.
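For comparison, one standard-library way to get "set on init but not later" is a frozen dataclass. This is only a sketch of an alternative technique, not what the PR does; `FrozenParallelConfig` is a hypothetical class, not vLLM's `ParallelConfig`.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class FrozenParallelConfig:
    """Hypothetical config whose fields are fixed at construction time."""

    api_process_count: int = 1
    api_process_rank: int = 0


config = FrozenParallelConfig(api_process_count=4)
print(config.api_process_count)

try:
    # Assignment after construction is rejected by the generated __setattr__.
    config.api_process_count = 2  # type: ignore[misc]
except FrozenInstanceError as exc:
    print(f"rejected: {exc}")
```

The trade-off is that a frozen dataclass also blocks plain assignments inside `__post_init__`, so a config that adjusts its own fields there would need `object.__setattr__`. That is part of the refactoring cost mentioned above.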


NickLucche

This makes sense to me, thanks! Let's wait for tests

NickLucche · 2025-08-27T08:15:41Z

vllm/config/parallel.py

+    api_process_count: int = 1
+    """[Internal] The number of API processes initialized."""
+    api_process_rank: int = 0
+    """[Internal] The rank of this API process."""


Users should not set this flag

Users can't set this flag already; it's probably worth making it read-only, as Harry suggested or in a similar fashion.


DarkLight1337 · 2025-08-27T08:31:29Z

vllm/entrypoints/openai/api_server.py

-        server_info = {"vllm_config": str(raw_request.app.state.vllm_config)}
+    async def show_server_info(
+        raw_request: Request,
+        config_format: Annotated[Literal["text", "json"],


I much prefer full JSON format, but to avoid breaking BC, let's continue to use text format by default.
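A minimal sketch of that backward-compatible default, written without a web framework so it stays self-contained. `DemoConfig` and `server_info` are hypothetical; only the `"vllm_config"` key and the `"text"`/`"json"` choices come from the diff above, and the actual endpoint may serialize the config differently.

```python
import json
from dataclasses import asdict, dataclass
from typing import Literal


@dataclass
class DemoConfig:
    """Hypothetical stand-in for the server's config object."""

    api_process_count: int = 1
    api_process_rank: int = 0


def server_info(config: DemoConfig,
                config_format: Literal["text", "json"] = "text") -> dict:
    """Return the config as a nested dict for "json", else as one string."""
    if config_format == "json":
        return {"vllm_config": asdict(config)}
    # Default: keep the pre-existing string form so existing clients still work.
    return {"vllm_config": str(config)}


cfg = DemoConfig(api_process_count=2, api_process_rank=1)
print(server_info(cfg))
print(json.dumps(server_info(cfg, "json")))
```

With the JSON form, a test can read `api_process_count` and `api_process_rank` as fields instead of parsing them out of a string.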


njhill · 2025-08-27T14:59:24Z

tests/v1/test_external_lb_dp.py

-    with ExternalLBServerManager(MODEL_NAME, DP_SIZE, api_server_count,
-                                 default_server_args) as server_list:
-        yield server_list
+    server_manager = ExternalLBServerManager(MODEL_NAME, DP_SIZE,


looks a bit weird to have the same variable name as method name here (any particular reason for introducing intermediate var here?)

It is because __enter__ returns the server list rather than the manager itself

I can change the return value of __enter__ if you want
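To make the point about `__enter__` concrete, here is a small sketch with a hypothetical `DemoServerManager` (not the real `ExternalLBServerManager`). Because `with ... as x` binds whatever `__enter__` returns, a separate variable is needed if the fixture also wants the manager itself.

```python
class DemoServerManager:
    """Hypothetical manager whose __enter__ returns the server list, not self."""

    def __init__(self, count: int) -> None:
        self.count = count
        self.servers: list[str] = []

    def __enter__(self) -> list[str]:
        self.servers = [f"server-{rank}" for rank in range(self.count)]
        return self.servers

    def __exit__(self, exc_type, exc, tb) -> None:
        self.servers.clear()


def servers_fixture(count: int):
    """Generator in the style of a pytest fixture; yields manager and list."""
    manager = DemoServerManager(count)  # keep a handle on the manager itself
    with manager as server_list:  # `as` binds the list returned by __enter__
        yield manager, server_list


gen = servers_fixture(2)
manager, server_list = next(gen)
print(manager.count, server_list)
gen.close()  # triggers __exit__, as pytest would at teardown
```

Returning `self` from `__enter__` instead would remove the need for the intermediate variable, at the cost of changing what existing callers receive.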

njhill · 2025-08-27T14:59:44Z

tests/v1/test_hybrid_lb_dp.py

-                               default_server_args, DP_SIZE_LOCAL,
-                               TP_SIZE) as server_list:
-        yield server_list
+    server_manager = HybridLBServerManager(MODEL_NAME, DP_SIZE,


same comment

vllm/config/parallel.py

vllm/entrypoints/cli/serve.py


njhill

Thanks @DarkLight1337

I guess I've lost track of why we need the new parameters in the config for this, given that we are already passing the count/index to these places (AFAICT)?

njhill · 2025-08-27T16:31:35Z

vllm/v1/engine/core_client.py

@@ -772,6 +772,7 @@ def __init__(self,
            client_addresses=client_addresses,
        )

+        self.client_count = client_count


Is this used anywhere, or is it added for future use?

No, I just added this for consistency since client_index is being assigned

The new parameters are put in the config because the config is readily accessible in various parts of vLLM, which is needed for the next PR

yeah there's different instances of this behavior, I suppose at some point we could refactor this into a shared context (which isn't the forward one) to avoid abusing config.py changes.


vllm/engine/arg_utils.py

[Frontend] Pass API server count to each process

d52aa96

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Aug 27, 2025

DarkLight1337 requested review from robertgshaw2-redhat, simon-mo, aarnphm, WoosukKwon, njhill, ywang96, comaniac, alexm-redhat, youkaichao, mgoin, tlrmchlsmth, houseroad, hmellor, yewentao256 and ProExpertProg as code owners August 27, 2025 06:55

mergify bot added the "documentation" (Improvements or additions to documentation), "frontend", "multi-modality" (Related to multi-modality (#4194)), and "v1" labels Aug 27, 2025

gemini-code-assist bot reviewed Aug 27, 2025


Tests

5ff210d

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 removed the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Aug 27, 2025

Update

ed76170

Signed-off-by: DarkLight1337 <[email protected]>

hmellor reviewed Aug 27, 2025


DarkLight1337 added 2 commits August 27, 2025 08:11

Update and fix tests

90703bd

Signed-off-by: DarkLight1337 <[email protected]>

Update docstring

3f97be4

Signed-off-by: DarkLight1337 <[email protected]>

NickLucche approved these changes Aug 27, 2025


Optimize

91ea959

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 commented Aug 27, 2025


DarkLight1337 added 2 commits August 27, 2025 08:32

Comment

6d0c040

Signed-off-by: DarkLight1337 <[email protected]>

Improve error message

69c9ff0

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Aug 27, 2025

DarkLight1337 added 2 commits August 27, 2025 08:43

Update docstring

8e3ea32

Signed-off-by: DarkLight1337 <[email protected]>

Fixture

1dd5894

Signed-off-by: DarkLight1337 <[email protected]>

njhill reviewed Aug 27, 2025


DarkLight1337 added 10 commits August 27, 2025 15:51

Address comments in serve.py

d06434f

Signed-off-by: DarkLight1337 <[email protected]>

Rename attributes to internal and validate

dac1170

Signed-off-by: DarkLight1337 <[email protected]>

Fix

3f62e0e

Signed-off-by: DarkLight1337 <[email protected]>

Update

df9f9cb

Signed-off-by: DarkLight1337 <[email protected]>

Push down

0ec4e66

Signed-off-by: DarkLight1337 <[email protected]>

Update

e500a9b

Signed-off-by: DarkLight1337 <[email protected]>

Fix

36fb875

Signed-off-by: DarkLight1337 <[email protected]>

Try deepcopy

e08e7b7

Signed-off-by: DarkLight1337 <[email protected]>

No print

875c7e3

Signed-off-by: DarkLight1337 <[email protected]>

Simplify

d9a5c81

Signed-off-by: DarkLight1337 <[email protected]>

njhill reviewed Aug 27, 2025


DarkLight1337 added 3 commits August 27, 2025 16:39

Fix

dabe421

Signed-off-by: DarkLight1337 <[email protected]>

Update

fdc9b6e

Signed-off-by: DarkLight1337 <[email protected]>

Type checking

94ec51d

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 mentioned this pull request Aug 27, 2025

[Core] Enable HF processing on GPU #22070

Draft

4 tasks

Merge branch 'main' into api-server-count-cli

3b5db2d

hmellor reviewed Sep 1, 2025


vllm/engine/arg_utils.py

