
Conversation

@Pz1116 Pz1116 (Contributor) commented Oct 25, 2025

What this PR does / why we need it?

Add kv pool developer guide

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

vLLM version: v0.11.0rc3
vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@github-actions github-actions bot added the documentation label Oct 25, 2025
Pz1116 and others added 2 commits October 25, 2025 14:17
add guide

Signed-off-by: Pz1116 <[email protected]>
Signed-off-by: Pz1116 <[email protected]>

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new developer guide for the KV Cache Pool feature. The guide provides a good overview, but there are a few critical issues that need to be addressed for clarity and correctness. Specifically, there's a contradiction regarding supported storage types (HBM, DRAM, SSD vs. only DRAM), an incomplete sentence that leaves instructions on how to enable a feature unfinished, and a potentially confusing link to an external guide that uses a contradictory flag. Addressing these points will significantly improve the quality and usability of the documentation.


However, the performance gain from prefix caching is highly dependent on the cache hit rate, and the cache hit rate can be limited if one only uses HBM for KV cache storage.

Hence, KV Cache Pool is proposed to utilize various types of storage, including HBM, DRAM, and SSD, to form a pool for KV Cache storage, while making the prefixes of requests visible across all nodes, increasing the cache hit rate for all requests.
Contributor


high

This section and line 24 state that the KV Cache Pool supports HBM, DRAM, and SSD. However, the limitation section on line 57 says that it currently only supports DRAM. This is a significant contradiction and can be misleading to users. Please clarify the current support and the future roadmap for storage tiers to avoid confusion.


vLLM Ascend currently supports Mooncake Store for KV Cache Pool. To enable Mooncake Store, one needs to configure kv-transfer-config and choose MooncakeStoreConnector as the KV Connector.

For step-by-step deployment and configuration, please refer to the guide: https://github.com/vllm-project/vllm-ascend/blob/main/examples/disaggregated_prefill_v1/mooncake_connector_store_deployment_guide.md
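As an illustration only (not part of the guide under review), here is a minimal sketch of how such a kv-transfer-config might be built with vLLM's offline Python API. The connector name MooncakeStoreConnector comes from the text above; the import path, field names, and the kv_role value are assumptions that may differ across vLLM versions.

```python
from vllm.config import KVTransferConfig

# Sketch: select the connector named in the guide so KV blocks can be
# saved to and loaded from the Mooncake-backed pool.
kv_config = KVTransferConfig(
    kv_connector="MooncakeStoreConnector",  # connector name from the guide above
    kv_role="kv_both",                      # assumption: this node both saves and loads KV
)
```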
Contributor


high

This link points to an external repository (vllm-ascend), which is not ideal for maintainability. Please consider using a relative path if the file is intended to be part of this repository.

Additionally, the linked guide uses the --no_enable_prefix_caching flag, which seems to contradict the feature's description of combining the KV Cache Pool with HBM prefix caching. This is very confusing. The documentation should clarify under which circumstances prefix caching should be disabled and explain why the example uses this flag.


### Combining KV Cache Pool with HBM Prefix Caching
Prefix Caching with HBM is already supported by the vLLM V1 Engine.
By introducing KV Connector V1, users can seamlessly combine HBM-based Prefix Caching with Mooncake-backed KV Pool. The user can enable both features simply by enabling
Contributor


high

This sentence is incomplete, which can be very confusing for users trying to enable this feature. Please complete the sentence to explain what users need to enable. For example, you could mention the necessary command-line flags and provide a configuration snippet.
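For illustration, a hedged sketch of what such a configuration snippet could look like, assuming both features can indeed be enabled at the same time (which is exactly what the guide should confirm): the V1 engine's HBM prefix caching stays on while a MooncakeStoreConnector-backed kv-transfer-config is attached. Field names and values beyond what the guide states are assumptions.

```python
from vllm import LLM
from vllm.config import KVTransferConfig

# Sketch: HBM prefix caching (native to the vLLM V1 engine) enabled alongside
# the Mooncake-backed KV connector; the model name is just a placeholder.
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",
    enable_prefix_caching=True,  # HBM-based prefix caching
    kv_transfer_config=KVTransferConfig(
        kv_connector="MooncakeStoreConnector",  # name from the guide under review
        kv_role="kv_both",                      # assumption: save and load KV
    ),
)
```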

Signed-off-by: pz1116 <[email protected]>
Signed-off-by: Pz1116 <[email protected]>
@Pz1116 Pz1116 changed the title from [WIP][docs] Add kv pool developer guide to [docs] Add kv pool developer guide Oct 28, 2025

vLLM Ascend currently supports Mooncake (https://github.com/kvcache-ai/Mooncake), one of the most widely recognized KV Cache storage engines.

While one can utilize Mooncake Store in the vLLM V1 engine by setting it as a remote backend of LMCache, we find it would be better to integrate a connector that directly supports Mooncake Store and can adapt the data transfer strategy to best fit Huawei NPU hardware. Hence, we propose to integrate Mooncake Store with a brand new Mooncake Connector V1, which is indeed largely inspired by LMCache Connector V1.
Collaborator


While one can utilize Mooncake Store in the vLLM V1 engine by setting it as a remote backend of LMCache, we find it would be better to

So there also exists a way for users to enable Mooncake without our new Mooncake Connector V1? It might be better to also provide a link to a guide for that approach, or usage recommendations?


vLLM Ascend currently supports Mooncake (https://github.com/kvcache-ai/Mooncake), one of the most widely recognized KV Cache storage engines.

While one can utilize Mooncake Store in the vLLM V1 engine by setting it as a remote backend of LMCache, we find it would be better to integrate a connector that directly supports Mooncake Store and can adapt the data transfer strategy to best fit Huawei NPU hardware. Hence, we propose to integrate Mooncake Store with a brand new Mooncake Connector V1, which is indeed largely inspired by LMCache Connector V1.
Collaborator


a brand new Mooncake Connector V1, which is indeed largely inspired by LMCache Connector V1.

If "largly inspired", maybe it's better to tell users the similarityies and differences.


## Usage

vLLM Ascend currently supports Mooncake Store for KV Cache Pool. To enable Mooncake Store, one needs to configure kv-transfer-config and choose MooncakeStoreConnector as the KV Connector.
Collaborator


The usage part looks a bit limited to me. Perhaps it would be better to enrich this part. For example, you could put the most important configuration here, such as how to configure kv-transfer-config.
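For example (an editorial sketch, not a verified vllm-ascend configuration), an end-to-end offline snippet could look like the following. The connector name comes from the guide; Mooncake-specific settings are left out here because they belong to the deployment guide rather than being assumed.

```python
from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

# Sketch of an end-to-end offline run with the Mooncake-backed KV pool enabled.
kv_config = KVTransferConfig(
    kv_connector="MooncakeStoreConnector",  # connector name from the guide
    kv_role="kv_both",                      # assumption: save and load KV
    kv_connector_extra_config={},           # Mooncake-specific settings per the deployment guide
)

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", kv_transfer_config=kv_config)
outputs = llm.generate(
    ["Explain what a KV cache pool is."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```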
