
Conversation

@Pz1116 Pz1116 (Contributor) commented Oct 25, 2025

What this PR does / why we need it?

Add kv pool developer guide

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

vLLM version: v0.11.0rc3
vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@github-actions github-actions bot added the documentation label Oct 25, 2025
Pz1116 and others added 2 commits October 25, 2025 14:17
add guide

Signed-off-by: Pz1116 <[email protected]>
Signed-off-by: Pz1116 <[email protected]>

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new developer guide for the KV Cache Pool feature. The guide provides a good overview, but there are a few critical issues that need to be addressed for clarity and correctness. Specifically, there's a contradiction regarding supported storage types (HBM, DRAM, SSD vs. only DRAM), an incomplete sentence that leaves instructions on how to enable a feature unfinished, and a potentially confusing link to an external guide that uses a contradictory flag. Addressing these points will significantly improve the quality and usability of the documentation.


However, the performance gain from prefix caching is highly dependent on the cache hit rate, and the cache hit rate can be limited if one only uses HBM for KV cache storage.

Hence, KV Cache Pool is proposed to utilize various types of storage, including HBM, DRAM, and SSD, to form a pool for KV Cache storage, while making the prefixes of requests visible across all nodes, increasing the cache hit rate for all requests.
Contributor


high

This section and line 24 state that the KV Cache Pool supports HBM, DRAM, and SSD. However, the limitation section on line 57 says that it currently only supports DRAM. This is a significant contradiction and can be misleading to users. Please clarify the current support and the future roadmap for storage tiers to avoid confusion.


vLLM Ascend currently supports Mooncake Store for KV Cache Pool. To enable Mooncake Store, one needs to configure kv-transfer-config and choose MooncakeStoreConnector as the KV Connector.

For step-by-step deployment and configuration, please refer to the guide: https://github.com/vllm-project/vllm-ascend/blob/main/examples/disaggregated_prefill_v1/mooncake_connector_store_deployment_guide.md
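As an illustration only (not part of the guide under review), here is a minimal sketch of how such a kv-transfer-config might be built with vLLM's offline Python API. The connector name MooncakeStoreConnector comes from the text above; the import path, field names, and the kv_role value are assumptions that may differ across vLLM versions.

```python
from vllm.config import KVTransferConfig

# Sketch: select the connector named in the guide so KV blocks can be
# saved to and loaded from the Mooncake-backed pool.
kv_config = KVTransferConfig(
    kv_connector="MooncakeStoreConnector",  # connector name from the guide above
    kv_role="kv_both",                      # assumption: this node both saves and loads KV
)
```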
Contributor


high

This link points to an external repository (vllm-ascend), which is not ideal for maintainability. Please consider using a relative path if the file is intended to be part of this repository.

Additionally, the linked guide uses the --no_enable_prefix_caching flag, which seems to contradict the feature's description of combining the KV Cache Pool with HBM prefix caching. This is very confusing. The documentation should clarify under which circumstances prefix caching should be disabled and explain why the example uses this flag.


### Combining KV Cache Pool with HBM Prefix Caching
Prefix Caching with HBM is already supported by the vLLM V1 Engine.
By introducing KV Connector V1, users can seamlessly combine HBM-based Prefix Caching with Mooncake-backed KV Pool. The user can enable both features simply by enabling
Contributor


high

This sentence is incomplete, which can be very confusing for users trying to enable this feature. Please complete the sentence to explain what users need to enable. For example, you could mention the necessary command-line flags and provide a configuration snippet.
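For illustration, a hedged sketch of what such a configuration snippet could look like, assuming both features can indeed be enabled at the same time (which is exactly what the guide should confirm): the V1 engine's HBM prefix caching stays on while a MooncakeStoreConnector-backed kv-transfer-config is attached. Field names and values beyond what the guide states are assumptions.

```python
from vllm import LLM
from vllm.config import KVTransferConfig

# Sketch: HBM prefix caching (native to the vLLM V1 engine) enabled alongside
# the Mooncake-backed KV connector; the model name is just a placeholder.
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",
    enable_prefix_caching=True,  # HBM-based prefix caching
    kv_transfer_config=KVTransferConfig(
        kv_connector="MooncakeStoreConnector",  # name from the guide under review
        kv_role="kv_both",                      # assumption: save and load KV
    ),
)
```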

Signed-off-by: pz1116 <[email protected]>
Signed-off-by: Pz1116 <[email protected]>
@Pz1116 Pz1116 changed the title from [WIP][docs] Add kv pool developer guide to [docs] Add kv pool developer guide Oct 28, 2025

vLLM Ascend currently supports Mooncake (https://github.com/kvcache-ai/Mooncake), one of the most widely recognized KV Cache storage engines.

While one can utilize Mooncake Store in the vLLM V1 engine by setting it as a remote backend of LMCache, we find it would be better to integrate a connector that directly supports Mooncake Store and can adapt the data transfer strategy to best fit Huawei NPU hardware. Hence, we propose to integrate Mooncake Store with a brand new Mooncake Connector V1, which is indeed largely inspired by LMCache Connector V1.
Collaborator


While one can utilize Mooncake Store in the vLLM V1 engine by setting it as a remote backend of LMCache, we find it would be better to

So there also exists a way for users to enable Mooncake without our new Mooncake Connector V1? It might be better to also provide a link to a guide for that approach, or usage recommendations?


vLLM Ascend currently supports Mooncake (https://github.com/kvcache-ai/Mooncake), one of the most widely recognized KV Cache storage engines.

While one can utilize Mooncake Store in the vLLM V1 engine by setting it as a remote backend of LMCache, we find it would be better to integrate a connector that directly supports Mooncake Store and can adapt the data transfer strategy to best fit Huawei NPU hardware. Hence, we propose to integrate Mooncake Store with a brand new Mooncake Connector V1, which is indeed largely inspired by LMCache Connector V1.
Collaborator


a brand new Mooncake Connector V1, which is indeed largely inspired by LMCache Connector V1.

If "largly inspired", maybe it's better to tell users the similarityies and differences.


## Usage

vLLM Ascend currently supports Mooncake Store for KV Cache Pool. To enable Mooncake Store, one needs to configure kv-transfer-config and choose MooncakeStoreConnector as the KV Connector.
Collaborator


The usage part looks a bit limited to me. Perhaps it would be better to enrich this part. For example, you could put the most important configuration here, such as how to configure kv-transfer-config.
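For example (an editorial sketch, not a verified vllm-ascend configuration), an end-to-end offline snippet could look like the following. The connector name comes from the guide; Mooncake-specific settings are left out here because they belong to the deployment guide rather than being assumed.

```python
from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

# Sketch of an end-to-end offline run with the Mooncake-backed KV pool enabled.
kv_config = KVTransferConfig(
    kv_connector="MooncakeStoreConnector",  # connector name from the guide
    kv_role="kv_both",                      # assumption: save and load KV
    kv_connector_extra_config={},           # Mooncake-specific settings per the deployment guide
)

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", kv_transfer_config=kv_config)
outputs = llm.generate(
    ["Explain what a KV cache pool is."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```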
