@alogfans alogfans commented Nov 28, 2024

This PR is related to #10727 and is a continuation of PR #8498, using Mooncake's Transfer Engine for KVCache transfer instead of NCCL.

Mooncake is a KVCache-centric disaggregated architecture for LLM serving. Transfer Engine is the core component of Mooncake; see the documentation for its design and API list.

Compared with NCCL, Mooncake Transfer Engine has the following features:

  • a unified programming interface for DRAM-to-DRAM (both local and remote), DRAM-to-GPU VRAM (both local and remote), and DRAM-to-remote NVMe device transfers
  • support for the TCP, RDMA, and NVMe-oF protocols
  • topology-aware path selection (see our English doc, transfer_engine.md), aggregating bandwidth from multiple NICs

As in the current implementation of PR #8498, there are two roles: a KV provider (e.g. a prefill vLLM instance) and a KV consumer (e.g. a decode vLLM instance).

  • The provider side implements insert: it places a KV cache into a buffer so that it can be transferred upon request
  • The consumer side implements drop_select: it selects a KV cache based on tokens, transfers the selected KV, and removes it from the buffer

The two roles run on different machines.
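To make the insert / drop_select semantics described above concrete, here is a minimal single-process sketch. It is not the code from this PR: `ToyKVLookupBuffer` and `KVEntry` are hypothetical names, the KV cache is represented by plain `bytes`, exact token-list matching is an assumption, and no Transfer Engine or network transfer is involved.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class KVEntry:
    tokens: List[int]  # token ids the cache was computed for
    kv: bytes          # stand-in for the real KV cache tensors (assumption)


class ToyKVLookupBuffer:
    """Hypothetical in-memory stand-in for the provider-side buffer."""

    def __init__(self) -> None:
        self._entries: List[KVEntry] = []

    def insert(self, tokens: Sequence[int], kv: bytes) -> None:
        # Provider side: stage a KV cache so it can be handed out on request.
        self._entries.append(KVEntry(list(tokens), kv))

    def drop_select(self, tokens: Sequence[int]) -> Optional[bytes]:
        # Consumer side: find the entry for these tokens, return its KV,
        # and remove it from the buffer. Returns None when nothing matches.
        wanted = list(tokens)
        for i, entry in enumerate(self._entries):
            if entry.tokens == wanted:  # exact match is an assumption
                del self._entries[i]
                return entry.kv
        return None

    def __len__(self) -> int:
        return len(self._entries)


buf = ToyKVLookupBuffer()
buf.insert([1, 2, 3], b"kv-for-123")   # prefill instance stages the cache
hit = buf.drop_select([1, 2, 3])       # decode instance fetches and drops it
miss = buf.drop_select([1, 2, 3])      # second request finds nothing
```

In the actual design the two calls happen on different machines, with the selected KV moved between them by the transfer layer; the sketch only shows that a successful drop_select consumes the entry.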

Integration guide: https://github.com/kvcache-ai/mooncake/blob/main/doc/en/vllm-integration.md

Benchmark result: https://github.com/kvcache-ai/mooncake/blob/main/doc/en/vllm_benchmark_results.md

mergify bot commented Nov 29, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @alogfans.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 29, 2024
@stmatengss stmatengss force-pushed the mooncake-integration-patch branch from 7132c75 to c1477fb Compare November 29, 2024 03:43
@mergify mergify bot removed the needs-rebase label Nov 29, 2024
@stmatengss

Currently, this PR is based on the early version of #8498. We plan to clean up and rebase the code against the latest version soon. Apologies for triggering the request review prematurely.

mergify bot commented Dec 2, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @alogfans.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Dec 2, 2024
Collaborator

KuntaiDu commented Dec 2, 2024

The new version of the disaggregated prefill PR, #10502, has just been merged, so feel free to continue development on vLLM's main branch! API-wise, the new PR is pretty similar to the old one, so (hopefully) it is straightforward to migrate the implementation.

junna2016 commented Dec 3, 2024

Can you provide a test example for running disaggregated prefill/decode mode in the MooncakeDistributedPipe scenario?

@ShangmingCai
Contributor

> Can you provide a test example for running disaggregated prefill/decode mode in the MooncakeDistributedPipe scenario?

You can refer to this doc to run a demo based on PR #8498. We are currently rebasing onto the main branch. It is nearly done, but we will run more tests to ensure compatibility.

@junna2016

> > Can you provide a test example for running disaggregated prefill/decode mode in the MooncakeDistributedPipe scenario?
>
> You can refer to this doc to run a demo based on PR #8498. We are currently rebasing onto the main branch. It is nearly done, but we will run more tests to ensure compatibility.

Thanks a lot

@ShangmingCai
Contributor

After the rebase, we have moved development to PR #10884.

@DarkLight1337
Member

Closing as superseded by #10884

Labels

ci/build, documentation, frontend, needs-rebase
