WoosukKwon
Collaborator

No description provided.

@WoosukKwon WoosukKwon changed the title from "[FlashInfer] Fix potential race condition for paged_kv_indptr_cpu" to "[BugFix][FlashInfer] Fix potential race condition for paged_kv_indptr_cpu" Aug 27, 2025
@mergify mergify bot added the v1 label Aug 27, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly identifies and fixes a potential race condition in the FlashInfer backend related to paged_kv_indptr_cpu. The race condition could occur during asynchronous data transfers to the GPU, especially when CUDA graphs are enabled. The implemented solution, which involves using an intermediate buffer (paged_kv_indptr_buffer) for the asynchronous copy, is a standard and effective way to resolve this kind of issue. The changes also include a minor optimization for calculating the number of actual pages. Overall, the fix is well-implemented and enhances the robustness of the attention mechanism.
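
To illustrate the kind of race the review describes, here is a minimal, pure-Python sketch. It is not the vLLM or FlashInfer code: `FakeStream`, `enqueue_direct`, `enqueue_staged`, and `run` are hypothetical names, and the "stream" is a toy queue that only reads the source when it is drained, which is the assumption that makes a reused host buffer unsafe. The staged variant takes a per-copy snapshot for simplicity; the actual change, per the review, uses an intermediate buffer named `paged_kv_indptr_buffer`, whose exact handling is not shown here.

```python
from collections import deque


class FakeStream:
    """Toy stand-in for a device stream (hypothetical, not a real API).

    Copies are only queued by copy_async; the source is read later,
    when synchronize() drains the queue.
    """

    def __init__(self):
        self._pending = deque()

    def copy_async(self, dst, src):
        # Record references only; nothing is read from src yet.
        self._pending.append((dst, src))

    def synchronize(self):
        # Perform every queued copy, reading src as it is *now*.
        while self._pending:
            dst, src = self._pending.popleft()
            dst[:] = src


def enqueue_direct(stream, host_indptr, device_dst, values):
    # Racy pattern: the shared host buffer is the copy source, so a later
    # write to host_indptr can land before the queued copy executes.
    host_indptr[:] = values
    stream.copy_async(device_dst, host_indptr)


def enqueue_staged(stream, host_indptr, device_dst, values):
    # Staged pattern: copy from a buffer that later writes do not touch.
    host_indptr[:] = values
    staging = list(host_indptr)  # snapshot used only by this copy
    stream.copy_async(device_dst, staging)


def run(enqueue):
    stream = FakeStream()
    host_indptr = [0, 0, 0]  # plays the role of the shared CPU buffer
    dev_a, dev_b = [0, 0, 0], [0, 0, 0]
    enqueue(stream, host_indptr, dev_a, [0, 2, 5])
    enqueue(stream, host_indptr, dev_b, [0, 1, 3])  # overwrites host_indptr
    stream.synchronize()
    return dev_a, dev_b


print(run(enqueue_direct))  # first destination receives the second step's data
print(run(enqueue_staged))  # each destination receives its own step's data
```

In the direct variant, both destinations end up with the values written last, because the first copy had not yet read the shared buffer when it was overwritten. Staging through a separate buffer decouples the in-flight copy from later writes, at the cost of one extra host-side copy.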
