[MagicMTP] New Feature. #3794
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request introduces a 'block verify' algorithm for rejection sampling to improve performance. The changes are mainly in rejection_random_sample_pytorch.
My review identified two critical issues in the new implementation that could lead to runtime errors:
- A potential UnboundLocalError due to an uninitialized variable last_accepted_token_pos.
- A potential ZeroDivisionError when calculating pi, as draft_prob can be zero.
I have provided code suggestions to fix both issues. Please review them carefully.
pi = min(pi * target_prob / draft_prob, 1.0)

output_token_ids[req_idx, pos] = token_id
if draft_prob > 0 and pi >= uniform_prob:
    last_accepted_token_pos = pos
    rejected = False
else:
    rejected = True
There is a potential ZeroDivisionError at line 430. draft_prob can be 0, but it is used as a divisor before being checked. The division should be guarded by a check for draft_prob > 0.
if draft_prob > 0:
    pi = min(pi * target_prob / draft_prob, 1.0)
else:
    pi = 0.0
if pi >= uniform_prob:
    last_accepted_token_pos = pos
    rejected = False
else:
    rejected = True
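For reference, both review points can be combined in one standalone sketch. This is a plain-Python illustration, not the PR's `rejection_random_sample_pytorch` code: the function name `last_accepted_pos` and its list-based arguments are hypothetical, and the sentinel value `-1` for "nothing accepted" is an assumption. It only mirrors the loop quoted above, with `last_accepted_token_pos` bound before the loop and the division guarded.

```python
def last_accepted_pos(draft_probs, target_probs, uniform_probs):
    """Return the index of the last accepted draft token, or -1 if none.

    Hypothetical helper: one request, one probability per draft position.
    """
    pi = 1.0
    # Bound before the loop, so it exists even if no token is accepted
    # (avoids the UnboundLocalError noted in the review).
    last_accepted_token_pos = -1
    for pos, (draft_prob, target_prob, uniform_prob) in enumerate(
        zip(draft_probs, target_probs, uniform_probs)
    ):
        if draft_prob > 0:
            # Divide only when the divisor is non-zero.
            pi = min(pi * target_prob / draft_prob, 1.0)
        else:
            pi = 0.0
        if pi >= uniform_prob:
            last_accepted_token_pos = pos
    return last_accepted_token_pos


# A zero draft probability at position 1 no longer raises.
print(last_accepted_pos([0.5, 0.0, 0.5], [0.5, 0.3, 0.5], [0.9, 0.5, 0.5]))
```

Once `pi` drops to `0.0` it stays there for the remaining positions, so a zero draft probability effectively ends acceptance for that request in this sketch.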
When MTP >= 3, the MagicMTP optimization logic, block verification (Block Verify), is forcibly enabled.
Signed-off-by: chenaoxuan <[email protected]>
Signed-off-by: Aoxuan Chen <[email protected]> Signed-off-by: chenaoxuan <[email protected]>
Signed-off-by: Aoxuan Chen <[email protected]>
What this PR does / why we need it?
MagicMTP Key Project: Introduce the 'block verify' algorithm into the draft token rejection sampling logic, taking into account the correlation between generated draft tokens to improve draft token acceptance rate and increase vllm_ascend runtime throughput.
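To illustrate why carrying a running ratio across positions can accept more draft tokens than judging each token in isolation, here is a toy comparison. It is a simplified sketch, assuming the acceptance rule from the loop discussed in the review above; the function names `tokenwise_accept_len` and `blockwise_accept_len` and the input numbers are made up for illustration, and neither function is the PR's actual implementation or the full block verify algorithm.

```python
def tokenwise_accept_len(draft_probs, target_probs, uniform_probs):
    """Per-token rejection sampling: stop at the first rejected token."""
    accepted = 0
    for q, p, u in zip(draft_probs, target_probs, uniform_probs):
        if q > 0 and min(p / q, 1.0) >= u:
            accepted += 1
        else:
            break
    return accepted


def blockwise_accept_len(draft_probs, target_probs, uniform_probs):
    """Block-style check: keep a running ratio and take the last
    position at which it passes the uniform draw."""
    pi = 1.0
    last_accepted = -1
    for pos, (q, p, u) in enumerate(
        zip(draft_probs, target_probs, uniform_probs)
    ):
        pi = min(pi * p / q, 1.0) if q > 0 else 0.0
        if pi >= u:
            last_accepted = pos
    return last_accepted + 1


# Toy inputs (assumed values): the first token narrowly fails on its own,
# but the target model strongly prefers the tokens that follow it.
draft = [0.5, 0.5, 0.5]
target = [0.4, 0.9, 0.9]
uniform = [0.85, 0.5, 0.5]

print(tokenwise_accept_len(draft, target, uniform))
print(blockwise_accept_len(draft, target, uniform))
```

With these inputs the per-token check stops at position 0, while the running ratio recovers at position 1 and the whole block is kept. Real acceptance rates depend on the models and sampling settings.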