
Conversation

@Shirley125 (Contributor) commented Oct 25, 2025

What this PR does / why we need it?

Fix a proxy decode bug when parsing non-UTF-8 characters in streamed chunks.
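
A minimal sketch of the failure mode being fixed, for context only: when a multi-byte UTF-8 character is split across two stream chunks, a naive per-chunk `bytes.decode("utf-8")` raises `UnicodeDecodeError`. This is not the PR's actual proxy code, and the incremental decoder at the end is just one possible mitigation.

```python
import codecs

# A three-byte UTF-8 character split across two chunks, as a streaming proxy might see it.
data = "词".encode("utf-8")            # b'\xe8\xaf\x8d'
chunk_a, chunk_b = data[:2], data[2:]

try:
    chunk_a.decode("utf-8")            # fails: incomplete byte sequence
except UnicodeDecodeError as exc:
    print(f"decode failed on partial chunk: {exc}")

# An incremental decoder buffers the trailing bytes and finishes on the next chunk.
dec = codecs.getincrementaldecoder("utf-8")()
print(dec.decode(chunk_a) + dec.decode(chunk_b))  # -> 词
```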

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist bot left a comment

Code Review

This pull request introduces error handling for decoding stream chunks in two proxy server examples. While adding a try-except block is a good step to prevent crashes from decoding errors, the implementation can be improved for robustness. Specifically, catching the generic Exception is too broad and can mask other potential issues. It's a best practice to catch the specific UnicodeDecodeError to make the error handling more precise and the code more maintainable.
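
A short sketch of the narrower handling suggested here, assuming a hypothetical async relay loop (the iterator, function name, and logging are placeholders, not the actual proxy example code):

```python
from typing import AsyncIterator

async def relay_chunks(chunks: AsyncIterator[bytes]) -> AsyncIterator[str]:
    """Decode streamed byte chunks, dropping only the ones that fail to decode."""
    async for raw in chunks:
        try:
            yield raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            # Catching the specific exception keeps unrelated bugs visible;
            # a bare `except Exception` would silently swallow them too.
            print(f"skipping non-UTF-8 chunk: {exc!r}")
```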

@Shirley125 changed the title from "fix proxy decode bug" to "[bugfix]fix proxy decode bug" on Oct 25, 2025
Pr0Wh1teGivee and others added 4 commits October 25, 2025 12:52
### What this PR does / why we need it?
1. Rename common_fused_moe.py to fused_moe.py.
2. Rename fused_moe_prepare_and_finalize.py / FusedMoEPrepareAndFinalize
to prepare_finalize.py / PrepareAndFinalize.
3. Rename vllm_ascend/ops/moe to vllm_ascend/ops/fused_moe.
4. Move vllm_ascend/ops/fused_moe.py to
vllm_ascend/ops/fused_moe/fused_moe.py
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
e2e & ut

- vLLM version: v0.11.0rc3
- vLLM main:
vllm-project/vllm@17c540a

Signed-off-by: Pr0Wh1teGivee <[email protected]>
Signed-off-by: CHEN <[email protected]>
### What this PR does / why we need it?
Caps the calculated maximum number of tokens at 512.

This prevents allocating an excessively large buffer when a cudagraph
capture size is not specified, mitigating the risk of out-of-memory
errors.

### Does this PR introduce _any_ user-facing change?
None.

### How was this patch tested?
None.

- vLLM version: v0.11.0rc3
- vLLM main:
vllm-project/vllm@17c540a

Signed-off-by: Yizhou Liu <[email protected]>
Signed-off-by: CHEN <[email protected]>
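
A rough sketch of the token cap described in the commit message above; the constant and function names are placeholders, not the actual vllm-ascend code:

```python
from typing import Optional

MAX_CAPTURE_TOKENS = 512  # upper bound when no capture size is specified

def resolve_max_num_tokens(requested: Optional[int], computed: int) -> int:
    """Pick the buffer size for graph capture, capping the computed value."""
    if requested is not None:
        return requested                      # an explicit capture size wins
    return min(computed, MAX_CAPTURE_TOKENS)  # avoid oversized buffers / OOM

print(resolve_max_num_tokens(None, 8192))  # -> 512
print(resolve_max_num_tokens(None, 256))   # -> 256
```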
