Conversation

@nvjullin (Contributor) commented Aug 26, 2025

Purpose

The long-term plan is being discussed in #22086, but it is not easy and will take some more time.
Meanwhile, add an override so we can benchmark the effect of tuning the thresholds and unblock people.
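
For context, one hypothetical way to exercise the override when benchmarking; the values and the JSON shape are illustrative assumptions based on the format documented in the diff further down, not recommended settings:

import json
import os

# Illustrative only: map world size -> max allreduce fusion size in MiB.
# Unlisted world sizes keep the built-in defaults. Set this before the
# engine is constructed so the override is picked up.
os.environ["VLLM_FLASHINFER_ALLREDUCE_FUSION_THRESHOLDS_MB"] = json.dumps(
    {"2": 32, "4": 2, "8": 1})

String keys should be fine here, since the parsing snippet quoted in the review thread below casts them with int(k).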

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@gemini-code-assist bot left a comment
Code Review

This pull request introduces an environment variable to override allreduce fusion thresholds, which is a good feature for benchmarking. However, the implementation lacks robust error handling for the user-provided configuration. My review includes suggestions to prevent application crashes from malformed JSON or invalid data types in the environment variable, improving the overall robustness of this new feature.
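
As a rough illustration of the kind of guard being suggested; the helper name and the fallback behavior are assumptions for this sketch, not the code in this PR:

import json
import os

def _parse_thresholds_mb(raw: str) -> dict[int, float]:
    # Tolerate malformed JSON, non-dict payloads, or bad value types
    # instead of crashing at startup; fall back to no override.
    try:
        return {int(k): float(v) for k, v in json.loads(raw).items()}
    except (ValueError, TypeError, AttributeError):
        return {}

overrides = _parse_thresholds_mb(
    os.getenv("VLLM_FLASHINFER_ALLREDUCE_FUSION_THRESHOLDS_MB", "{}"))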

@nvjullin changed the title from "Add override for allreduce fusion thresholds" to "[Misc] Add override for allreduce fusion thresholds" on Aug 26, 2025
@mgoin (Member) left a comment
LGTM, thank you!

@mgoin enabled auto-merge (squash) on August 26, 2025 at 13:53
@github-actions bot added the ready (ONLY add when PR is ready to merge/full CI is needed) label on Aug 26, 2025
# { <world size>: <max size in mb> }
# Unspecified world sizes will fallback to
# { 2: 64, 4: 1, <everything else>: 0.5 }
"VLLM_FLASHINFER_ALLREDUCE_FUSION_THRESHOLDS_MB":
A collaborator left a comment:
Could we make this a CompilationConfig.PassConfig variable? It might be good to be tunable even after #22086 lands
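
A minimal sketch of what that might look like; the field name and default here are assumptions, not part of this PR:

from dataclasses import dataclass, field

@dataclass
class PassConfig:
    # Hypothetical field: per-world-size allreduce fusion thresholds in MiB,
    # carried in CompilationConfig.PassConfig so it stays tunable after #22086.
    fi_allreduce_fusion_thresholds_mb: dict[int, float] = field(
        default_factory=dict)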

_FI_MAX_SIZES.update({
    int(k): int(float(v) * MiB)
    for k, v in
    envs.VLLM_FLASHINFER_ALLREDUCE_FUSION_THRESHOLDS_MB.items()
})
A collaborator left a comment:
I think reading this will only run at startup but we generally want to read envs during init in case they change between different LLM instantiations. Putting it in config and reading it at pass init time would be best.
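
A minimal sketch of the suggested timing, assuming a pass class along these lines; the class name is hypothetical, while envs and the MiB constant mirror the snippet above:

import vllm.envs as envs

MiB = 1024 * 1024

class AllReduceFusionPass:
    def __init__(self) -> None:
        # Read the override at pass construction (engine init) rather than at
        # module import, so it reflects env changes between LLM instantiations
        # in the same process.
        overrides = envs.VLLM_FLASHINFER_ALLREDUCE_FUSION_THRESHOLDS_MB
        self._max_sizes_bytes = {
            int(k): int(float(v) * MiB) for k, v in overrides.items()
        }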

A contributor left a comment:
@ProExpertProg @ilmarkov Should we just replace fi_allreduce_fusion_max_token_num with something like fi_allreduce_fusion_thresholds_mb? Why do we need to check both num_tokens and message size?

@mgoin merged commit 7ea22e4 into vllm-project:main on Aug 26, 2025
45 of 46 checks passed
@ProExpertProg (Collaborator) commented:
@nvjullin I did not realize this was on auto-merge; can you address the comments in a follow-up, please?

tc-mb pushed a commit to tc-mb/vllm that referenced this pull request Aug 27, 2025
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025
xiao-llm pushed a commit to xiao-llm/vllm that referenced this pull request Aug 28, 2025
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025
dumb0002 pushed a commit to dumb0002/vllm that referenced this pull request Aug 28, 2025
2015aroras pushed a commit to 2015aroras/vllm that referenced this pull request Aug 29, 2025
nopperl pushed a commit to pfnet/vllm that referenced this pull request Sep 3, 2025