Skip to content

Conversation

nvjullin
Copy link
Contributor

@nvjullin nvjullin commented Aug 27, 2025

Purpose

Follow up on #23639.
Also cleaned up two competing/conflicting ways of tuning thresholds: number of tokens vs size.
Size is the relevant parameter (for perf), so we should only use that.

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Julien Lin <[email protected]>
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request successfully refactors the configuration for allreduce fusion thresholds, moving them from an environment variable to a more structured configuration object. The cleanup of the logic for tuning thresholds is also a welcome improvement. I've found one potential performance issue in the calculation of max_token_num which appears to be overly conservative and could prevent the fused kernel from being used in some cases where it would be beneficial. Please see the detailed comment.

@ilmarkov
Copy link
Contributor

I am changing the constants and a bit of logic in the other PR. But keeping the max size and cleaning the other tuning ways make sense to me.
LGTM.

Copy link
Collaborator

@ProExpertProg ProExpertProg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we could restructure this such that the defaults are also reflected in config that would be nice. So maybe config asks the pass for. defaults but uses CLI values with precedence.

@hmellor
Copy link
Member

hmellor commented Aug 27, 2025

I agree it would be nice if the the defaults could be the default of the actual config field rather than living with the implementation

@nvjullin
Copy link
Contributor Author

nvjullin commented Aug 28, 2025

I agree it would be nice if the the defaults could be the default of the actual config field

If the default is {"2": 64, "4": 1, "6": 1, "8": 1}, then if the user wants to override 8 only, the user will have to pass {"2": 64, "4": 1, "6": 1, "8": 8}. This is quite bad UI.

I think if we could restructure this such that the defaults are also reflected in config that would be nice.

Right now, the comment explains the defaults, so it is indeed reflected in the config. The issue is that the comment has to be in sync with the implementation. It's not ideal, but otherwise we'll have to write a new dict-like class to handle the aforementioned UI problem which I think is overkill for a very niche config option.

Another option is to have a default of {"2": 64, "4": 1, "6": 1, "8": 1} in config and fall back to the one in flashinfer_max_size when the config is empty. This is essentially the same as the current situation where we have a comment explaining the default: we still have to keep them in sync.

@ilmarkov
Copy link
Contributor

If the default is {"2": 64, "4": 1, "6": 1, "8": 1}, then if the user wants to override 8 only, the user will have to pass {"2": 64, >"4": 1, "6": 1, "8": 8}

@nvjullin I'd suggest to update the default config with user-provided dictionary. I believe user usually needs to specify one key:value pair at the initialization to update the default config.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: To triage
Development

Successfully merging this pull request may close these issues.

4 participants