[Feature]: Decouple max_batch_size from cuda_graph_batch_sizes #7675

@MrGeva

Description

🚀 The feature, motivation and pitch

Currently we always capture CUDA graphs up to the highest value in cuda_graph_batch_sizes.
If the user sets a higher max_batch_size, execution fails on an assertion in CapturedGraph because the input batch-size dimension exceeds the maximum cuda_graph_batch_sizes value. Instead, we should capture graphs up to the maximum of cuda_graph_batch_sizes and fall back to the non-captured (eager) path whenever a larger batch size occurs.
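The proposed dispatch logic could look roughly like the following sketch. All names here (GraphRunner, select, run_eager) are illustrative, not the actual TensorRT-LLM API:

```python
# Hypothetical sketch of the requested behavior: capture only up to
# max(cuda_graph_batch_sizes), and fall back to eager execution for
# larger batches instead of asserting.

class GraphRunner:
    def __init__(self, cuda_graph_batch_sizes, max_batch_size):
        # Graphs are captured only for the configured sizes, even when
        # max_batch_size exceeds the largest of them.
        self.graph_sizes = sorted(cuda_graph_batch_sizes)
        self.max_graph_bs = self.graph_sizes[-1]
        self.max_batch_size = max_batch_size

    def select(self, batch_size):
        """Return the smallest captured size >= batch_size,
        or None to signal fallback to the non-captured path."""
        if batch_size > self.max_graph_bs:
            return None  # run eagerly rather than failing an assertion
        return next(s for s in self.graph_sizes if s >= batch_size)
```

With cuda_graph_batch_sizes=[1, 2, 4, 8] and max_batch_size=16, a batch of 3 would pad to the captured size 4, while a batch of 16 would return None and take the eager path.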

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Metadata

Assignees

Labels

feature request: New feature or request. This includes new model, dtype, functionality support

Type

No type

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
