
Conversation

@natuan (Contributor) commented on May 26, 2023:

This change enables converting a torch.bmm with two quantized inputs and a non-quantized output to MatMulInteger. This procedure is mutually exclusive with the existing one for QATMatMul-based quantized matmuls.

The attached graph shows the two MatMulInteger nodes produced by this conversion on OPT-125m.

[Image: exported ONNX graph after conversion]

Note that the quantization of these MatMuls on OPT requires this PR.
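
For context, here is a minimal numpy sketch (not the actual export transform in this PR) of why the replacement is valid: MatMulInteger computes an int32 product of the zero-point-adjusted integer inputs, and multiplying that result by the product of the two input scales recovers an approximation of the original float bmm. The helper names `quantize` and `matmul_integer_reference` below are hypothetical and only illustrate the arithmetic.

```python
import numpy as np

def quantize(x, scale, zero_point):
    # Linear quantization to uint8, analogous to ONNX QuantizeLinear.
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)

def matmul_integer_reference(qa, qb, zp_a, zp_b):
    # Integer matmul on zero-point-adjusted int32 values, analogous to MatMulInteger.
    return (qa.astype(np.int32) - zp_a) @ (qb.astype(np.int32) - zp_b)

# Stand-ins for the two quantized bmm inputs.
a = np.random.randn(4, 8).astype(np.float32)
b = np.random.randn(8, 6).astype(np.float32)
scale_a, zp_a = np.abs(a).max() / 127.5, 128
scale_b, zp_b = np.abs(b).max() / 127.5, 128

qa, qb = quantize(a, scale_a, zp_a), quantize(b, scale_b, zp_b)

# MatMulInteger yields int32; rescaling by the product of the input scales
# gives a float result close to the original float matmul.
approx = matmul_integer_reference(qa, qb, zp_a, zp_b) * (scale_a * scale_b)
print(np.abs(approx - a @ b).max())
```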

@natuan requested review from bfineran, anmarques, dbogunowicz and a team on May 26, 2023.
@dbogunowicz (Contributor) previously approved these changes on May 29, 2023 and left a comment:


Approving tentatively, will test it shortly.

@dbogunowicz (Contributor) commented:

Also, please update the PR description.

@natuan (Contributor, Author) commented on May 30, 2023:

> Also, please update the PR description.

Added the description.

@anmarques (Member) left a comment:


Looks good to me after the fix.

@natuan merged commit 1575944 into main on Jun 8, 2023.
@natuan deleted the ONNX_export_OPT_matmuls branch on June 8, 2023.