This repository was archived by the owner on Jun 3, 2025. It is now read-only.

Conversation

dbogunowicz
Contributor

@dbogunowicz dbogunowicz commented Jul 17, 2023

The NLDecoderEngine can now infer the dtype of the kv cache inputs from the ONNX graph. This is necessary to enforce the correct dtype when creating the initial kv cache arrays.
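
To illustrate the idea (this is not the code from the PR), here is a minimal sketch of inferring the kv cache dtype from a graph's inputs. It uses only the standard library: `GraphInput` is a hypothetical stand-in for an ONNX graph input, `infer_kv_cache_dtype` is a hypothetical helper, and the `past_key_values` name prefix is an assumption about how kv cache inputs are named. The integer codes follow the ONNX `TensorProto.DataType` enum.

```python
from dataclasses import dataclass
from typing import List, Optional

# Subset of the ONNX TensorProto.DataType enum codes, mapped to NumPy dtype names.
ONNX_ELEM_TYPE_TO_DTYPE = {
    1: "float32",
    2: "uint8",
    3: "int8",
    6: "int32",
    7: "int64",
    10: "float16",
    11: "float64",
}


@dataclass
class GraphInput:
    """Hypothetical stand-in for an ONNX graph input: its name and element type code."""

    name: str
    elem_type: int


def infer_kv_cache_dtype(
    inputs: List[GraphInput], prefix: str = "past_key_values"
) -> Optional[str]:
    """Return the dtype name shared by all kv cache inputs, or None if there are none.

    The `prefix` default is an assumption about the kv cache input naming.
    """
    dtypes = {
        ONNX_ELEM_TYPE_TO_DTYPE[graph_input.elem_type]
        for graph_input in inputs
        if graph_input.name.startswith(prefix)
    }
    if not dtypes:
        return None
    if len(dtypes) > 1:
        raise ValueError(f"kv cache inputs have mixed dtypes: {sorted(dtypes)}")
    return dtypes.pop()


# Made-up inputs for a model whose kv cache is stored as 8-bit unsigned integers.
example_inputs = [
    GraphInput("input_ids", 7),
    GraphInput("past_key_values.0.key", 2),
    GraphInput("past_key_values.0.value", 2),
]
kv_cache_dtype = infer_kv_cache_dtype(example_inputs)
```

The inferred name could then be passed as the `dtype` argument when allocating the initial, empty kv cache arrays, so they match what the graph expects instead of defaulting to float32.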

This PR is complementary to neuralmagic/sparseml#1648. Refer to that PR for a description of the manual tests.

@bfineran
Contributor

let's add a test plan to the description

@dbogunowicz
Contributor Author

@bfineran but the appropriate tests are laid out in detail in the sparseml counterpart.

@dbogunowicz dbogunowicz merged commit ad998df into main Jul 18, 2023
@dbogunowicz dbogunowicz deleted the feature/damian/enable_inference_w_quant_models branch July 18, 2023 13:42

4 participants