Can I use NNCF to do INT4 AWQ on an ONNX/DirectML LLAMA2-7B model? #2621
I want to use NNCF to quantize the LLAMA2-7B model that runs on ONNX and DirectML (https://github.com/microsoft/Olive/tree/main/examples/directml/llama_v2). Is this supported?

Replies: 1 comment

@vmadananth, NNCF can compress llama2-7b, but as far as I understand you are referring to a modified version of llama2 from Olive, which we have never tried. If your goal is to run this model on OpenVINO, you can try converting it to OpenVINO IR first and then compressing it with NNCF. You can find examples in the NNCF repository (https://github.com/openvinotoolkit/nncf/tree/develop/examples/llm_compression/openvino) and in the OpenVINO notebooks (https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-chatbot, https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-question-answering).
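For concreteness, here is a minimal sketch of that convert-then-compress flow. It assumes the original Hugging Face `meta-llama/Llama-2-7b-hf` checkpoint (not the Olive/DirectML ONNX variant), uses optimum-intel for the IR export, and uses NNCF's data-aware INT4 weight compression with the `awq` option. The calibration transform below is simplified; the exact inputs your exported model expects may differ, so treat the linked examples as the reference:

```python
# Sketch only, not a tested recipe: export llama2-7b to OpenVINO IR with
# optimum-intel, then apply INT4 weight compression with AWQ via NNCF.
import nncf
import openvino as ov
from optimum.intel.openvino import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # original HF checkpoint, not the Olive ONNX one
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Export to OpenVINO IR without any weight compression first.
ov_model = OVModelForCausalLM.from_pretrained(
    model_id, export=True, load_in_8bit=False, compile=False
)

# AWQ is data-aware, so compress_weights needs a small calibration dataset.
# The transform must produce exactly the inputs the exported model expects
# (input_ids, attention_mask, and possibly position_ids/beam_idx for
# stateful models); see the llm_compression examples linked above.
calibration_texts = ["Sample prompt 1", "Sample prompt 2"]  # use a few hundred in practice
dataset = nncf.Dataset(
    calibration_texts,
    transform_func=lambda text: dict(tokenizer(text, return_tensors="np")),
)

compressed = nncf.compress_weights(
    ov_model.model,                          # the underlying ov.Model
    mode=nncf.CompressWeightsMode.INT4_SYM,  # 4-bit symmetric weight quantization
    ratio=0.8,                               # fraction of weights compressed to INT4
    group_size=128,
    dataset=dataset,
    awq=True,                                # activation-aware weight quantization
)
ov.save_model(compressed, "llama2-7b-int4-awq.xml")
```

After saving, the compressed IR can be loaded and run with OpenVINO as usual; the `ratio` and `group_size` values here are common defaults from the NNCF examples and may need tuning for your accuracy target.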