Can I use NNCF to do INT4 AWQ on an ONNX/DirectML LLAMA2-7B model? #2621
I want to use NNCF to quantize the LLAMA2-7B model that runs on ONNX and DirectML (https://github.com/microsoft/Olive/tree/main/examples/directml/llama_v2). Is this supported?

Replies: 1 comment

@vmadananth, NNCF can compress llama2-7b, but as far as I understand you are referring to a modified version of llama2 from Olive, which we have never tried. If your goal is to run this model on OpenVINO, you can try converting it to OpenVINO IR first and then compressing it with NNCF. You can find examples in the NNCF repository (https://github.com/openvinotoolkit/nncf/tree/develop/examples/llm_compression/openvino) and in the OpenVINO notebooks (https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-chatbot, https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-question-answering).
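For concreteness, here is a minimal sketch of that convert-then-compress flow. It assumes the original Hugging Face `meta-llama/Llama-2-7b-hf` checkpoint (not the Olive/DirectML ONNX variant), uses optimum-intel for the IR export, and uses NNCF's data-aware INT4 weight compression with the `awq` option. The calibration transform below is simplified; the exact inputs your exported model expects may differ, so treat the linked examples as the reference:

```python
# Sketch only, not a tested recipe: export llama2-7b to OpenVINO IR with
# optimum-intel, then apply INT4 weight compression with AWQ via NNCF.
import nncf
import openvino as ov
from optimum.intel.openvino import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # original HF checkpoint, not the Olive ONNX one
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Export to OpenVINO IR without any weight compression first.
ov_model = OVModelForCausalLM.from_pretrained(
    model_id, export=True, load_in_8bit=False, compile=False
)

# AWQ is data-aware, so compress_weights needs a small calibration dataset.
# The transform must produce exactly the inputs the exported model expects
# (input_ids, attention_mask, and possibly position_ids/beam_idx for
# stateful models); see the llm_compression examples linked above.
calibration_texts = ["Sample prompt 1", "Sample prompt 2"]  # use a few hundred in practice
dataset = nncf.Dataset(
    calibration_texts,
    transform_func=lambda text: dict(tokenizer(text, return_tensors="np")),
)

compressed = nncf.compress_weights(
    ov_model.model,                          # the underlying ov.Model
    mode=nncf.CompressWeightsMode.INT4_SYM,  # 4-bit symmetric weight quantization
    ratio=0.8,                               # fraction of weights compressed to INT4
    group_size=128,
    dataset=dataset,
    awq=True,                                # activation-aware weight quantization
)
ov.save_model(compressed, "llama2-7b-int4-awq.xml")
```

After saving, the compressed IR can be loaded and run with OpenVINO as usual; the `ratio` and `group_size` values here are common defaults from the NNCF examples and may need tuning for your accuracy target.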