TST: fix to issue for 8-bit model
#2797
Conversation
Signed-off-by: Yao, Matrix <[email protected]>
@BenjaminBossan, could you please take a look? Thanks very much.
Thanks for the suggestion! I wonder if we should skip the test instead since we're "requesting" to use
Signed-off-by: Yao, Matrix <[email protected]>
    weight = weight.to(torch.device("cuda"))
elif is_xpu_available():
    weight = weight.to(torch.device("xpu"))
    state.SCB = state.SCB.to(torch.device("xpu"))
weight is moved to the accelerator, but SCB is not. That makes the later int8_vectorwise_dequant complain that weight is on xpu while SCB is on cpu, which fails the CPU CI, so move SCB to the accelerator too.
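For readers following along, here is a minimal sketch of that idea, assuming a plain weight tensor and a bnb quant-state-like object carrying an `SCB` attribute; the helper name and the `is_xpu_available` import path are illustrative, not the actual PEFT code:

```python
import torch
from accelerate.utils import is_xpu_available  # assumed import path

def move_int8_weight_to_accelerator(weight, state):
    # The int8 weight and its SCB scaling stats must end up on the same device,
    # otherwise int8_vectorwise_dequant fails with a cpu vs. cuda/xpu mismatch
    # (the CPU CI failure described above).
    if torch.cuda.is_available():
        device = torch.device("cuda")
    elif is_xpu_available():
        device = torch.device("xpu")
    else:
        device = torch.device("cpu")
    weight = weight.to(device)
    state.SCB = state.SCB.to(device)
    return weight, state
```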
if is_cpu:
    dequantized = dequantized.to(device)
    state.SCB = state.SCB.to(device)
Both should be moved back.
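Building on the sketch above, the move-back step could look roughly like this; `dequant_fn` stands in for whatever dequantization routine is actually used (e.g. the int8_vectorwise_dequant mentioned earlier), so treat this as a sketch rather than the real helper:

```python
def dequantize_and_restore_device(weight, state, dequant_fn):
    # Remember where the caller's tensors lived before touching anything.
    original_device = weight.device
    is_cpu = original_device.type == "cpu"
    # Reuse the sketch above: put weight and SCB on the accelerator for dequant.
    weight, state = move_int8_weight_to_accelerator(weight, state)
    dequantized = dequant_fn(weight, state.SCB)
    if is_cpu:
        # Both the dequantized weight and SCB go back, so a CPU-resident model
        # does not end up holding accelerator tensors after dequantization.
        dequantized = dequantized.to(original_device)
        state.SCB = state.SCB.to(original_device)
    return dequantized
```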
@githubnemo, I took some time to dive into the issue and then cleaned up the test cases; the device cases now pass. Thanks very much for the review, and sorry for the earlier lazy workaround.
@githubnemo, could you please take a look again? Thanks very much.
Thanks for working on this with 8bit bnb and device placement. Unfortunately, when I run this locally, I still get errors:
$ CUDA_VISIBLE_DEVICES=0 pytest tests/test_gpu_examples.py::TestLoftQ::test_bloomz_loftq_8bit tests/test_gpu_examples.py::TestLoftQ::test_bloomz_loftq_8bit_dora tests/test_gpu_examples.py::TestLoftQ::test_bloomz_loftq_8bit_iter_5 tests/test_gpu_examples.py::TestLoftQ::test_t5_loftq_8bit
FAILED tests/test_gpu_examples.py::TestLoftQ::test_bloomz_loftq_8bit[cpu] - assert tensor(nan, grad_fn=<MeanBackward0>) > 0.0
FAILED tests/test_gpu_examples.py::TestLoftQ::test_bloomz_loftq_8bit_dora[cpu] - RuntimeError: invalid argument to getCurrentStream
FAILED tests/test_gpu_examples.py::TestLoftQ::test_bloomz_loftq_8bit_iter_5[cuda] - torch.AcceleratorError: CUDA error: an illegal memory access was encountered
FAILED tests/test_gpu_examples.py::TestLoftQ::test_bloomz_loftq_8bit_iter_5[cpu] - torch.AcceleratorError: CUDA error: an illegal memory access was encountered
FAILED tests/test_gpu_examples.py::TestLoftQ::test_t5_loftq_8bit[cuda] - torch.AcceleratorError: CUDA error: an illegal memory access was encountered
FAILED tests/test_gpu_examples.py::TestLoftQ::test_t5_loftq_8bit[cpu] - torch.AcceleratorError: CUDA error: an illegal memory access was encountered
I cannot test on XPU, but for CPU and CUDA, I get these errors. Do these tests pass for you?
Furthermore, I was wondering why these tests are not running on CI, and I think it's a simple oversight: there is no test marker for TestLoftQ. Could you please add the @pytest.mark.single_gpu_tests decorator?
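In case a concrete example helps, class-level marking would be enough, something like the following (a sketch only, with the test body elided; the marker presumably lets the marker-based CI job collect this class):

```python
import pytest

@pytest.mark.single_gpu_tests  # marker requested above so CI picks up the class
class TestLoftQ:
    ...
```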
# The error factor indicates by how much the quantization error should be decreased when using LoftQ compared to
# quantization without LoftQ. Thus 1.03 means that the error should be decreased by 3% at least. This is a very
# conservative value to prevent flakiness, in practice most gains are > 1.5
error_factor = 1.005 if device in ("xpu", "cpu") else 1.03
Did you find it necessary to reduce the factor to 1.005 for XPU and CPU?
Currently, it's needed. We are continuing to enhance bnb support; once that works, we can remove it.
I see. I checked again with CUDA/CPU and the improvement is much larger than 1.03 (which, as indicated, is a very conservative value to avoid flakiness). So if XPU is below that, I think it's an indicator that something is missing.
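For anyone skimming, the factor is used roughly as below; the variable names and the exact assertion form are assumptions, only the values and their meaning come from the comment quoted above:

```python
# 1.03 requires LoftQ to cut the quantization error by at least ~3% relative to
# plain quantization; 1.005 is the relaxed bound currently used for cpu/xpu.
error_factor = 1.005 if device in ("xpu", "cpu") else 1.03
assert error_without_loftq > error_factor * error_with_loftq
```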
@BenjaminBossan Could you try installing latest bnb with
Signed-off-by: Yao, Matrix <[email protected]>
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks for fixing the 8bit bnb LoftQ tests and making LoftQ work with XPU, the PR LGTM.
The stuck CI is unrelated and can be ignored.
This is to fix an oversight from #2797, where the LoftQ test was slightly refactored but one test was not updated accordingly.
Fixes some failing GPU tests in CI. A bug was introduced in #2797 where state.SCB was accessed while dequantizing 4bit bnb weights even though state is None. This would occur, for instance, when using DoRA, which needs to dequantize the weight. The attribute access is now restricted to 8bit bnb weights.
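A rough illustration of that guard (hedged; the exact condition used in the follow-up fix may differ):

```python
# Only 8-bit bnb weights carry SCB scaling stats; for 4-bit weights `state` is
# None, so unconditionally touching state.SCB raises an error, e.g. when DoRA
# dequantizes the weight.
if state is not None:  # i.e. the 8-bit path
    state.SCB = state.SCB.to(device)
```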
Issue
4 cases fail w/ log.
Root Cause
8-bit models don't support `to` and `cuda`, according to transformers logic.
Fix
Check whether the model is loaded in 8-bit, and skip `to` if so.
Results
Pass
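For reference, a minimal sketch of the check described under Fix; the `is_loaded_in_8bit` attribute is the flag transformers sets on such models as far as I know, but treat the exact attribute name and surrounding code as assumptions here:

```python
# transformers blocks .to()/.cuda() on models loaded in 8-bit, so only move
# the model when it is not an 8-bit quantized model.
if not getattr(model, "is_loaded_in_8bit", False):
    model = model.to(device)
```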
@BenjaminBossan, please help review, thanks very much.