Flash attention no longer working in most recent build? #15650
Master-Pr0grammer started this conversation in General
I keep getting a "FlashAttention without tensor cores only supports head sizes 64 and 128." error, followed by a segfault, whenever I try to run any gemma3 model on the most recent build.
I have a GTX 1080 Ti, which I know is old and has no tensor cores, but I was able to run these models perfectly before updating. Has anyone had a similar experience and/or found a fix that doesn't involve downgrading? Or is this a bug? I wanted to ask before filing a bug report.
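The only workaround I can think of is forcing flash attention off entirely. A minimal sketch of what I mean, using the llama-cpp-python bindings just to illustrate the toggle (the model path is a placeholder, and I'm assuming the `flash_attn` constructor flag maps onto the same code path as the native `-fa` / `--flash-attn` option):

```python
# Sketch only: assumes a CUDA build of llama-cpp-python and a local gemma3 GGUF.
from llama_cpp import Llama

llm = Llama(
    model_path="models/gemma-3-4b-it-Q4_K_M.gguf",  # placeholder path, not my actual file
    n_gpu_layers=-1,   # offload all layers to the 1080 Ti
    flash_attn=False,  # keep flash attention disabled to avoid the head-size check
)

print(llm("Hello", max_tokens=16)["choices"][0]["text"])
```

That loses whatever speedup flash attention was giving me, though, so I'd still like to know whether the new behaviour is intentional or a regression.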