Commit 50bf059

LouieYang authored and Akshat-Tripathi committed
[Bugfix] Initialize attention bias on the same device as Query/Key/Value for QwenVL Series (vllm-project#14031)
1 parent c3eca7b commit 50bf059

2 files changed: +4 -2 lines changed

vllm/model_executor/models/qwen2_5_vl.py

Lines changed: 2 additions & 1 deletion
@@ -323,7 +323,8 @@ def forward(
 
             seqlens = (cu_seqlens[1:] - cu_seqlens[:-1]).tolist()
             attn_bias = BlockDiagonalMask.from_seqlens(q_seqlen=seqlens,
-                                                       kv_seqlen=None)
+                                                       kv_seqlen=None,
+                                                       device=q.device)
 
             context_layer = xops.memory_efficient_attention_forward(
                 q, k, v, attn_bias=attn_bias, p=0, scale=None)

vllm/model_executor/models/qwen2_vl.py

Lines changed: 2 additions & 1 deletion
@@ -367,7 +367,8 @@ def forward(
 
             seqlens = (cu_seqlens[1:] - cu_seqlens[:-1]).tolist()
             attn_bias = BlockDiagonalMask.from_seqlens(q_seqlen=seqlens,
-                                                       kv_seqlen=None)
+                                                       kv_seqlen=None,
+                                                       device=q.device)
 
             context_layer = xops.memory_efficient_attention_forward(
                 q, k, v, attn_bias=attn_bias, p=0, scale=None)
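The change is identical in both files: the xformers BlockDiagonalMask was previously created on the framework's default device, which can differ from the device holding q, k, and v; passing device=q.device keeps the attention bias colocated with the tensors. Below is a minimal sketch of the fixed call path, assuming a CUDA device and an xformers version whose from_seqlens accepts a device keyword (as the patched code relies on); the shapes and sequence lengths are illustrative, not taken from the commit.

# Minimal sketch, assuming xformers is installed and
# BlockDiagonalMask.from_seqlens accepts a `device` keyword.
# Shapes and sequence lengths are illustrative only.
import torch
from xformers import ops as xops
from xformers.ops.fmha.attn_bias import BlockDiagonalMask

seqlens = [16, 32]  # hypothetical per-image patch sequence lengths
# xformers expects (batch, seqlen, heads, head_dim)
q = torch.randn(1, sum(seqlens), 8, 64,
                device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# The fix: build the block-diagonal bias on q's device rather than
# the default device, avoiding a CPU/GPU mismatch at attention time.
attn_bias = BlockDiagonalMask.from_seqlens(q_seqlen=seqlens,
                                           kv_seqlen=None,
                                           device=q.device)
out = xops.memory_efficient_attention_forward(
    q, k, v, attn_bias=attn_bias, p=0, scale=None)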
