Description
The issue
llc crashes as follows on the input attached below:
llc: /home/kdrewnia/llvm-project/llvm/lib/CodeGen/SplitKit.cpp:1662: void llvm::SplitEditor::splitLiveThroughBlock(unsigned int, unsigned int, SlotIndex, unsigned int, SlotIndex): Assertion `(!LeaveBefore || Idx <= LeaveBefore) && "Interference"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0. Program arguments: llc -O2 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 -mattr=+sramecc,-xnack ./reproducer.ll -o -
1. Running pass 'CallGraph Pass Manager' on module './reproducer.ll'.
2. Running pass 'Greedy Register Allocator' on function '@rock_gemm'
[...abort...]
#13 0x00000000034492f7 llvm::SplitEditor::splitLiveThroughBlock(unsigned int, unsigned int, llvm::SlotIndex, unsigned int, llvm::SlotIndex) /home/kdrewnia/llvm-project/llvm/lib/CodeGen/SplitKit.cpp:1668:5
#14 0x00000000033a1630 llvm::RAGreedy::splitAroundRegion(llvm::LiveRangeEdit&, llvm::ArrayRef<unsigned int>) /home/kdrewnia/llvm-project/llvm/lib/CodeGen/RegAllocGreedy.cpp:0:11
#15 0x00000000033a263d llvm::RAGreedy::doRegionSplit(llvm::LiveInterval const&, unsigned int, bool, llvm::SmallVectorImpl<llvm::Register>&) /home/kdrewnia/llvm-project/llvm/lib/CodeGen/RegAllocGreedy.cpp:0:3
#16 0x00000000033a1eff llvm::RAGreedy::tryRegionSplit(llvm::LiveInterval const&, llvm::AllocationOrder&, llvm::SmallVectorImpl<llvm::Register>&) /home/kdrewnia/llvm-project/llvm/lib/CodeGen/RegAllocGreedy.cpp:1093:1
#17 0x00000000033a6b01 llvm::RAGreedy::trySplit(llvm::LiveInterval const&, llvm::AllocationOrder&, llvm::SmallVectorImpl<llvm::Register>&, llvm::SmallSet<llvm::Register, 16u, std::less<llvm::Register>> const&) /home/kdrewnia/llvm-project/llvm/lib/CodeGen/RegAllocGreedy.cpp:1827:26
#18 0x00000000033a8ce5 llvm::RAGreedy::selectOrSplitImpl(llvm::LiveInterval const&, llvm::SmallVectorImpl<llvm::Register>&, llvm::SmallSet<llvm::Register, 16u, std::less<llvm::Register>>&, llvm::SmallVector<std::pair<llvm::LiveInterval const*, llvm::MCRegister>, 8u>&, unsigned int) /home/kdrewnia/llvm-project/llvm/lib/CodeGen/RegAllocGreedy.cpp:2476:24
#19 0x00000000033a9337 llvm::RAGreedy::selectOrSplit(llvm::LiveInterval const&, llvm::SmallVectorImpl<llvm::Register>&) /home/kdrewnia/llvm-project/llvm/lib/CodeGen/RegAllocGreedy.cpp:2151:7
#20 0x000000000337bd85 llvm::RegAllocBase::allocatePhysRegs() /home/kdrewnia/llvm-project/llvm/lib/CodeGen/RegAllocBase.cpp:114:9
#21 0x00000000033ad3cd llvm::RAGreedy::runOnMachineFunction(llvm::MachineFunction&) /home/kdrewnia/llvm-project/llvm/lib/CodeGen/RegAllocGreedy.cpp:2772:3
[...]
A git bisect run shows that this crash only occurs after #74467.
Full reproduction information, including variant inputs and settings that do and do not trigger the crash, is provided below. Notably, passing -amdgpu-codegenprepare-disable-idiv-expansion=true avoids the failure.
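The bisect mentioned above can be automated with git bisect run, which treats exit code 0 as good, 125 as "skip this revision", and other codes from 1 to 127 as bad. The following is a minimal sketch, not the script actually used for this report. It assumes a hypothetical Ninja build directory named build configured with LLVM_ENABLE_ASSERTIONS=ON (the crash is an assertion failure, so a build without assertions may not show it) and a copy of the reproducer saved as reproducer.ll; the names bisect_step and classify are illustrative.

```python
import subprocess

# Exit codes interpreted by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125

# Fragment of the assertion message quoted in the report (SplitKit.cpp).
ASSERT_FRAGMENT = '&& "Interference"'

# llc flags from the report; "-o -" writes the assembly to stdout.
LLC_ARGS = ["-O2", "-mtriple=amdgcn-amd-amdhsa", "-mcpu=gfx908",
            "-mattr=+sramecc,-xnack", "-o", "-"]


def classify(returncode: int, stderr: str) -> int:
    """Map one llc result to a git bisect verdict."""
    if returncode == 0:
        return GOOD  # compiled cleanly: crash not present at this revision
    if ASSERT_FRAGMENT in stderr:
        return BAD  # the specific SplitKit assertion fired
    return SKIP  # some unrelated failure: do not blame this revision


def bisect_step(build_dir: str = "build", ir_path: str = "reproducer.ll") -> int:
    """Rebuild llc in a (hypothetical) Ninja build dir and test one revision."""
    build = subprocess.run(["ninja", "-C", build_dir, "llc"])
    if build.returncode != 0:
        return SKIP  # this revision does not build
    llc = subprocess.run([f"{build_dir}/bin/llc", *LLC_ARGS, ir_path],
                         capture_output=True, text=True)
    return classify(llc.returncode, llc.stderr)
```

With this saved as a hypothetical bisect_llc.py importable from the working directory, something like git bisect run python3 -c "import sys, bisect_llc; sys.exit(bisect_llc.bisect_step())" would drive the search. Mapping unrelated failures to skip rather than bad keeps an unrelated build or parse error from being reported as the culprit.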
Reproduction files
All of these files are the output of opt -O3 -mtriple=amdgcn-amd-amdhsa.
Apologies in advance for the lack of a smaller test case; bugpoint did not have much luck reducing this one.
reproducer.ll.txt is the input that triggers the crash. It is a matrix multiplication implementation.
fewer-batches-passing.ll.txt is the same code generated with a smaller batch size. That is, the input IR was identical to the failing case except for the statically known number of workgroups (annotated as a !range on the workgroup ID), which differs between the two files.
The relevant part of the diff between the two files is:
--- reproducer.ll 2024-04-04 21:13:02.778679418 +0000
+++ fewer-batches-passing.ll 2024-04-04 21:14:50.335567529 +0000
@@ -5,29 +5,28 @@ target datalayout = "e-p:64:64-p1:64:64-
@__wg_rock_gemm_0 = internal unnamed_addr addrspace(3) global [8192 x i8] undef, align 64
@__wg_rock_gemm_1 = internal unnamed_addr addrspace(3) global [8192 x i8] undef, align 64
-define amdgpu_kernel void @rock_gemm(ptr inreg noalias nocapture nofree noundef nonnull readonly align 16 dereferenceable(805306368) %0, ptr inreg noalias nocapture nofree noundef nonnull readonly align 16 dereferenceable(100663296) %1, ptr inreg noalias nocapture nofree noundef nonnull writeonly align 16 dereferenceable(301989888) %2) local_unnamed_addr #0 !reqd_work_group_size !0 {
+define amdgpu_kernel void @rock_gemm(ptr inreg noalias nocapture nofree noundef nonnull readonly align 16 dereferenceable(125829120) %0, ptr inreg noalias nocapture nofree noundef nonnull readonly align 16 dereferenceable(15728640) %1, ptr inreg noalias nocapture nofree noundef nonnull writeonly align 16 dereferenceable(47185920) %2) local_unnamed_addr #0 !reqd_work_group_size !0 {
.preheader21.preheader:
%3 = tail call i32 @llvm.amdgcn.workgroup.id.x(), !range !1
%.fr = freeze i32 %3
- %.lhs.trunc = trunc i32 %.fr to i16
- %4 = udiv i16 %.lhs.trunc, 24
- %5 = mul i16 %4, 24
- %.decomposed = sub i16 %.lhs.trunc, %5
- %.zext17 = zext nneg i16 %.decomposed to i32
- %.cmp = icmp ugt i16 %.decomposed, 21
+ %.lhs.trunc = trunc i32 %.fr to i8
+ %4 = udiv i8 %.lhs.trunc, 24
+ %5 = mul i8 %4, 24
+ %.decomposed = sub i8 %.lhs.trunc, %5
+ %.zext17 = zext nneg i8 %.decomposed to i32
+ %.cmp = icmp ugt i8 %.decomposed, 21
%6 = select i1 %.cmp, i32 11, i32 0
%7 = sub nuw nsw i32 12, %6
%8 = tail call i32 @llvm.umin.i32(i32 %7, i32 11)
- %.lhs.trunc18 = trunc i16 %.decomposed to i8
%.rhs.trunc = trunc i32 %8 to i8
- %9 = urem i8 %.lhs.trunc18, %.rhs.trunc
+ %9 = urem i8 %.decomposed, %.rhs.trunc
@@ -1633,7 +1632,7 @@ attributes #4 = { convergent mustprogres
attributes #5 = { nounwind }
!0 = !{i32 256, i32 1, i32 1}
-!1 = !{i32 0, i32 1536}
+!1 = !{i32 0, i32 240}
!2 = !{i32 0, i32 256}
!3 = !{}
!4 = !{!5}
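The diff suggests why the workgroup count matters. With !range [0, 1536) the largest workgroup ID, 1535, needs 11 bits, so the udiv/mul/sub decomposition by 24 is done in i16; with [0, 240) the largest ID, 239, fits in 8 bits and the same arithmetic is done in i8. In both cases the remainder is below 24, which is why the failing file can truncate it to i8 for the urem. The sketch below checks that arithmetic. The rule in narrow_width (smallest of 8/16/32/64 bits covering the range) is an assumption that reproduces the two types in the diff, not necessarily the logic LLVM uses.

```python
# Assumption (illustrative, not LLVM's code): the narrowed type is the smallest
# power-of-two width, at least 8 bits, holding every value of the half-open
# !range [lo, hi).
def narrow_width(lo: int, hi: int) -> int:
    """Smallest of 8/16/32/64 bits able to represent all values in [lo, hi)."""
    needed = max((hi - 1).bit_length(), 1)
    for width in (8, 16, 32, 64):
        if needed <= width:
            return width
    raise ValueError("range does not fit in 64 bits")


def check_decomposition(lo: int, hi: int, divisor: int = 24) -> bool:
    """Check that the narrowed trunc/udiv/mul/sub sequence from the diff
    matches a full-width udiv/urem for every workgroup ID in [lo, hi)."""
    width = narrow_width(lo, hi)
    mask = (1 << width) - 1
    for wg_id in range(lo, hi):
        x = wg_id & mask                             # trunc i32 -> iN
        q = (x // divisor) & mask                    # udiv iN x, 24
        rem = (x - ((q * divisor) & mask)) & mask    # sub iN x, (mul iN q, 24)
        if (q, rem) != divmod(wg_id, divisor):
            return False
    return True


for name, hi in (("reproducer.ll", 1536), ("fewer-batches-passing.ll", 240)):
    print(f"{name}: i{narrow_width(0, hi)}, "
          f"matches 32-bit udiv/urem: {check_decomposition(0, hi)}")
```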
reproducer-barriers-removed.ll.txt is reproducer.ll with the inline assembly (call void asm) statements removed. This variant also does not crash.
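The report does not say how this variant was produced. One way to generate a similar variant is to drop every line that is a void inline-asm call; this is a minimal sketch assuming each such call sits on its own line, and the sample IR lines are made up for illustration rather than taken from the attachment.

```python
import re

# Matches an LLVM IR line that calls inline assembly returning void, e.g.
#   call void asm sideeffect "s_barrier", ""()
# optionally preceded by "tail" or "notail". Calls whose result is used
# (%r = call i32 asm ...) are deliberately left alone.
VOID_ASM_CALL = re.compile(r"^\s*(?:tail\s+|notail\s+)?call\s+void\s+asm\b")


def strip_void_asm_calls(ir_text: str) -> str:
    """Return ir_text with every void inline-asm call line removed."""
    return "".join(line for line in ir_text.splitlines(keepends=True)
                   if not VOID_ASM_CALL.match(line))


sample = (
    '  call void asm sideeffect "s_barrier", ""()\n'
    "  %x = add i32 %a, %b\n"
    '  tail call void asm sideeffect "s_waitcnt lgkmcnt(0)", ""()\n'
)
print(strip_void_asm_calls(sample), end="")  # prints "  %x = add i32 %a, %b"
```

Removing a void call is always syntactically safe because nothing can use its result, but it does change the program's synchronization, so the variant is only useful as a signal that the asm statements influence register allocation here.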
Steps to reproduce
(The -mattr flags are kept to match the original source of the bug.)
llc -O2 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 -mattr=+sramecc,-xnack ./reproducer.ll
This will crash as seen above.
However,
llc -O2 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 -mattr=+sramecc,-xnack ./reproducer.ll -amdgpu-codegenprepare-disable-idiv-expansion=true
does not crash.
Similarly, replacing reproducer.ll with either of the two variant files will not trigger the bug.
(Finally, adding -global-isel also avoids the crash.)
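The crashing and non-crashing configurations above can be checked in one pass. This is a minimal sketch, assuming llc is on PATH and the attachments have been saved without their .txt suffix in the current directory; the crash check looks for the "Interference" assertion text from the report, and the case labels are illustrative.

```python
import shutil
import subprocess

# Common llc flags from the report.
BASE_ARGS = ["-O2", "-mtriple=amdgcn-amd-amdhsa", "-mcpu=gfx908",
             "-mattr=+sramecc,-xnack", "-o", "-"]

# (label, input file, extra flags, crash expected according to the report)
CASES = [
    ("baseline", "reproducer.ll", [], True),
    ("idiv expansion disabled", "reproducer.ll",
     ["-amdgpu-codegenprepare-disable-idiv-expansion=true"], False),
    ("GlobalISel", "reproducer.ll", ["-global-isel"], False),
    ("fewer batches", "fewer-batches-passing.ll", [], False),
    ("barriers removed", "reproducer-barriers-removed.ll", [], False),
]


def build_command(llc: str, ir_file: str, extra: list[str]) -> list[str]:
    """Assemble one llc invocation, extra flags after the input as in the report."""
    return [llc, *BASE_ARGS, ir_file, *extra]


def run_all(llc: str = "llc") -> list[tuple[str, bool, bool]]:
    """Run every case; return (label, crashed, expected_crash) triples."""
    path = shutil.which(llc)
    if path is None:
        print(f"{llc} not found on PATH; nothing to run")
        return []
    results = []
    for label, ir_file, extra, expect_crash in CASES:
        proc = subprocess.run(build_command(path, ir_file, extra),
                              capture_output=True, text=True)
        # Treat the SplitKit "Interference" assertion as the crash of interest.
        crashed = '"Interference"' in proc.stderr
        results.append((label, crashed, expect_crash))
    return results


if __name__ == "__main__":
    for label, crashed, expected in run_all():
        status = "as reported" if crashed == expected else "DIFFERS from report"
        print(f"{label}: crashed={crashed} ({status})")
```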