-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[SCEV] Use or disjoint flag #74467
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SCEV] Use or disjoint flag #74467
Conversation
|
@llvm/pr-subscribers-llvm-analysis Author: Nikita Popov (nikic) ChangesUse the disjoint flag to convert or to add instead of calling the haveNoCommonBitsSet() ValueTracking query. This ensures that we can reliably undo add -> or canonicalization, even in cases where the necessary information has been lost or is too complex to reinfer in SCEV. I have updated the bulk of the test coverage to add the necessary disjoint flags in advance. Full diff: https://github.com/llvm/llvm-project/pull/74467.diff 2 Files Affected:
diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp
index 9670e5309b32f..451ae73dbd601 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -5223,11 +5223,8 @@ static std::optional<BinaryOp> MatchBinaryOp(Value *V, const DataLayout &DL,
return BinaryOp(Op);
case Instruction::Or: {
- // LLVM loves to convert `add` of operands with no common bits
- // into an `or`. But SCEV really doesn't deal with `or` that well,
- // so try extra hard to recognize this `or` as an `add`.
- if (haveNoCommonBitsSet(Op->getOperand(0), Op->getOperand(1),
- SimplifyQuery(DL, &DT, &AC, CxtI)))
+ // Convert or disjoint into add nuw nsw.
+ if (cast<PossiblyDisjointInst>(Op)->isDisjoint())
return BinaryOp(Instruction::Add, Op->getOperand(0), Op->getOperand(1),
/*IsNSW=*/true, /*IsNUW=*/true);
return BinaryOp(Op);
diff --git a/llvm/test/Analysis/ScalarEvolution/add-like-or.ll b/llvm/test/Analysis/ScalarEvolution/add-like-or.ll
index 38b6c44fdf356..86bb9058832b4 100644
--- a/llvm/test/Analysis/ScalarEvolution/add-like-or.ll
+++ b/llvm/test/Analysis/ScalarEvolution/add-like-or.ll
@@ -6,15 +6,49 @@ define i8 @or-of-constant-with-no-common-bits-set(i8 %x, i8 %y) {
; CHECK-NEXT: Classifying expressions for: @or-of-constant-with-no-common-bits-set
; CHECK-NEXT: %t0 = shl i8 %x, 2
; CHECK-NEXT: --> (4 * %x) U: [0,-3) S: [-128,125)
-; CHECK-NEXT: %r = or i8 %t0, 3
+; CHECK-NEXT: %r = or disjoint i8 %t0, 3
; CHECK-NEXT: --> (3 + (4 * %x))<nuw><nsw> U: [3,0) S: [-125,-128)
; CHECK-NEXT: Determining loop execution counts for: @or-of-constant-with-no-common-bits-set
;
%t0 = shl i8 %x, 2
- %r = or i8 %t0, 3
+ %r = or disjoint i8 %t0, 3
ret i8 %r
}
+define i8 @or-disjoint(i8 %x, i8 %y) {
+; CHECK-LABEL: 'or-disjoint'
+; CHECK-NEXT: Classifying expressions for: @or-disjoint
+; CHECK-NEXT: %or = or disjoint i8 %x, %y
+; CHECK-NEXT: --> (%x + %y) U: full-set S: full-set
+; CHECK-NEXT: Determining loop execution counts for: @or-disjoint
+;
+ %or = or disjoint i8 %x, %y
+ ret i8 %or
+}
+
+define i8 @or-no-disjoint(i8 %x, i8 %y) {
+; CHECK-LABEL: 'or-no-disjoint'
+; CHECK-NEXT: Classifying expressions for: @or-no-disjoint
+; CHECK-NEXT: %or = or i8 %x, %y
+; CHECK-NEXT: --> %or U: full-set S: full-set
+; CHECK-NEXT: Determining loop execution counts for: @or-no-disjoint
+;
+ %or = or i8 %x, %y
+ ret i8 %or
+}
+
+; FIXME: We could add nuw nsw flags here.
+define noundef i8 @or-disjoint-transfer-flags(i8 %x, i8 %y) {
+; CHECK-LABEL: 'or-disjoint-transfer-flags'
+; CHECK-NEXT: Classifying expressions for: @or-disjoint-transfer-flags
+; CHECK-NEXT: %or = or disjoint i8 %x, %y
+; CHECK-NEXT: --> (%x + %y) U: full-set S: full-set
+; CHECK-NEXT: Determining loop execution counts for: @or-disjoint-transfer-flags
+;
+ %or = or disjoint i8 %x, %y
+ ret i8 %or
+}
+
define void @mask-high(i64 %arg, ptr dereferenceable(4) %arg1) {
; CHECK-LABEL: 'mask-high'
; CHECK-NEXT: Classifying expressions for: @mask-high
@@ -24,7 +58,7 @@ define void @mask-high(i64 %arg, ptr dereferenceable(4) %arg1) {
; CHECK-NEXT: --> (sext i32 %i to i64) U: [-2147483648,2147483648) S: [-2147483648,2147483648)
; CHECK-NEXT: %i3 = and i64 %arg, -16
; CHECK-NEXT: --> (16 * (%arg /u 16))<nuw> U: [0,-15) S: [-9223372036854775808,9223372036854775793)
-; CHECK-NEXT: %i4 = or i64 1, %i3
+; CHECK-NEXT: %i4 = or disjoint i64 1, %i3
; CHECK-NEXT: --> (1 + (16 * (%arg /u 16))<nuw>)<nuw><nsw> U: [1,-14) S: [-9223372036854775807,9223372036854775794)
; CHECK-NEXT: %i7 = phi i64 [ %i4, %bb ], [ %i8, %bb6 ]
; CHECK-NEXT: --> {(1 + (16 * (%arg /u 16))<nuw>)<nuw><nsw>,+,1}<%bb6> U: full-set S: full-set Exits: ((sext i32 %i to i64) smax (1 + (16 * (%arg /u 16))<nuw>)<nuw><nsw>) LoopDispositions: { %bb6: Computable }
@@ -42,7 +76,7 @@ bb:
%i = load i32, ptr %arg1, align 4
%i2 = sext i32 %i to i64
%i3 = and i64 %arg, -16
- %i4 = or i64 1, %i3
+ %i4 = or disjoint i64 1, %i3
%i5 = icmp sgt i64 %i4, %i2
br i1 %i5, label %bb10, label %bb6
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some tests to make sure we don't incorrectly add the nuw/nsw flags if poison doesn't imply UB, but found that we actually never transfer these. This is because the NSW/NUW flags set in MatchBinaryOp end up being ignored in favor of the ones on the original instruction. I'll fix that in a followup.
Use the disjoint flag to convert or to add instead of calling the haveNoCommonBitsSet() ValueTracking query. This ensures that we can reliably undo add -> or canonicalization, even in cases where the necessary information has been lost or is too complex to reinfer in SCEV. I have updated the bulk of the test coverage to add the necessary disjoint flags in advance.
cd90987 to
415236a
Compare
preames
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…n OR instructions Prior to patch (llvm#74467), LLVM used the haveNoCommonBitsSet() function to determine whether OR instructions had no common bits. If they didn't, the OR instructions were transformed into ADD instructions, allowing the LoadStoreVectorizer (LSV) to vectorize them. After the patch was applied, the implementation switched to using a disjoint flag to indicate this condition. However, these OR instructions were no longer recognized as disjoint and were not converted into ADD instructions. This oversight led to a regression where LSV failed to vectorize these instructions. This commit addresses the regression by updating LSV to identify if the OR instructions are disjoint, and if so, to transform the IR as follows to continue vectorization: Original IR: %linear_index3 = or i32 %linear_index_plus_base.fr, 3 %linear_index2 = or i32 %linear_index_plus_base.fr, 2 %linear_index1 = or i32 %linear_index_plus_base.fr, 1 Transformed IR: %linear_index3 = or disjoint i32 %linear_index_plus_base.fr, 3 %linear_index2 = or disjoint i32 %linear_index_plus_base.fr, 2 %linear_index1 = or disjoint i32 %linear_index_plus_base.fr, 1 This modification will enable LSV to recognize these instructions and restore vectorization opportunities.
Summary:
This commit resolves a regression in the LoadStoreVectorizer (LSV)
due to a previous change in how OR instructions were handled. The
regression prevented LSV from recognizing and vectorizing OR
instructions marked as disjoint.
Background:
A historical patch ([https://github.com/llvm/llvm-project/pull/74467](https://github.com/llvm/llvm-project/pull/74467))
shifted the identification of disjoint OR instructions from using
the haveNoCommonBitsSet() function to a disjoint flag within the
instruction. However, this change inadvertently led to missed
vectorization opportunities because LSV was no longer converting
disjoint OR instructions to ADD instructions, as it did and not
recognize the new flag.
This patch updates the Scalar Evolution (SCEV) analysis to identify
disjoint OR instructions based on the disjoint flag. By doing this,
the instructions will be correctly interpreted and the LSV can restore
lost vectorization opportunities.
Test plan:
Added a new phase-ordering test that confirms the regression at -O1,
-O2, and -O3 optimization levels, where the original OR instructions
were not identified as disjoint and thus not vectorized. The test
illustrates that with the new SCEV analysis, the transformed IR will
properly have disjoint OR instructions vectorized by LSV.
Details:
The modified test disjoint_or_vectorizer.ll showcases the restored
vectorization with the following transformed IR:
Original IR:
%linear_index3 = or i32 %linear_index_plus_base.fr, 3
%linear_index2 = or i32 %linear_index_plus_base, 2
%linear_index1 = or i32 %linear_index_plus_base, 1
Transformed IR with disjoint flag:
%linear_index3 = or disjoint i32 %linear_index_plus_base, 3
%linear_index2 = or disjoint i32 %linear_index_plus_base, 2
%linear_index1 = or disjoint i32 %linear_index_plus_base, 1
With this change, we ensure the SCEV analysis can correctly handle and
optimize disjoint OR instructions.
Signed-off-by: lihao217 <[email protected]>
Use the disjoint flag to convert or to add instead of calling the haveNoCommonBitsSet() ValueTracking query. This ensures that we can reliably undo add -> or canonicalization, even in cases where the necessary information has been lost or is too complex to reinfer in SCEV.
I have updated the bulk of the test coverage to add the necessary disjoint flags in advance.