Skip to content

Conversation

@nikic
Copy link
Contributor

@nikic nikic commented Dec 5, 2023

Use the disjoint flag to convert or to add instead of calling the haveNoCommonBitsSet() ValueTracking query. This ensures that we can reliably undo add -> or canonicalization, even in cases where the necessary information has been lost or is too complex to reinfer in SCEV.

I have updated the bulk of the test coverage to add the necessary disjoint flags in advance.

@nikic nikic requested review from preames and topperc December 5, 2023 13:28
@llvmbot llvmbot added the llvm:analysis Includes value tracking, cost tables and constant folding label Dec 5, 2023
@llvmbot
Copy link
Member

llvmbot commented Dec 5, 2023

@llvm/pr-subscribers-llvm-analysis

Author: Nikita Popov (nikic)

Changes

Use the disjoint flag to convert or to add instead of calling the haveNoCommonBitsSet() ValueTracking query. This ensures that we can reliably undo add -> or canonicalization, even in cases where the necessary information has been lost or is too complex to reinfer in SCEV.

I have updated the bulk of the test coverage to add the necessary disjoint flags in advance.


Full diff: https://github.com/llvm/llvm-project/pull/74467.diff

2 Files Affected:

  • (modified) llvm/lib/Analysis/ScalarEvolution.cpp (+2-5)
  • (modified) llvm/test/Analysis/ScalarEvolution/add-like-or.ll (+38-4)
diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp
index 9670e5309b32f..451ae73dbd601 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -5223,11 +5223,8 @@ static std::optional<BinaryOp> MatchBinaryOp(Value *V, const DataLayout &DL,
     return BinaryOp(Op);
 
   case Instruction::Or: {
-    // LLVM loves to convert `add` of operands with no common bits
-    // into an `or`. But SCEV really doesn't deal with `or` that well,
-    // so try extra hard to recognize this `or` as an `add`.
-    if (haveNoCommonBitsSet(Op->getOperand(0), Op->getOperand(1),
-                            SimplifyQuery(DL, &DT, &AC, CxtI)))
+    // Convert or disjoint into add nuw nsw.
+    if (cast<PossiblyDisjointInst>(Op)->isDisjoint())
       return BinaryOp(Instruction::Add, Op->getOperand(0), Op->getOperand(1),
                       /*IsNSW=*/true, /*IsNUW=*/true);
     return BinaryOp(Op);
diff --git a/llvm/test/Analysis/ScalarEvolution/add-like-or.ll b/llvm/test/Analysis/ScalarEvolution/add-like-or.ll
index 38b6c44fdf356..86bb9058832b4 100644
--- a/llvm/test/Analysis/ScalarEvolution/add-like-or.ll
+++ b/llvm/test/Analysis/ScalarEvolution/add-like-or.ll
@@ -6,15 +6,49 @@ define i8 @or-of-constant-with-no-common-bits-set(i8 %x, i8 %y) {
 ; CHECK-NEXT:  Classifying expressions for: @or-of-constant-with-no-common-bits-set
 ; CHECK-NEXT:    %t0 = shl i8 %x, 2
 ; CHECK-NEXT:    --> (4 * %x) U: [0,-3) S: [-128,125)
-; CHECK-NEXT:    %r = or i8 %t0, 3
+; CHECK-NEXT:    %r = or disjoint i8 %t0, 3
 ; CHECK-NEXT:    --> (3 + (4 * %x))<nuw><nsw> U: [3,0) S: [-125,-128)
 ; CHECK-NEXT:  Determining loop execution counts for: @or-of-constant-with-no-common-bits-set
 ;
   %t0 = shl i8 %x, 2
-  %r = or i8 %t0, 3
+  %r = or disjoint i8 %t0, 3
   ret i8 %r
 }
 
+define i8 @or-disjoint(i8 %x, i8 %y) {
+; CHECK-LABEL: 'or-disjoint'
+; CHECK-NEXT:  Classifying expressions for: @or-disjoint
+; CHECK-NEXT:    %or = or disjoint i8 %x, %y
+; CHECK-NEXT:    --> (%x + %y) U: full-set S: full-set
+; CHECK-NEXT:  Determining loop execution counts for: @or-disjoint
+;
+  %or = or disjoint i8 %x, %y
+  ret i8 %or
+}
+
+define i8 @or-no-disjoint(i8 %x, i8 %y) {
+; CHECK-LABEL: 'or-no-disjoint'
+; CHECK-NEXT:  Classifying expressions for: @or-no-disjoint
+; CHECK-NEXT:    %or = or i8 %x, %y
+; CHECK-NEXT:    --> %or U: full-set S: full-set
+; CHECK-NEXT:  Determining loop execution counts for: @or-no-disjoint
+;
+  %or = or i8 %x, %y
+  ret i8 %or
+}
+
+; FIXME: We could add nuw nsw flags here.
+define noundef i8 @or-disjoint-transfer-flags(i8 %x, i8 %y) {
+; CHECK-LABEL: 'or-disjoint-transfer-flags'
+; CHECK-NEXT:  Classifying expressions for: @or-disjoint-transfer-flags
+; CHECK-NEXT:    %or = or disjoint i8 %x, %y
+; CHECK-NEXT:    --> (%x + %y) U: full-set S: full-set
+; CHECK-NEXT:  Determining loop execution counts for: @or-disjoint-transfer-flags
+;
+  %or = or disjoint i8 %x, %y
+  ret i8 %or
+}
+
 define void @mask-high(i64 %arg, ptr dereferenceable(4) %arg1) {
 ; CHECK-LABEL: 'mask-high'
 ; CHECK-NEXT:  Classifying expressions for: @mask-high
@@ -24,7 +58,7 @@ define void @mask-high(i64 %arg, ptr dereferenceable(4) %arg1) {
 ; CHECK-NEXT:    --> (sext i32 %i to i64) U: [-2147483648,2147483648) S: [-2147483648,2147483648)
 ; CHECK-NEXT:    %i3 = and i64 %arg, -16
 ; CHECK-NEXT:    --> (16 * (%arg /u 16))<nuw> U: [0,-15) S: [-9223372036854775808,9223372036854775793)
-; CHECK-NEXT:    %i4 = or i64 1, %i3
+; CHECK-NEXT:    %i4 = or disjoint i64 1, %i3
 ; CHECK-NEXT:    --> (1 + (16 * (%arg /u 16))<nuw>)<nuw><nsw> U: [1,-14) S: [-9223372036854775807,9223372036854775794)
 ; CHECK-NEXT:    %i7 = phi i64 [ %i4, %bb ], [ %i8, %bb6 ]
 ; CHECK-NEXT:    --> {(1 + (16 * (%arg /u 16))<nuw>)<nuw><nsw>,+,1}<%bb6> U: full-set S: full-set Exits: ((sext i32 %i to i64) smax (1 + (16 * (%arg /u 16))<nuw>)<nuw><nsw>) LoopDispositions: { %bb6: Computable }
@@ -42,7 +76,7 @@ bb:
   %i = load i32, ptr %arg1, align 4
   %i2 = sext i32 %i to i64
   %i3 = and i64 %arg, -16
-  %i4 = or i64 1, %i3
+  %i4 = or disjoint i64 1, %i3
   %i5 = icmp sgt i64 %i4, %i2
   br i1 %i5, label %bb10, label %bb6
 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some tests to make sure we don't incorrectly add the nuw/nsw flags if poison doesn't imply UB, but found that we actually never transfer these. This is because the NSW/NUW flags set in MatchBinaryOp end up being ignored in favor of the ones on the original instruction. I'll fix that in a followup.

Use the disjoint flag to convert or to add instead of calling the
haveNoCommonBitsSet() ValueTracking query. This ensures that we
can reliably undo add -> or canonicalization, even in cases where
the necessary information has been lost or is too complex to
reinfer in SCEV.

I have updated the bulk of the test coverage to add the necessary
disjoint flags in advance.
Copy link
Collaborator

@preames preames left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@nikic nikic merged commit ff0e4fb into llvm:main Dec 5, 2023
LiHao217 added a commit to LiHao217/llvm-project that referenced this pull request Jun 24, 2024
…n OR instructions

Prior to patch (llvm#74467), LLVM used the haveNoCommonBitsSet() function to determine whether OR instructions had no common bits. If they didn't, the OR instructions were transformed into ADD instructions, allowing the LoadStoreVectorizer (LSV) to vectorize them.

After the patch was applied, the implementation switched to using a disjoint flag to indicate this condition. However, these OR instructions were no longer recognized as disjoint and were not converted into ADD instructions. This oversight led to a regression where LSV failed to vectorize these instructions.

This commit addresses the regression by updating LSV to identify if the OR instructions are disjoint, and if so, to transform the IR as follows to continue vectorization:

Original IR:

    %linear_index3 = or i32 %linear_index_plus_base.fr, 3
    %linear_index2 = or i32 %linear_index_plus_base.fr, 2
    %linear_index1 = or i32 %linear_index_plus_base.fr, 1

Transformed IR:

    %linear_index3 = or disjoint i32 %linear_index_plus_base.fr, 3
    %linear_index2 = or disjoint i32 %linear_index_plus_base.fr, 2
    %linear_index1 = or disjoint i32 %linear_index_plus_base.fr, 1

This modification will enable LSV to recognize these instructions and restore vectorization opportunities.
LiHao217 referenced this pull request in LiHao217/llvm-project Jun 27, 2024
Summary:
   This commit resolves a regression in the LoadStoreVectorizer (LSV)
   due to a previous change in how OR instructions were handled. The
   regression prevented LSV from recognizing and vectorizing OR
   instructions marked as disjoint.

Background:
    A historical patch ([https://github.com/llvm/llvm-project/pull/74467](https://github.com/llvm/llvm-project/pull/74467))
    shifted the identification of disjoint OR instructions from using
    the haveNoCommonBitsSet() function to a disjoint flag within the
    instruction. However, this change inadvertently led to missed
    vectorization opportunities because LSV was no longer converting
    disjoint OR instructions to ADD instructions, as it did and not
    recognize the new flag.

    This patch updates the Scalar Evolution (SCEV) analysis to identify
    disjoint OR instructions based on the disjoint flag. By doing this,
    the instructions will be correctly interpreted and the LSV can restore
    lost vectorization opportunities.

Test plan:
    Added a new phase-ordering test that confirms the regression at -O1,
    -O2, and -O3 optimization levels, where the original OR instructions
    were not identified as disjoint and thus not vectorized. The test
    illustrates that with the new SCEV analysis, the transformed IR will
    properly have disjoint OR instructions vectorized by LSV.

Details:
    The modified test disjoint_or_vectorizer.ll showcases the restored
    vectorization with the following transformed IR:

Original IR:
    %linear_index3 = or i32 %linear_index_plus_base.fr, 3
    %linear_index2 = or i32 %linear_index_plus_base, 2
    %linear_index1 = or i32 %linear_index_plus_base, 1

Transformed IR with disjoint flag:
    %linear_index3 = or disjoint i32 %linear_index_plus_base, 3
    %linear_index2 = or disjoint i32 %linear_index_plus_base, 2
    %linear_index1 = or disjoint i32 %linear_index_plus_base, 1

With this change, we ensure the SCEV analysis can correctly handle and
optimize disjoint OR instructions.

Signed-off-by: lihao217 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

llvm:analysis Includes value tracking, cost tables and constant folding

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants