Commit c16f09c
[AArch64][SVE] Reduce MaxInterleaveFactor for A510 and A520
The default MaxInterleaveFactor for AArch64 targets is 2.
This produces inefficient codegen on at least two in-order cores,
namely Cortex-A510 and Cortex-A520. For example, consider a simple
vector add:
```c
void foo(float *a, float *b, float *dst, unsigned n) {
  for (unsigned i = 0; i < n; ++i)
    dst[i] = a[i] + b[i];
}
```
With the default factor of 2, its inner loop is vectorized into the
following interleaved sequence of instructions:
```
add x12, x1, x10
ld1b { z0.b }, p0/z, [x1, x10]
add x13, x2, x10
ld1b { z1.b }, p0/z, [x2, x10]
ldr z2, [x12, #1, mul vl]
ldr z3, [x13, #1, mul vl]
dech x11
add x12, x0, x10
fadd z0.s, z1.s, z0.s
fadd z1.s, z3.s, z2.s
st1b { z0.b }, p0, [x0, x10]
addvl x10, x10, #2
str z1, [x12, #1, mul vl]
```
By contrast, reducing MaxInterleaveFactor to 1 yields the following loop:
```
.LBB0_13: // %vector.body
// =>This Inner Loop Header: Depth=1
ld1w { z0.s }, p0/z, [x1, x10, lsl #2]
ld1w { z1.s }, p0/z, [x2, x10, lsl #2]
fadd z0.s, z1.s, z0.s
st1w { z0.s }, p0, [x0, x10, lsl #2]
incw x10
```
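In the first sequence each iteration loads two vector registers' worth of data from each input (the `ld1b` at offset 0 and the `ldr` at `#1, mul vl`), performs two `fadd`s and two stores, and then advances the offset by two vector lengths (`addvl x10, x10, #2`). That is what an interleave factor of 2 means here; the second loop handles a single vector per iteration. The C sketch below emulates both loop shapes in scalar code so they can be compared. It is illustrative only: the helper `add_interleaved` and the fixed `VL` of 4 floats are hypothetical (the real SVE vector length is implementation-defined), and the leftover elements are handled by a plain scalar loop for simplicity, which is not necessarily how the vectorizer lowers them.

```c
#include <assert.h>

/* Hypothetical vector length in floats (stands in for a 128-bit register);
   real SVE hardware may use anything from 128 to 2048 bits. */
#define VL 4u

/* Emulates the shape of a vectorized loop with interleave factor `ic`:
   each iteration of the main body processes `ic` independent vectors of
   VL lanes, and a scalar remainder loop handles the leftover elements.
   Returns the number of main-body iterations executed. */
static unsigned add_interleaved(const float *a, const float *b, float *dst,
                                unsigned n, unsigned ic) {
    assert(ic > 0 && "interleave factor must be at least 1");
    unsigned step = VL * ic;            /* elements per main-body iteration */
    unsigned iters = 0;
    unsigned i = 0;
    for (; i + step <= n; i += step, ++iters) {
        for (unsigned part = 0; part < ic; ++part) {      /* one "register" */
            for (unsigned lane = 0; lane < VL; ++lane) {  /* its lanes */
                unsigned k = i + part * VL + lane;
                dst[k] = a[k] + b[k];
            }
        }
    }
    for (; i < n; ++i)                  /* scalar remainder */
        dst[i] = a[i] + b[i];
    return iters;
}
```

With `n = 37`, factor 1 runs 9 main-body iterations and leaves 1 element for the remainder, while factor 2 runs 4 iterations and leaves 5. Interleaving halves the loop-control overhead and exposes more independent memory operations per iteration, which generally helps wide out-of-order cores; the commit reports that on the in-order Cortex-A510 and Cortex-A520 it instead produces less efficient code.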
This patch also introduces IR tests to showcase this.
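This commit only changes the per-CPU default. Independently of it, clang lets an individual loop request an interleave count through its `#pragma clang loop` hints, which is one way to reproduce the factor-1 code above for a single loop on other targets. A minimal sketch, using the loop from the commit message:

```c
/* The loop from the commit message, annotated so that clang's loop
   vectorizer uses an interleave count of 1 for this loop only.
   Compilers that do not recognise the pragma (e.g. GCC) ignore it,
   possibly with an "unknown pragma" warning. */
void foo(float *a, float *b, float *dst, unsigned n) {
#pragma clang loop vectorize(enable) interleave_count(1)
  for (unsigned i = 0; i < n; ++i)
    dst[i] = a[i] + b[i];
}
```

Compiling with something like `clang -O3 --target=aarch64-linux-gnu -march=armv8-a+sve -S` and comparing against the unannotated loop should show the difference. With this patch applied, `-mcpu=cortex-a510` or `-mcpu=cortex-a520` should select a factor of 1 without the pragma. The exact instructions emitted depend on the compiler version.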
Change-Id: Ie1e862f6a1db851182a95534b3b987feb670d7ca1
Parent: 64555e3
2 files changed, 361 additions, 0 deletions:
- llvm/lib/Target/AArch64
- llvm/test/Transforms/LoopVectorize/AArch64