Commit 4bb4b96
committed
Auto merge of #74562 - pickfire:is_ascii_branchless, r=nagisa
Remove branch in optimized is_ascii
Performs slightly better in short or medium bytes by eliminating
the last branch check on `byte_pos == len` and always check the
last byte as it is always at most one `usize`.
Benchmark, before `libcore`, after `libcore_new`. It improves
medium and short by 1ns but regresses unaligned_tail by 2ns,
either way we can get unaligned_tail have a tiny chance of 1/8
on a 64 bit machine. I don't think we should bet on that, the
probability is worse than dice.
```
test long::case00_libcore ... bench: 38 ns/iter (+/- 1) = 183947 MB/s
test long::case00_libcore_new ... bench: 38 ns/iter (+/- 1) = 183947 MB/s
test long::case01_iter_all ... bench: 227 ns/iter (+/- 6) = 30792 MB/s
test long::case02_align_to ... bench: 40 ns/iter (+/- 1) = 174750 MB/s
test long::case03_align_to_unrolled ... bench: 19 ns/iter (+/- 1) = 367894 MB/s
test medium::case00_libcore ... bench: 5 ns/iter (+/- 0) = 6400 MB/s
test medium::case00_libcore_new ... bench: 4 ns/iter (+/- 0) = 8000 MB/s
test medium::case01_iter_all ... bench: 20 ns/iter (+/- 1) = 1600 MB/s
test medium::case02_align_to ... bench: 6 ns/iter (+/- 0) = 5333 MB/s
test medium::case03_align_to_unrolled ... bench: 5 ns/iter (+/- 0) = 6400 MB/s
test short::case00_libcore ... bench: 7 ns/iter (+/- 0) = 1000 MB/s
test short::case00_libcore_new ... bench: 6 ns/iter (+/- 0) = 1166 MB/s
test short::case01_iter_all ... bench: 5 ns/iter (+/- 0) = 1400 MB/s
test short::case02_align_to ... bench: 5 ns/iter (+/- 0) = 1400 MB/s
test short::case03_align_to_unrolled ... bench: 5 ns/iter (+/- 1) = 1400 MB/s
test unaligned_both::case00_libcore ... bench: 4 ns/iter (+/- 0) = 7500 MB/s
test unaligned_both::case00_libcore_new ... bench: 4 ns/iter (+/- 0) = 7500 MB/s
test unaligned_both::case01_iter_all ... bench: 26 ns/iter (+/- 0) = 1153 MB/s
test unaligned_both::case02_align_to ... bench: 13 ns/iter (+/- 2) = 2307 MB/s
test unaligned_both::case03_align_to_unrolled ... bench: 11 ns/iter (+/- 0) = 2727 MB/s
test unaligned_head::case00_libcore ... bench: 5 ns/iter (+/- 0) = 6200 MB/s
test unaligned_head::case00_libcore_new ... bench: 5 ns/iter (+/- 0) = 6200 MB/s
test unaligned_head::case01_iter_all ... bench: 19 ns/iter (+/- 1) = 1631 MB/s
test unaligned_head::case02_align_to ... bench: 10 ns/iter (+/- 0) = 3100 MB/s
test unaligned_head::case03_align_to_unrolled ... bench: 14 ns/iter (+/- 0) = 2214 MB/s
test unaligned_tail::case00_libcore ... bench: 3 ns/iter (+/- 0) = 10333 MB/s
test unaligned_tail::case00_libcore_new ... bench: 5 ns/iter (+/- 0) = 6200 MB/s
test unaligned_tail::case01_iter_all ... bench: 19 ns/iter (+/- 0) = 1631 MB/s
test unaligned_tail::case02_align_to ... bench: 10 ns/iter (+/- 0) = 3100 MB/s
test unaligned_tail::case03_align_to_unrolled ... bench: 13 ns/iter (+/- 0) = 2384 MB/s
```
Rough (unfair) maths on improvements for fun: 1ns * 7/8 - 2ns * 1/8 = 0.625ns
Inspired by fish and zsh clever trick to highlight missing linefeeds (⏎)
and branchless implementation of binary_search in rust.
cc @thomcc #74066
r? @nagisa1 file changed
+6
-9
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
3029 | 3029 | | |
3030 | 3030 | | |
3031 | 3031 | | |
3032 | | - | |
| 3032 | + | |
3033 | 3033 | | |
3034 | 3034 | | |
3035 | 3035 | | |
| |||
3077 | 3077 | | |
3078 | 3078 | | |
3079 | 3079 | | |
3080 | | - | |
| 3080 | + | |
| 3081 | + | |
| 3082 | + | |
| 3083 | + | |
3081 | 3084 | | |
3082 | 3085 | | |
3083 | 3086 | | |
| |||
3098 | 3101 | | |
3099 | 3102 | | |
3100 | 3103 | | |
3101 | | - | |
3102 | | - | |
3103 | | - | |
3104 | | - | |
3105 | | - | |
3106 | | - | |
3107 | 3104 | | |
3108 | 3105 | | |
3109 | | - | |
| 3106 | + | |
3110 | 3107 | | |
3111 | 3108 | | |
3112 | 3109 | | |
| |||
0 commit comments