-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Improvements to System.Math InternalCall code. #4847
Conversation
This should be reviewable. Everything is passing and looking good, so I don't have any other changes I am planning on making (unless otherwise requested). |
@tannergooding Could you please split this into several smaller PRs? It will be hard to make progress on it as is because of it has too much stuff in it.
Thanks! |
@jkotas, this becomes significantly easier to review if you look at each of the commits individually, rather than all at once. It also becomes simpler to review if you add ?w=1 to the end of the url, as it causes whitespace specific changes to be left out of the review. |
// instead returns +1.0. | ||
|
||
if(IS_DBL_INFINITY(y) && IS_DBL_NEGATIVEONE(x)) { | ||
*((INT64 *)(&result)) = CLR_NAN_64; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a question, not too familiar with native code but isn't violating strict aliasing supposed to be UB?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, violating strict aliasing is UB according to the spec, but clang
, gcc
, and msvc
will handle this code as intended.
@tannergooding any plans to implement perf tests for these math functions? Also how about more rigorous correctness tests? There are a few tests over in CoreFx but they're minimal and don't check boundary cases. There are pal tests for the underlying native methods but nothing that verifies these when called from managed code. |
@AndyAyersMS. I'll see about getting some perf tests implemented as well as some more rigorous correctness tests. |
I wrote a simple perf test for BTW math function performance seems significantly worse on Linux than on Windows. |
I have added some very basic performance tests for all of the All performance tests are implemented as follows:
All of the functions which were not modified did not show any improvement or regression. The execution time below is the Hardware: Desktop w/ 3.7GHz Quad-Core A10-7850K (AMD) and 16GB RAM Of the functions which were changed, they are showing the following improvments (x64):
Of the functions which were changed, they are showing the following improvments (x86):
FYI. @AndyAyersMS |
|
||
var diff = Math.Abs(absDoubleExpectedResult - result); | ||
|
||
if ((diff != 0) && (diff < absDoubleEpsilon)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should just be if (diff > absDoubleEpsilon)
-- no need to check against zero, and fail if the diff is too large.
Similarly for checks in the other tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixing.
Thanks for adding tests.... left a few comments. I'd be interested to see the Windows vs Linux numbers. |
I've updated according to the feedback provided and am working on getting some Linux (and Mac) numbers for comparison. |
Thanks for the updates. One last thing to think about. The code under jit/performance has to work well under 3 different use cases:
I think what you have is good on the perf test aspects, but it's likely running this as a regular test will take a long time. What I've done for the other perf tests is (for now) give up on the manual perf test aspect and scale back iterations so when run as an exe the timing is reasonable, say a second or so at most in release and a short as possible in non-release. To get back the manual mode capabilities, I have plans to go back and add optional So, look at how long this test takes in CI and if it stands out, you should consider doing something to trim it back. |
Looks like you hit the timeout ...
|
I'm updating this to accept two arguments (similar to the SIMD perf tests). One for the math function to test and the other which will be the number of iterations to run. I'm going to default the iterations to 1 for debug and 1000 for release (I can run 2500 iterations of the |
Updated. The Math Functions performance test now accepts two arguments. One allows you to specify the name of the test to run ( |
Nicely done -- LGTM. |
@@ -28,7 +28,7 @@ | |||
</PropertyGroup> | |||
<!--Leaf Project Items--> | |||
<ItemGroup> | |||
<CppCompile Include="FloatNative.cpp" /> | |||
<CppCompile Include="FloatDouble.cpp" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The casing on the file name differs here from that used on the actual file. Should be fine for Windows, but is this not a problem for Linux type systems?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe (in the case of _._proj) casing differences are being handled already. However, I have adjusted the casing to match what exists on disk.
I have some perf numbers for Linux and Mac. I'm not sure I trust the Linux numbers because of how small they are (I did double check the I deployed the right bits), but Mac did show some small (and consistent) improvements (I was happy to see already good perf coming from Mac). I also updated my post above to list the hardware I ran the windows benchmarks on. OS X showed very little improvement, but the improvement was consistent. Hardware: MacBook Pro w/ 2.5GHz Quad-Core i7 (Intel) and 16GB RAM
Linux showed even less improvement and it did not seem to be consistent. Hardware: Asus ZenBook w/ 2.6GHz Quad-Core i3 (Intel) and 4GB RAM
FYI. @AndyAyersMS Let me know what else, if anything, is required to get this merged. |
LGTM. Something for future follow-up: try and run Linux on same HW as Windows, and compare relative perf (eg dual-boot a machine or similar). Data above and some experiments I've done suggest Linux math function performance may not be very good. Also windows 10 math libs seem to light up on AVX2 capable machines. So for Windows, you might want to measure on both older and newer HW. |
@AndyAyersMS, I did some investigations on Linux and it seems that the slowness is coming from the libm implementation being linked in ( I tried swapping the implementation out with |
Interesting... so do you think we should be linking amdlibm.a instead? Seems like we should open an issue for this. |
I believe, at the very least, it is worth investigating further. AMD's I'm still wondering if there is some additional detail I'm missing, because it is a little hard to believe that I have all the code available on my box (and I believe my office is just down the hall from yours) if you would like to take a closer look tomorrow. |
Rebased code onto HEAD and finished cleaning up the code. |
Updating the existing PAL tests. Should provide much better coverage for the for things both inside and outside of a functions input domain. |
This is now passing (again), has new perf tests, and extended pal test coverage. Is there anything else required on my end to get this merged in? |
Looking at the trigger phrase, it seems my previous comment caused a coverage test to run, which failed because it couldn't find ./corerun or ./runtests.sh. |
@tannergooding I would suggest getting somebody on the runtime side to sign off. Just browsing through the new pal tests, I don't understand why the expected variance for inputs is sometimes PAL_EPS and sometimes a multiple or fraction of it. Did you come about these variances by trial and error? I would actually hope that math library functions we link to would end up fairly close to 1 ULP for most reasonable inputs. I understood why we relaxed the variance on the managed side since we were adding up many results, but here you're just testing isolated values. |
@AndyAyersMS, the epsilon used for these tests was already set to I can certainly update the tests to be more have a more accurate base epsilon ( As for why it varies by a multiple/division of 10, that is because floating point values have digits of precision ( So, in the first example 0.1234567890 and 0.1234567891 should be considered as equally precise, because the 10th digit is outside the range of precision for a single-precision floating-point integer. The same is true for example 123456789.0 and 123456789.1 in the second example. So |
@AndyAyersMS. I have added a new commit which updates the value of PAL_EPSILON to be more precise. Additionally, I have commented all instances of PAL_EPSILON to explain (roughly) what I explained above. It seems that the various implementations (between Windows, Linux, and Mac) are accurate to within 2^-50 (2^-21 for float). Where the actual machine-epsilon is 2^-52 (2^-23 for float). Do you know who on the Runtime side would be good to tag for review? |
@sergiy-k Can you suggest somebody to review the PAL changes? |
Ping @sergiy-k Can you suggest somebody to review the PAL changes? |
/cc @janvorli |
@tannergooding You have accidentally (I guess) changed the license headers in multiple files from .NET Foundation to Microsoft. Can you please fix that? |
@janvorli, I've rebased the changes against the current HEAD and fixed the license headers. |
LGTM |
Everything is passing and I believe I have received all necessary sign-off. Am I clear to merge? |
Thanks! |
Math.Exp() to match dotnet/coreclr#4847.
Math.Exp() to match dotnet/coreclr#4847.
* Adding some basic System.Math performance tests. * Renaming 'floatnative' to 'floatdouble'. * Removing outdated workarounds in the floatdouble interop code. * Renaming 'finite.cpp' to 'math.cpp' * Updating the double-precision math tests. * Updating PAL_EPSILON to be more precise. Commit migrated from dotnet/coreclr@08786f2
This cleans up the native code for the
System.Math
internal calls. There were several instances of workarounds implemented for much older versions of the MSVC compiler that are just not relevant anymore.The list of workarounds removed are as follows:
Sin
/Cos
/Tan
were implement invm\amd64\JitHelpers_Fast.asm
(using x87 floating-point instructions) because the CRT versions were previously too slow.Log
,Log10
,Exp
,Pow
, andRound
were being compiled with/fp:precise
because the/fp:fast
implementation was previously slower.Exp(+/-INFINITY)
was previously special handled as the CRT implementation did not return the expected values.Pow
had some additional code (and inline assembly) that was unnecessary.Round
was using inline assembly to call thefrndint
x87 floating-point instructionIn all of these cases, the replaced code was one or more of the following:
fsin
,fcos
, andftan
).