-
Couldn't load subscription status.
- Fork 303
Explicitely use only OpenBLAS for PyTorch if MKL is not used #2448
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Explicitely use only OpenBLAS for PyTorch if MKL is not used #2448
Conversation
|
On one of our systems, I had problems during the build with PyTorch-1.7.1 with fosscuda-2020a. The cause was that PyTorch in it's configure was picking up on our system MKL, which is ancient, and didn't provide the symbols expected by PyTorch. This issue was fixed by this PR, it now properly picks up on OpenBLAS, as it should for the fosscuda toolchain: With this (and the EC from https://github.com/easybuilders/easybuild-easyconfigs/pull/12991/files), my build completed succesfully. |
| elif get_software_root('OpenBLAS'): | ||
| # This is what PyTorch defaults to if no MKL is found. | ||
| # Make this explicit here to avoid it finding MKL from the system | ||
| options.append('BLAS=Eigen') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe stupid question, but why not just set BLAS=OpenBLAS here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because PyTorch uses the former cafe code and for some reasons seems to prefer using that with Eigen, which will use a BLAS as a backend. PyTorch itself uses BLAS directly (at least this is what I gathered from various issues)
To not change our current behavior and not deviate from the default PyTorch builds (for the no-mkl case) I chose to keep this as Eigen (same as PyTorch does)
And yes it is a mess: pytorch/pytorch#59218
|
@smoors Any update here? Would want this in the next EB release |
| elif LooseVersion(self.version) >= LooseVersion('1.9.0') and get_software_root('BLIS'): | ||
| options.append('BLAS=BLIS') | ||
| options.append('BLIS_HOME=' + get_software_root('BLIS')) | ||
| options.append('USE_MKLDNN_CBLAS=ON') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may not be strictly necessary but better to be explicit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really. First the WITH_BLAS is only used when BLAS=Eigen and second WITH_BLAS is not even passed to CMake at all which is why the hack at https://github.com/easybuilders/easybuild-easyblocks/pull/2448/files#diff-348e306def79b774c5b63a5086cc82b723524fcb348042b12559d6b82f9e99b9R167 is required.
So no I wouldn't add this until PyTorch made up their mind how they really want things to be. See pytorch/pytorch#59218
|
Test report by @boegel Overview of tested easyconfigs (in order)
Build succeeded for 4 out of 4 (4 easyconfigs in total) |
(created using
eb --new-pr)This avoids PyTorch from searching for e.g. MKL and using a system version instead of the intended OpenBLAS from our toolchain.
Also sets an env var which skips a very annoying, flaky test.