Skip to content

Conversation

@Flamefire
Copy link
Contributor

@Flamefire Flamefire commented May 31, 2021

(created using eb --new-pr)

This avoids PyTorch from searching for e.g. MKL and using a system version instead of the intended OpenBLAS from our toolchain.

Also sets an env var which skips a very annoying, flaky test.

@boegel boegel added this to the release after 4.4.0 milestone May 31, 2021
@boegel boegel added the change label May 31, 2021
@casparvl
Copy link
Contributor

casparvl commented Jun 1, 2021

On one of our systems, I had problems during the build with PyTorch-1.7.1 with fosscuda-2020a. The cause was that PyTorch in it's configure was picking up on our system MKL, which is ancient, and didn't provide the symbols expected by PyTorch.

This issue was fixed by this PR, it now properly picks up on OpenBLAS, as it should for the fosscuda toolchain:

CMake Warning at cmake/Dependencies.cmake:172 (message):
  Preferred BLAS (Eigen) cannot be found, now searching for a general BLAS
  library
Call Stack (most recent call first):
  CMakeLists.txt:513 (include)


-- Checking for [openblas]
--   Library openblas: /sw/arch/RedHatEnterpriseServer7/EB_production/2020/software/OpenBLAS/0.3.9-GCC-9.3.0/lib/libopenblas.so
-- Looking for sgemm_
-- Looking for sgemm_ - found
-- Performing Test BLAS_F2C_DOUBLE_WORKS
-- Performing Test BLAS_F2C_DOUBLE_WORKS - Failed
-- Performing Test BLAS_F2C_FLOAT_WORKS
-- Performing Test BLAS_F2C_FLOAT_WORKS - Success
-- Performing Test BLAS_USE_CBLAS_DOT
-- Performing Test BLAS_USE_CBLAS_DOT - Success
-- Found a library with BLAS API (open).
...

With this (and the EC from https://github.com/easybuilders/easybuild-easyconfigs/pull/12991/files), my build completed succesfully.

@boegel boegel added bug fix and removed change labels Jun 1, 2021
@boegel boegel changed the title Explictely use only OpenBLAS for PyTorch if MKL is not used Explicitely use only OpenBLAS for PyTorch if MKL is not used Jun 9, 2021
elif get_software_root('OpenBLAS'):
# This is what PyTorch defaults to if no MKL is found.
# Make this explicit here to avoid it finding MKL from the system
options.append('BLAS=Eigen')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe stupid question, but why not just set BLAS=OpenBLAS here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because PyTorch uses the former cafe code and for some reasons seems to prefer using that with Eigen, which will use a BLAS as a backend. PyTorch itself uses BLAS directly (at least this is what I gathered from various issues)
To not change our current behavior and not deviate from the default PyTorch builds (for the no-mkl case) I chose to keep this as Eigen (same as PyTorch does)

And yes it is a mess: pytorch/pytorch#59218

@Flamefire
Copy link
Contributor Author

@smoors Any update here? Would want this in the next EB release

elif LooseVersion(self.version) >= LooseVersion('1.9.0') and get_software_root('BLIS'):
options.append('BLAS=BLIS')
options.append('BLIS_HOME=' + get_software_root('BLIS'))
options.append('USE_MKLDNN_CBLAS=ON')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be added here?

options.append('WITH_BLAS=blis')

see ashishfarmer/pytorch@1dac97d

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may not be strictly necessary but better to be explicit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. First the WITH_BLAS is only used when BLAS=Eigen and second WITH_BLAS is not even passed to CMake at all which is why the hack at https://github.com/easybuilders/easybuild-easyblocks/pull/2448/files#diff-348e306def79b774c5b63a5086cc82b723524fcb348042b12559d6b82f9e99b9R167 is required.

So no I wouldn't add this until PyTorch made up their mind how they really want things to be. See pytorch/pytorch#59218

@boegel
Copy link
Member

boegel commented Sep 1, 2021

Test report by @boegel

Overview of tested easyconfigs (in order)

  • SUCCESS PyTorch-1.1.0-foss-2019a-Python-3.7.2.eb
  • SUCCESS PyTorch-1.3.1-fosscuda-2020b.eb
  • SUCCESS PyTorch-1.9.0-foss-2020b.eb
  • SUCCESS PyTorch-1.7.1-foss-2020b.eb

Build succeeded for 4 out of 4 (4 easyconfigs in total)
node3302.joltik.os - Linux centos linux 7.9.2009, x86_64, Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz (cascadelake), Python 3.6.8
See https://gist.github.com/d9deb33ce083a79ef85d8b5d4cde174a for a full test report.

@boegel boegel merged commit 2cf9fb9 into easybuilders:develop Sep 2, 2021
@Flamefire Flamefire deleted the 20210531135834_new_pr_vDSwHBmwhi branch September 2, 2021 10:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants