GPU-enabled enroot containers fail after updating to latest nvidia-container-toolkit #4159
tpdownes announced in Announcements
Replies: 1 comment
-
The mitigation for this is included in v1.51.1. I'm going to convert this to a discussion.
-
Release 1.17.7 of the nvidia-container-toolkit contains a regression that impacts Slurm clusters that update to this version. See the following reports:
We advise customers not to upgrade running systems to this package. On all compute nodes with GPUs, the upgrade can be blocked by creating an apt preferences file at `/etc/apt/preferences.d/block-broken-nvidia-container`.
We will integrate this workaround robustly into our solutions while NVIDIA works on a fix to their packaging and solution.
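The contents of the preferences file did not survive in the post above. As a rough sketch only, an apt pin that blocks the affected release could be written as follows. The package list, the version glob `1.17.7*`, and the helper name `write_nvidia_pin` are assumptions for illustration, not the exact file from the original announcement. A negative `Pin-Priority` tells apt never to install a matching version.

```shell
# Hypothetical helper: write an apt pin that blocks nvidia-container-toolkit 1.17.7.
# The package names and version glob are assumptions; adjust them to match the
# packages installed on your nodes (check with: dpkg -l | grep nvidia-container).
write_nvidia_pin() {
    # Target directory; defaults to the apt preferences directory named in the post.
    prefs_dir="${1:-/etc/apt/preferences.d}"
    cat > "${prefs_dir}/block-broken-nvidia-container" <<'EOF'
Package: nvidia-container-toolkit nvidia-container-toolkit-base libnvidia-container-tools libnvidia-container1
Pin: version 1.17.7*
Pin-Priority: -1
EOF
}
```

Run as root, calling `write_nvidia_pin` with no argument writes the file to `/etc/apt/preferences.d`. Afterwards, `apt-cache policy nvidia-container-toolkit` can be used to check which version apt now treats as the install candidate.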
Additionally, we recommend disabling `unattended-upgrades.service` if it is not already disabled.
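The command that accompanied this recommendation is missing from the post. One plausible way to do it with `systemctl` is sketched below; the helper name `disable_unattended_upgrades` is hypothetical, and the function assumes it is run as root on a systemd-based node.

```shell
# Hypothetical helper: disable unattended-upgrades only if it is currently enabled.
disable_unattended_upgrades() {
    unit="unattended-upgrades.service"
    if systemctl is-enabled --quiet "$unit" 2>/dev/null; then
        # Stop the running service and prevent it from starting at boot.
        systemctl disable --now "$unit"
    else
        echo "$unit is already disabled or not installed"
    fi
}
```

The `is-enabled` check keeps the function safe to re-run on nodes where the service was already turned off.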