Why does a SLURM cluster not allow nodesets in a different region than the controller? #4738
-
I'm trying to make a cluster with nodesets across more than one region. This fails because of the check here. Why does this constraint exist? I'm aware of the increased latency and cost involved with this setup, but I'm willing to accept these drawbacks.
-
Currently, slurm-gcp does not support deploying a SLURM cluster whose nodesets are in a different region from the controller.
Google and SchedMD recommend using reservations for VM availability and compact placement policies to achieve low network latency within a region.
To ensure the lowest possible latency and reliable operation, nodesets should be in the same region as the controller (head node); see SLURM in the Clouds and GCP best practices.
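To make the constraint concrete, here is a minimal Python sketch of the kind of same-region check the question refers to. It is not the slurm-gcp implementation: `Nodeset`, `region_of`, and `check_nodeset_regions` are hypothetical names, and the sketch assumes a nodeset is described only by a name and a list of zones, and that a GCP region is the zone name with its trailing `-<letter>` suffix removed (for example, `us-central1-a` is in `us-central1`).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Nodeset:
    # Hypothetical, simplified nodeset definition: a name plus the zones it may use.
    name: str
    zones: list[str] = field(default_factory=list)


def region_of(zone: str) -> str:
    """Derive a GCP region from a zone name by dropping the last '-<suffix>' part."""
    return zone.rsplit("-", 1)[0]


def check_nodeset_regions(controller_zone: str, nodesets: list[Nodeset]) -> list[str]:
    """Return one error message per nodeset zone that lies outside the controller's region."""
    controller_region = region_of(controller_zone)
    errors = []
    for ns in nodesets:
        for zone in ns.zones:
            if region_of(zone) != controller_region:
                errors.append(
                    f"nodeset {ns.name!r}: zone {zone} is in region {region_of(zone)}, "
                    f"but the controller is in {controller_region}"
                )
    return errors


if __name__ == "__main__":
    # Hypothetical layout: one nodeset in the controller's region, one in Europe.
    nodesets = [
        Nodeset("compute", ["us-central1-a", "us-central1-c"]),
        Nodeset("gpu", ["europe-west4-a"]),
    ]
    for message in check_nodeset_regions("us-central1-b", nodesets):
        print(message)
```

Run against this layout, only the `gpu` nodeset is reported, which is the situation the question runs into.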
If you need to run workloads across multiple regions, the recommended solution is to deploy separate SLURM clusters, each with its own controller and nodesets in a single region. You can then federate clusters if needed for workload portability.
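As a rough sketch of the federation step, the Python helper below only builds the `sacctmgr` commands that would group two single-region clusters into one Slurm federation and then list it; it does not run them. The helper name `federation_commands` and the cluster names `uscentral` and `euwest` are hypothetical, and the sketch assumes both clusters are already registered with a shared slurmdbd, which Slurm federation relies on.

```python
import shlex


def federation_commands(federation: str, clusters: list) -> list:
    """Build (but do not execute) sacctmgr commands that create and then show a federation."""
    if not clusters:
        raise ValueError("a federation needs at least one cluster")
    # -i commits the change immediately instead of prompting for confirmation.
    add_fed = ["sacctmgr", "-i", "add", "federation", federation,
               "clusters=" + ",".join(clusters)]
    show_fed = ["sacctmgr", "show", "federation"]
    return [shlex.join(add_fed), shlex.join(show_fed)]


if __name__ == "__main__":
    # Hypothetical per-region clusters, each with its own controller and nodesets.
    for cmd in federation_commands("multiregion", ["uscentral", "euwest"]):
        print(cmd)
```

Keeping each cluster's controller and nodesets in one region preserves the low-latency layout described above, while the federation layer handles cross-region job placement.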