- `your_container`: A Deep Learning container. To find the most recent release of the SMP container, see Release notes for the SageMaker model parallelism library.
- (Optional) You can provide the HuggingFace token if you need pre-trained weights from HuggingFace by setting the following key-value pair:
We recommend that you use the [HyperPod command-line tool (release_v2)](https://github.com/aws/sagemaker-hyperpod-cli/tree/release_v2).
If the `STATUS` is `Pending` or `ContainerCreating`, run the following command to get more details.
```
kubectl describe pod <name-of-pod>
```
After the job `STATUS` changes to `Running`, you can examine the log by using the following command.
```
kubectl logs <name-of-pod>
```
When the job finishes, `kubectl get pods` shows its `STATUS` as `Completed`.
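The status checks above can be summarized in a small helper. This is a minimal sketch, not part of the recipes repository: `next_step` is a hypothetical function name, and falling back to `kubectl describe pod` for any unlisted status is an assumption. The function only prints the suggested command; it does not contact the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a pod STATUS value (as shown by `kubectl get pods`)
# to the follow-up command described in this section.
next_step() {
  local status="$1" pod="$2"
  case "$status" in
    # Pod has not started yet: inspect its events and scheduling details.
    Pending|ContainerCreating) echo "kubectl describe pod ${pod}" ;;
    # Pod is running: read the training log.
    Running)                   echo "kubectl logs ${pod}" ;;
    # Job finished: no follow-up command is suggested.
    Completed)                 : ;;
    # Assumption: for any other status, fall back to describing the pod.
    *)                         echo "kubectl describe pod ${pod}" ;;
  esac
}
```

Against a live cluster, the status could be read from the third column of `kubectl get pod <name-of-pod> --no-headers` and passed to `next_step` together with the pod name.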
For more information about the k8s cluster configuration, see [Running a training job on HyperPod k8s](https://docs.aws.amazon.com/sagemaker/latest/dg/cluster-specific-configurations-run-training-job-hyperpod-k8s.html).
To run an Amazon Nova recipe on SageMaker HyperPod clusters orchestrated by Amazon EKS, you need to create a Restricted Instance Group in your cluster. To learn more, see the [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/nova-hp-cluster.html).