| 1 | +(cli_cluster_management)= |
| 2 | + |
| 3 | +# Cluster Management |
| 4 | + |
| 5 | +Complete reference for SageMaker HyperPod cluster management parameters and configuration options. |
| 6 | + |
| 7 | +```{note} |
| 8 | +**Region Configuration**: Commands that accept the `--region` option fall back to the default region from your AWS credentials configuration when no region is provided. |
| 9 | +``` |
| 10 | + |
| 11 | +* [Initialize Configuration](#hyp-init) |
| 12 | +* [Create Cluster Stack](#hyp-create) |
| 13 | +* [Update Cluster](#hyp-update-hyp-cluster) |
| 14 | +* [List Cluster Stacks](#hyp-list-hyp-cluster) |
| 15 | +* [Describe Cluster Stack](#hyp-describe-hyp-cluster) |
| 16 | +* [List HyperPod Clusters](#hyp-list-cluster) |
| 17 | +* [Set Cluster Context](#hyp-set-cluster-context) |
| 18 | +* [Get Cluster Context](#hyp-get-cluster-context) |
| 19 | +* [Get Monitoring](#hyp-get-monitoring) |
| 20 | + |
| 21 | +* [Configure Parameters](#hyp-configure) |
| 22 | +* [Validate Configuration](#hyp-validate) |
| 23 | +* [Reset Configuration](#hyp-reset) |
| 24 | + |
| 25 | +## hyp init |
| 26 | + |
| 27 | +Initialize a template scaffold in the target directory (the current directory by default). |
| 28 | + |
| 29 | +#### Syntax |
| 30 | + |
| 31 | +```bash |
| 32 | +hyp init TEMPLATE [DIRECTORY] [OPTIONS] |
| 33 | +``` |
| 34 | + |
| 35 | +#### Parameters |
| 36 | + |
| 37 | +| Parameter | Type | Required | Description | |
| 38 | +|-----------|------|----------|-------------| |
| 39 | +| `TEMPLATE` | CHOICE | Yes | Template type (`hyp-cluster`, `hyp-pytorch-job`, `hyp-custom-endpoint`, or `hyp-jumpstart-endpoint`) | |
| 40 | +| `DIRECTORY` | PATH | No | Target directory (default: current directory) | |
| 41 | +| `--version` | TEXT | No | Schema version to use | |
| 42 | + |
| 43 | +## hyp create |
| 44 | + |
| 45 | +Create a new HyperPod cluster stack using the provided configuration. |
| 46 | + |
| 47 | +#### Syntax |
| 48 | + |
| 49 | +```bash |
| 50 | +hyp create [OPTIONS] |
| 51 | +``` |
| 52 | + |
| 53 | +#### Parameters |
| 54 | + |
| 55 | +| Parameter | Type | Required | Description | |
| 56 | +|-----------|------|----------|-------------| |
| 57 | +| `--region` | TEXT | No | AWS region where the cluster stack will be created | |
| 58 | +| `--debug` | FLAG | No | Enable debug logging | |
| 59 | + |
| 60 | +## hyp update hyp-cluster |
| 61 | + |
| 62 | +Update an existing HyperPod cluster configuration. |
| 63 | + |
| 64 | +#### Syntax |
| 65 | + |
| 66 | +```bash |
| 67 | +hyp update hyp-cluster [OPTIONS] |
| 68 | +``` |
| 69 | + |
| 70 | +#### Parameters |
| 71 | + |
| 72 | +| Parameter | Type | Required | Description | |
| 73 | +|-----------|------|----------|-------------| |
| 74 | +| `--cluster-name` | TEXT | Yes | Name of the cluster to update | |
| 75 | +| `--instance-groups` | TEXT | No | JSON string of instance group configurations | |
| 76 | +| `--instance-groups-to-delete` | TEXT | No | JSON string of instance groups to delete | |
| 77 | +| `--region` | TEXT | No | AWS region of the cluster | |
| 78 | +| `--node-recovery` | TEXT | No | Node recovery setting ("Automatic" or "None") | |
| 79 | +| `--debug` | FLAG | No | Enable debug logging | |
| 80 | + |
| 81 | +## hyp list hyp-cluster |
| 82 | + |
| 83 | +List all HyperPod cluster stacks (CloudFormation stacks). |
| 84 | + |
| 85 | +#### Syntax |
| 86 | + |
| 87 | +```bash |
| 88 | +hyp list hyp-cluster [OPTIONS] |
| 89 | +``` |
| 90 | + |
| 91 | +#### Parameters |
| 92 | + |
| 93 | +| Parameter | Type | Required | Description | |
| 94 | +|-----------|------|----------|-------------| |
| 95 | +| `--region` | TEXT | No | AWS region to list stacks from | |
| 96 | +| `--status` | TEXT | No | Filter by stack status. Format: "['CREATE_COMPLETE', 'UPDATE_COMPLETE']" | |
| 97 | +| `--debug` | FLAG | No | Enable debug logging | |
| 98 | + |
| 99 | +## hyp describe hyp-cluster |
| 100 | + |
| 101 | +Describe a specific HyperPod cluster stack. |
| 102 | + |
| 103 | +#### Syntax |
| 104 | + |
| 105 | +```bash |
| 106 | +hyp describe hyp-cluster STACK-NAME [OPTIONS] |
| 107 | +``` |
| 108 | + |
| 109 | +#### Parameters |
| 110 | + |
| 111 | +| Parameter | Type | Required | Description | |
| 112 | +|-----------|------|----------|-------------| |
| 113 | +| `STACK-NAME` | TEXT | Yes | Name of the CloudFormation stack to describe | |
| 114 | +| `--region` | TEXT | No | AWS region of the stack | |
| 115 | +| `--debug` | FLAG | No | Enable debug logging | |
| 116 | + |
| 117 | +## hyp list-cluster |
| 118 | + |
| 119 | +List SageMaker HyperPod clusters with capacity information. |
| 120 | + |
| 121 | +#### Syntax |
| 122 | + |
| 123 | +```bash |
| 124 | +hyp list-cluster [OPTIONS] |
| 125 | +``` |
| 126 | + |
| 127 | +#### Parameters |
| 128 | + |
| 129 | +| Parameter | Type | Required | Description | |
| 130 | +|-----------|------|----------|-------------| |
| 131 | +| `--region` | TEXT | No | AWS region to list clusters from | |
| 132 | +| `--output` | TEXT | No | Output format ("table" or "json", default: "json") | |
| 133 | +| `--clusters` | TEXT | No | Comma-separated list of specific cluster names | |
| 134 | +| `--namespace` | TEXT | No | Namespace to check capacity for (can be used multiple times) | |
| 135 | +| `--debug` | FLAG | No | Enable debug logging | |
| 136 | + |
| 137 | +## hyp set-cluster-context |
| 138 | + |
| 139 | +Connect to a HyperPod EKS cluster and set the kubectl context. |
| 140 | + |
| 141 | +#### Syntax |
| 142 | + |
| 143 | +```bash |
| 144 | +hyp set-cluster-context [OPTIONS] |
| 145 | +``` |
| 146 | + |
| 147 | +#### Parameters |
| 148 | + |
| 149 | +| Parameter | Type | Required | Description | |
| 150 | +|-----------|------|----------|-------------| |
| 151 | +| `--cluster-name` | TEXT | Yes | Name of the HyperPod cluster to connect to | |
| 152 | +| `--region` | TEXT | No | AWS region of the cluster | |
| 153 | +| `--namespace` | TEXT | No | Kubernetes namespace to connect to | |
| 154 | +| `--debug` | FLAG | No | Enable debug logging | |
| 155 | + |
| 156 | +## hyp get-cluster-context |
| 157 | + |
| 158 | +Get context information for the currently connected cluster. |
| 159 | + |
| 160 | +#### Syntax |
| 161 | + |
| 162 | +```bash |
| 163 | +hyp get-cluster-context [OPTIONS] |
| 164 | +``` |
| 165 | + |
| 166 | +#### Parameters |
| 167 | + |
| 168 | +| Parameter | Type | Required | Description | |
| 169 | +|-----------|------|----------|-------------| |
| 170 | +| `--debug` | FLAG | No | Enable debug logging | |
| 171 | + |
| 172 | +## hyp get-monitoring |
| 173 | + |
| 174 | +Get monitoring configurations for the HyperPod cluster. |
| 175 | + |
| 176 | +#### Syntax |
| 177 | + |
| 178 | +```bash |
| 179 | +hyp get-monitoring [OPTIONS] |
| 180 | +``` |
| 181 | + |
| 182 | +#### Parameters |
| 183 | + |
| 184 | +| Parameter | Type | Required | Description | |
| 185 | +|-----------|------|----------|-------------| |
| 186 | +| `--grafana` | FLAG | No | Return Grafana dashboard URL | |
| 187 | +| `--prometheus` | FLAG | No | Return Prometheus workspace URL | |
| 188 | +| `--list` | FLAG | No | Return list of available metrics | |
| 189 | + |
| 190 | +## hyp configure |
| 191 | + |
| 192 | +Configure cluster parameters interactively or via command line. |
| 193 | + |
| 194 | +#### Syntax |
| 195 | + |
| 196 | +```bash |
| 197 | +hyp configure [OPTIONS] |
| 198 | +``` |
| 199 | + |
| 200 | +#### Parameters |
| 201 | + |
| 202 | +This command dynamically supports all configuration parameters available in the current template's schema. Common parameters include: |
| 203 | + |
| 204 | +| Parameter | Type | Required | Description | |
| 205 | +|-----------|------|----------|-------------| |
| 206 | +| `--resource-name-prefix` | TEXT | No | Prefix for all AWS resources | |
| 207 | +| `--stage` | TEXT | No | Deployment stage ("gamma" or "prod") | |
| 208 | +| `--vpc-cidr` | TEXT | No | VPC CIDR block | |
| 209 | +| `--kubernetes-version` | TEXT | No | Kubernetes version for EKS cluster | |
| 210 | +| `--node-recovery` | TEXT | No | Node recovery setting ("Automatic" or "None") | |
| 211 | +| `--env` | JSON | No | Environment variables as JSON object | |
| 212 | +| `--args` | JSON | No | Command arguments as JSON array | |
| 213 | +| `--command` | JSON | No | Command to run as JSON array | |
| 214 | +| `--tags` | JSON | No | Resource tags as JSON object | |
| 215 | + |
| 216 | +**Note:** The exact parameters available depend on your current template type and version. Run `hyp configure --help` to see all available options for your specific configuration. |
| 217 | + |
| 218 | +## hyp validate |
| 219 | + |
| 220 | +Validate the current cluster configuration. |
| 221 | + |
| 222 | +#### Syntax |
| 223 | + |
| 224 | +```bash |
| 225 | +hyp validate |
| 226 | +``` |
| 227 | + |
| 228 | +#### Parameters |
| 229 | + |
| 230 | +No parameters required. This command validates the `config.yaml` file in the current directory against the appropriate schema. |
| 231 | + |
| 232 | +## hyp reset |
| 233 | + |
| 234 | +Reset the `config.yaml` file in the current directory to default values. |
| 235 | + |
| 236 | +#### Syntax |
| 237 | + |
| 238 | +```bash |
| 239 | +hyp reset |
| 240 | +``` |
| 241 | + |
| 242 | +#### Parameters |
| 243 | + |
| 244 | +No parameters required. |
| 245 | + |
| 246 | + |
| 247 | + |
| 248 | +## Parameter Reference |
| 249 | + |
| 250 | +### Common Parameters Across Commands |
| 251 | + |
| 252 | +| Parameter | Type | Description | Default | |
| 253 | +|-----------|------|-------------|---------| |
| 254 | +| `--region` | TEXT | AWS region | Current AWS profile region | |
| 255 | +| `--help` | FLAG | Show command help | - | |
| 256 | +| `--verbose` | FLAG | Enable verbose output | false | |
| 257 | + |
| 258 | +### Configuration File Parameters |
| 259 | + |
| 260 | +The `config.yaml` file supports the following parameters: |
| 261 | + |
| 262 | +| Parameter | Type | Description | Default | |
| 263 | +|-----------|------|-------------|---------| |
| 264 | +| `template` | TEXT | Template name | "hyp-cluster" | |
| 265 | +| `namespace` | TEXT | Kubernetes namespace | "kube-system" | |
| 266 | +| `stage` | TEXT | Deployment stage | "gamma" | |
| 267 | +| `resource_name_prefix` | TEXT | Resource name prefix | "sagemaker-hyperpod-eks" | |
| 268 | +| `vpc_cidr` | TEXT | VPC CIDR block | "10.192.0.0/16" | |
| 269 | +| `kubernetes_version` | TEXT | Kubernetes version | "1.31" | |
| 270 | +| `node_recovery` | TEXT | Node recovery setting | "Automatic" | |
| 271 | +| `create_vpc_stack` | BOOLEAN | Create new VPC | true | |
| 272 | +| `create_eks_cluster_stack` | BOOLEAN | Create new EKS cluster | true | |
| 273 | +| `create_hyperpod_cluster_stack` | BOOLEAN | Create HyperPod cluster | true | |
| 274 | + |
| 275 | +**Note:** The available configuration parameters depend on the template schema version. Run `hyp init hyp-cluster` and review the generated scaffold to see the parameters for your version. |
| 276 | + |
| 277 | +## Examples |
| 278 | + |
| 279 | +### Basic Cluster Stack Creation |
| 280 | + |
| 281 | +```bash |
| 282 | +# Start with a clean directory |
| 283 | +mkdir my-hyperpod-cluster |
| 284 | +cd my-hyperpod-cluster |
| 285 | + |
| 286 | +# Initialize cluster configuration |
| 287 | +hyp init hyp-cluster |
| 288 | + |
| 289 | +# Configure basic parameters |
| 290 | +hyp configure --resource-name-prefix my-cluster --stage prod |
| 291 | + |
| 292 | +# Validate configuration |
| 293 | +hyp validate |
| 294 | + |
| 295 | +# Create cluster stack |
| 296 | +hyp create --region us-west-2 |
| 297 | +``` |
| 298 | + |
| 299 | +### Update Existing Cluster |
| 300 | + |
| 301 | +```bash |
| 302 | +# Update instance groups |
| 303 | +hyp update hyp-cluster \ |
| 304 | + --cluster-name my-cluster \ |
| 305 | + --instance-groups '[{"InstanceCount":2,"InstanceGroupName":"worker-nodes","InstanceType":"ml.m5.large"}]' \ |
| 306 | + --region us-west-2 |
| 307 | +``` |
| 308 | + |
| 309 | +### List, Describe, and Connect |
| 310 | + |
| 311 | +```bash |
| 312 | +# List all cluster stacks |
| 313 | +hyp list hyp-cluster --region us-west-2 |
| 314 | + |
| 315 | +# Describe specific cluster stack |
| 316 | +hyp describe hyp-cluster my-stack-name --region us-west-2 |
| 317 | + |
| 318 | +# List HyperPod clusters with capacity info |
| 319 | +hyp list-cluster --region us-west-2 --output table |
| 320 | + |
| 321 | +# Connect to cluster |
| 322 | +hyp set-cluster-context --cluster-name my-cluster --region us-west-2 |
| 323 | + |
| 324 | +# Get current context |
| 325 | +hyp get-cluster-context |
| 326 | +``` |
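
Several options documented above take a structured value as a single string: `--status` expects a bracketed list such as `"['CREATE_COMPLETE', 'UPDATE_COMPLETE']"`, and `--instance-groups` expects a JSON string. The sketch below shows one way to quote such values so that each reaches the command as exactly one argument. It only prints the arguments and does not call the CLI. The `show_args` helper and the variable names are hypothetical, introduced here for illustration.

```bash
# Sketch only: prints the arguments each command would receive; nothing is run against AWS.

# --status takes one string; double quotes preserve the inner single quotes.
status_filter="['CREATE_COMPLETE', 'UPDATE_COMPLETE']"

# --instance-groups takes a JSON string; single quotes preserve the inner double quotes.
instance_groups='[{"InstanceCount":2,"InstanceGroupName":"worker-nodes","InstanceType":"ml.m5.large"}]'

# show_args (hypothetical helper) prints each argument it receives in brackets, one per line.
show_args() { printf '[%s]\n' "$@"; }

show_args hyp list hyp-cluster --region us-west-2 --status "$status_filter"
show_args hyp update hyp-cluster --cluster-name my-cluster --instance-groups "$instance_groups" --region us-west-2
```

If each structured value appears on a single bracketed line, the quoting is intact. Dropping `show_args` from the front of either line gives the command in the form shown in the examples above.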