Commit 8b6047d

Aditi2424, adishaa, maheshxb, and jiayelamazon
authored
Documentation (#145)
* Update telemetry status to be Integer for parity (#130)
  Co-authored-by: adishaa <[email protected]>
* Release new version for Health Monitoring Agent (1.0.643.0_1.0.192.0) with minor improvements and bug fixes (#137)
* Release new version for Health Monitoring Agent (1.0.674.0_1.0.199.0) with minor improvements and bug fixes. (#139)
* documentation working setup
* training inference documentation changes

---------

Co-authored-by: adishaa <[email protected]>
Co-authored-by: maheshxb <[email protected]>
Co-authored-by: jiayelamazon <[email protected]>
1 parent 699e7b8 commit 8b6047d

10 files changed: 664 additions and 8 deletions

doc/_static/image.png

2.51 KB

doc/_static/image_dark.png

36.9 KB

doc/_static/image_light.svg

Lines changed: 1 addition & 0 deletions

doc/conf.py

Lines changed: 6 additions & 2 deletions
@@ -81,10 +81,10 @@ def get_version():
     "sphinx.ext.todo",
     "sphinx.ext.viewcode",
     "nbsphinx",
-    # Use either myst_parser or myst_nb, not both
-    # "myst_parser",
     "myst_nb",
     "sphinx_design",
+    "sphinx_tabs.tabs",
+    "sphinx_copybutton"
 ]
 
 # Mock modules that might not be available during documentation build
@@ -106,6 +106,10 @@ def get_version():
 
 html_theme = "sphinx_book_theme"
 html_theme_options = {
+    "logo": {
+        "image_light": "_static/image.png",
+        "image_dark": "_static/image.png",
+    },
     "repository_url": "https://github.com/aws/sagemaker-hyperpod-cli",
     "use_repository_button": True,
     "use_issues_button": True,

doc/getting_started.md

Lines changed: 120 additions & 0 deletions
@@ -0,0 +1,120 @@
1+
(getting_started)=
2+
3+
# Getting Started
4+
5+
This guide will help you get started with the SageMaker HyperPod CLI and SDK to perform basic operations.
6+
7+
## Cluster Management
8+
9+
### List Available Clusters
10+
11+
List all available SageMaker HyperPod clusters in your account:
12+
13+
**CLI**
14+
```bash
15+
hyp list-cluster [--region <region>] [--namespace <namespace>] [--output <json|table>]
16+
```
17+
18+
**SDK**
19+
```python
20+
from sagemaker.hyperpod.hyperpod_manager import HyperPodManager
21+
22+
clusters = HyperPodManager.list_clusters(region='us-east-2')
23+
print(clusters)
24+
```
25+
26+
**Parameters:**
27+
- `region` (string) - Optional. The AWS region where the SageMaker HyperPod and EKS clusters are located. If not specified, uses the region from your current AWS account credentials.
28+
- `namespace` (string) - Optional. The namespace whose quota is checked. Only SageMaker-managed namespaces are supported.
29+
- `output` (enum) - Optional. The output format: `table` or `json` (default).
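
When scripting the CLI, the optional flags above can be assembled programmatically. The following is a minimal sketch, assuming a hypothetical helper named `build_list_cluster_command` (it is not part of the HyperPod SDK). It only builds the argument list from the flags documented above and does not run the command.

```python
from typing import List, Optional


def build_list_cluster_command(
    region: Optional[str] = None,
    namespace: Optional[str] = None,
    output: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `hyp list-cluster` (hypothetical helper)."""
    if output is not None and output not in ("json", "table"):
        raise ValueError("output must be 'json' or 'table'")

    command = ["hyp", "list-cluster"]
    # Every flag is optional, so append it only when a value was supplied.
    for flag, value in (
        ("--region", region),
        ("--namespace", namespace),
        ("--output", output),
    ):
        if value is not None:
            command += [flag, value]
    return command


print(build_list_cluster_command(region="us-east-2", output="table"))
```

On a machine where the `hyp` CLI is installed, the returned list could be passed to `subprocess.run`.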
30+
31+
### Connect to a Cluster
32+
33+
Configure your local kubectl environment to interact with a specific SageMaker HyperPod cluster and namespace:
34+
35+
**CLI**
36+
```bash
37+
hyp set-cluster-context --cluster-name <cluster-name> [--namespace <namespace>]
38+
```
39+
40+
**SDK**
41+
```python
42+
from sagemaker.hyperpod.hyperpod_manager import HyperPodManager
43+
44+
HyperPodManager.set_context('<hyperpod-cluster-name>', region='us-east-2')
45+
```
46+
47+
**Parameters:**
48+
- `cluster-name` (string) - Required. The name of the SageMaker HyperPod cluster to connect to.
49+
- `namespace` (string) - Optional. The namespace to connect to. If not specified, the CLI will automatically discover accessible namespaces.
50+
51+
### Get Current Cluster Context
52+
53+
View information about the currently configured cluster context:
54+
55+
**CLI**
56+
```bash
57+
hyp get-cluster-context
58+
```
59+
60+
**SDK**
61+
```python
62+
from sagemaker.hyperpod.hyperpod_manager import HyperPodManager
63+
64+
# Get current context information
65+
context = HyperPodManager.get_context()
66+
print(context)
67+
```
68+
69+
## Job Management
70+
71+
### List Pods for a Training Job
72+
73+
View all pods associated with a specific training job:
74+
75+
**CLI**
76+
```bash
77+
hyp list-pods hyp-pytorch-job --job-name <job-name>
78+
```
79+
80+
**SDK**
81+
```python
82+
# Assumes `pytorch_job` is a training job object created or loaded earlier
# List all pods created for this job
83+
pytorch_job.list_pods()
84+
```
85+
86+
**Parameters:**
87+
- `job-name` (string) - Required. The name of the job to list pods for.
88+
89+
### Access Pod Logs
90+
91+
View logs for a specific pod within a training job:
92+
93+
**CLI**
94+
```bash
95+
hyp get-logs hyp-pytorch-job --pod-name <pod-name> --job-name <job-name>
96+
```
97+
98+
**SDK**
99+
```python
100+
# Fetch the logs from the pod named demo-pod-0 (`pytorch_job` is an existing job object)
101+
pytorch_job.get_logs_from_pod("demo-pod-0")
102+
```
103+
104+
**Parameters:**
105+
- `job-name` (string) - Required. The name of the job to get logs for.
106+
- `pod-name` (string) - Required. The name of the pod to get logs from.
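
Because `get-logs` takes one pod at a time, collecting logs for a whole job means issuing one command per pod. The sketch below, assuming a hypothetical helper `build_get_logs_commands` and illustrative job and pod names, generates those command lines from the two required parameters above. It does not call the CLI or the SDK.

```python
import shlex
from typing import Iterable, List


def build_get_logs_commands(job_name: str, pod_names: Iterable[str]) -> List[str]:
    """Return one `hyp get-logs` command line per pod (hypothetical helper)."""
    return [
        # shlex.join quotes any argument that is not shell-safe.
        shlex.join(
            [
                "hyp", "get-logs", "hyp-pytorch-job",
                "--pod-name", pod_name,
                "--job-name", job_name,
            ]
        )
        for pod_name in pod_names
    ]


# "demo-job" and the pod names are placeholders for illustration only.
for line in build_get_logs_commands("demo-job", ["demo-pod-0", "demo-pod-1"]):
    print(line)
```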
107+
108+
## Next Steps
109+
110+
After setting up your environment and connecting to a cluster, you can:
111+
112+
- Create and manage PyTorch training jobs
113+
- Deploy and manage inference endpoints
114+
- Monitor cluster resources and job performance
115+
116+
For more detailed information on specific commands, use the `--help` flag:
117+
118+
```bash
119+
hyp <command> --help
120+
```

doc/index.md

Lines changed: 6 additions & 5 deletions
@@ -2,13 +2,14 @@
 
 # SageMaker HyperPod CLI and SDK Documentation
 
-**Version**: {{ version }}
-
 ```{toctree}
 :hidden:
 :maxdepth: 1
 
+Installation <installation>
 Getting Started <getting_started>
+Training <training>
+Inference <inference>
 API reference <_apidoc/modules>
 ```
 
@@ -19,7 +20,7 @@ SageMaker HyperPod CLI and SDK provide a seamless way to manage distributed trai
 :gutter: 3
 
 :::{grid-item-card} Installation
-:link: getting_started
+:link: installation
 :link-type: ref
 
 Get the CLI/ SDK setup
@@ -33,14 +34,14 @@ Beginner's guide to using CLI/ SDK
 :::
 
 :::{grid-item-card} Training
-:link: getting_started
+:link: training
 :link-type: ref
 
 Detailed guide on creating Pytorch training jobs
 :::
 
 :::{grid-item-card} Inference
-:link: getting_started
+:link: inference
 :link-type: ref
 
 Detailed guide on creating, invoking and monitoring endpoints

doc/inference.md

Lines changed: 198 additions & 0 deletions
@@ -0,0 +1,198 @@
1+
(inference)=
2+
3+
# Inference with SageMaker HyperPod
4+
5+
SageMaker HyperPod lets you deploy and manage inference endpoints on EKS-hosted clusters. This guide covers how to create, invoke, and manage those endpoints using both the HyperPod CLI and SDK.
6+
7+
## Overview
8+
9+
SageMaker HyperPod inference endpoints allow you to:
10+
11+
- Deploy pre-trained JumpStart models
12+
- Deploy custom models with your own inference code
13+
- Configure resource requirements for inference
14+
- Manage endpoint lifecycle
15+
- Invoke endpoints for real-time predictions
16+
- Monitor endpoint performance
17+
18+
## Creating Inference Endpoints
19+
20+
You can create inference endpoints using either JumpStart models or custom models:
21+
22+
### JumpStart Model Endpoints
23+
24+
**CLI**
25+
```bash
26+
hyp create hyp-jumpstart-endpoint \
27+
--version 1.0 \
28+
--model-id jumpstart-model-id \
29+
--instance-type ml.g5.8xlarge \
30+
--endpoint-name endpoint-jumpstart \
31+
--tls-output-s3-uri s3://sample-bucket
32+
```
33+
34+
**SDK**
35+
```python
36+
from sagemaker.hyperpod.inference import HyperPodJumpstartEndpoint
37+
38+
# Create a JumpStart endpoint
39+
endpoint = HyperPodJumpstartEndpoint(
40+
endpoint_name="endpoint-jumpstart",
41+
model_id="jumpstart-model-id",
42+
instance_type="ml.g5.8xlarge",
43+
tls_output_s3_uri="s3://sample-bucket"
44+
)
45+
46+
# Deploy the endpoint
47+
endpoint.create()
48+
```
49+
50+
### Custom Model Endpoints
51+
52+
**CLI**
53+
```bash
54+
hyp create hyp-custom-endpoint \
55+
--version 1.0 \
56+
--endpoint-name endpoint-custom \
57+
--model-uri s3://my-bucket/model-artifacts \
58+
--image 123456789012.dkr.ecr.us-west-2.amazonaws.com/my-inference-image:latest \
59+
--instance-type ml.g5.8xlarge \
60+
--tls-output-s3-uri s3://sample-bucket
61+
```
62+
63+
**SDK**
64+
```python
65+
from sagemaker.hyperpod.inference import HyperPodCustomEndpoint
66+
67+
# Create a custom endpoint
68+
endpoint = HyperPodCustomEndpoint(
69+
endpoint_name="endpoint-custom",
70+
model_uri="s3://my-bucket/model-artifacts",
71+
image="123456789012.dkr.ecr.us-west-2.amazonaws.com/my-inference-image:latest",
72+
instance_type="ml.g5.8xlarge",
73+
tls_output_s3_uri="s3://sample-bucket"
74+
)
75+
76+
# Deploy the endpoint
77+
endpoint.create()
78+
```
79+
80+
## Key Parameters
81+
82+
When creating an inference endpoint, you'll need to specify:
83+
84+
- **endpoint-name**: Unique identifier for your endpoint
85+
- **model-id** (JumpStart): ID of the pre-trained JumpStart model
86+
- **model-uri** (Custom): S3 location of your model artifacts
87+
- **image** (Custom): Docker image containing your inference code
88+
- **instance-type**: The EC2 instance type to use
89+
- **tls-output-s3-uri**: S3 location to store TLS certificates
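
To show how these parameters map onto the `hyp create hyp-custom-endpoint` flags used earlier, here is a minimal sketch. The `CustomEndpointArgs` class is hypothetical and not part of the HyperPod SDK, and the `s3://` prefix check is an assumption based only on the parameter being described as an S3 location.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CustomEndpointArgs:
    """Hypothetical container for the custom-endpoint parameters listed above."""

    endpoint_name: str
    model_uri: str
    image: str
    instance_type: str
    tls_output_s3_uri: str
    version: str = "1.0"  # value used in the CLI example above

    def validate(self) -> None:
        # Assumption: an S3 location is always written as an s3:// URI.
        if not self.tls_output_s3_uri.startswith("s3://"):
            raise ValueError(
                f"tls_output_s3_uri must start with 's3://', got {self.tls_output_s3_uri!r}"
            )

    def to_cli_args(self) -> List[str]:
        """Return the argument list for `hyp create hyp-custom-endpoint`."""
        self.validate()
        return [
            "hyp", "create", "hyp-custom-endpoint",
            "--version", self.version,
            "--endpoint-name", self.endpoint_name,
            "--model-uri", self.model_uri,
            "--image", self.image,
            "--instance-type", self.instance_type,
            "--tls-output-s3-uri", self.tls_output_s3_uri,
        ]


args = CustomEndpointArgs(
    endpoint_name="endpoint-custom",
    model_uri="s3://my-bucket/model-artifacts",
    image="123456789012.dkr.ecr.us-west-2.amazonaws.com/my-inference-image:latest",
    instance_type="ml.g5.8xlarge",
    tls_output_s3_uri="s3://sample-bucket",
)
print(args.to_cli_args())
```

Keeping the values in one object lets a script validate them once before calling the CLI.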
90+
91+
## Managing Inference Endpoints
92+
93+
### List Endpoints
94+
95+
**CLI**
96+
```bash
97+
# List JumpStart endpoints
98+
hyp list hyp-jumpstart-endpoint
99+
100+
# List custom endpoints
101+
hyp list hyp-custom-endpoint
102+
```
103+
104+
**SDK**
105+
```python
106+
from sagemaker.hyperpod.inference import HyperPodJumpstartEndpoint, HyperPodCustomEndpoint
107+
108+
# List JumpStart endpoints
109+
jumpstart_endpoints = HyperPodJumpstartEndpoint.list()
110+
print(jumpstart_endpoints)
111+
112+
# List custom endpoints
113+
custom_endpoints = HyperPodCustomEndpoint.list()
114+
print(custom_endpoints)
115+
```
116+
117+
### Describe an Endpoint
118+
119+
**CLI**
120+
```bash
121+
# Describe JumpStart endpoint
122+
hyp describe hyp-jumpstart-endpoint --endpoint-name <endpoint-name>
123+
124+
# Describe custom endpoint
125+
hyp describe hyp-custom-endpoint --endpoint-name <endpoint-name>
126+
```
127+
128+
**SDK**
129+
```python
130+
from sagemaker.hyperpod.inference import HyperPodJumpstartEndpoint, HyperPodCustomEndpoint
131+
132+
# Get JumpStart endpoint details
133+
jumpstart_endpoint = HyperPodJumpstartEndpoint.load(endpoint_name="endpoint-jumpstart")
134+
jumpstart_details = jumpstart_endpoint.describe()
135+
print(jumpstart_details)
136+
137+
# Get custom endpoint details
138+
custom_endpoint = HyperPodCustomEndpoint.load(endpoint_name="endpoint-custom")
139+
custom_details = custom_endpoint.describe()
140+
print(custom_details)
141+
```
142+
143+
### Invoke an Endpoint
144+
145+
**CLI**
146+
```bash
147+
# Invoke custom endpoint
148+
hyp invoke hyp-custom-endpoint \
149+
--endpoint-name <endpoint-name> \
150+
--content-type "application/json" \
151+
--payload '{"inputs": "What is machine learning?"}'
152+
```
153+
154+
**SDK**
155+
```python
156+
from sagemaker.hyperpod.inference import HyperPodCustomEndpoint
157+
158+
# Load the endpoint
159+
endpoint = HyperPodCustomEndpoint.load(endpoint_name="endpoint-custom")
160+
161+
# Invoke the endpoint
162+
response = endpoint.invoke(
163+
payload={"inputs": "What is machine learning?"},
164+
content_type="application/json"
165+
)
166+
print(response)
167+
```
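
The CLI takes the payload as a JSON string, while the SDK example passes a dictionary. The sketch below, assuming a hypothetical helper `build_invoke_command`, serializes a dictionary with `json.dumps` and quotes it for the shell so that the `--payload` value survives intact. It only builds the command line shown in the CLI example and does not send a request.

```python
import json
import shlex
from typing import Any, Dict


def build_invoke_command(endpoint_name: str, payload: Dict[str, Any]) -> str:
    """Return a shell-safe `hyp invoke` command line (hypothetical helper)."""
    return shlex.join(
        [
            "hyp", "invoke", "hyp-custom-endpoint",
            "--endpoint-name", endpoint_name,
            "--content-type", "application/json",
            # json.dumps produces valid JSON; shlex.join adds the shell quoting.
            "--payload", json.dumps(payload),
        ]
    )


command = build_invoke_command(
    "endpoint-custom", {"inputs": "What is machine learning?"}
)
print(command)
```

Serializing with `json.dumps` rather than hand-writing the string avoids quoting mistakes when the input text itself contains quotes.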
168+
169+
### Delete an Endpoint
170+
171+
**CLI**
172+
```bash
173+
# Delete JumpStart endpoint
174+
hyp delete hyp-jumpstart-endpoint --endpoint-name <endpoint-name>
175+
176+
# Delete custom endpoint
177+
hyp delete hyp-custom-endpoint --endpoint-name <endpoint-name>
178+
```
179+
180+
**SDK**
181+
```python
182+
from sagemaker.hyperpod.inference import HyperPodJumpstartEndpoint, HyperPodCustomEndpoint
183+
184+
# Delete JumpStart endpoint
185+
jumpstart_endpoint = HyperPodJumpstartEndpoint.load(endpoint_name="endpoint-jumpstart")
186+
jumpstart_endpoint.delete()
187+
188+
# Delete custom endpoint
189+
custom_endpoint = HyperPodCustomEndpoint.load(endpoint_name="endpoint-custom")
190+
custom_endpoint.delete()
191+
```
192+
193+
## Inference Example Notebooks
194+
195+
For detailed examples of inference with HyperPod, see:
196+
- [CLI Inference FSX Model Example](https://github.com/aws/sagemaker-hyperpod-cli/blob/main/examples/inference/CLI/inference-fsx-model-e2e-cli.ipynb)
197+
- [CLI Inference Jumpstart Model Example](https://github.com/aws/sagemaker-hyperpod-cli/blob/main/examples/inference/CLI/inference-jumpstart-e2e-cli.ipynb)
198+
- [CLI Inference S3 Model Example](https://github.com/aws/sagemaker-hyperpod-cli/blob/main/examples/inference/CLI/inference-s3-model-e2e-cli.ipynb)
