Merged
63 commits
9adbd07
Init experience baseline (#145)
mollyheamazon Jul 24, 2025
52c34f2
Cluster management (#146)
nargokul Jul 24, 2025
ea94953
Cluster create cli (#150)
nargokul Jul 30, 2025
da98871
Create param (#153)
nargokul Jul 31, 2025
fa650d6
Add Describe and List cluster stack feature (#151)
papriwal Aug 5, 2025
8e172da
add validate logic in configure command, bug fixes for cluster init e…
mollyheamazon Aug 5, 2025
abdbc2e
Revert "add validate logic in configure command, bug fixes for cluste…
mollyheamazon Aug 5, 2025
0ac48fd
Revert "Revert "add validate logic in configure command, bug fixes fo…
mollyheamazon Aug 5, 2025
6c425b1
Add get cluster status method (#157)
papriwal Aug 5, 2025
12f9b67
Update for Hyperpod Cluster (#155)
nargokul Aug 6, 2025
84f3526
Update create cluster method to return full cluster detail object. (#…
papriwal Aug 6, 2025
3b03301
add inference template submit backend logic, fix namespace default ac…
mollyheamazon Aug 8, 2025
b751e87
Merge branch 'master' into launch-fast-follow (#174)
zhaoqizqwang Aug 11, 2025
e0baf5d
Bring recipe-supp branch to staging repo (#175)
zhaoqizqwang Aug 11, 2025
e84d9b8
Update to fetch templates from S3 and other changes (#176)
nargokul Aug 13, 2025
f01f5fe
Main change: Enable hyp-pytorch-job template in init experience. Mino…
mollyheamazon Aug 13, 2025
cbb9e7d
Update params being saved in jinja file (#171)
papriwal Aug 13, 2025
f747fbe
Revert "Bring recipe-supp branch to staging repo (#175)" (#181)
zhaoqizqwang Aug 13, 2025
c51986e
Fix merge conflict issues, update cluster template to add default in …
mollyheamazon Aug 14, 2025
c610b76
Fix: List cluster stacks failure for datetime objects (#189)
rsareddy0329 Aug 15, 2025
3f1eba4
Added mapping for HyperPodClusterName (#188)
aviruthen Aug 15, 2025
4f0e26a
Change default region in hyp submit command (#193)
zhaoqizqwang Aug 18, 2025
a90505c
Updated to handle YAML arrays in config file (#190)
nargokul Aug 18, 2025
0684d94
Fix CloudFormation tags parsing, array validation, and test mocking i…
nargokul Aug 19, 2025
4a4aa04
Test fix (#199)
nargokul Aug 20, 2025
6b4dddf
UPDATE CFN PARAM IN JINJA FILE (#198)
papriwal Aug 20, 2025
fe51b64
Updated comment for Resource Name Prefix to reflect the usage better …
nargokul Aug 20, 2025
717901a
Add default availability zone ID based on region (#194)
zhaoqizqwang Aug 20, 2025
8d70a5a
Replace hyp submit with hyp create by overriding the default for hyp …
mollyheamazon Aug 20, 2025
d0b88e0
Updated docs for cli sdk ref (#192)
papriwal Aug 20, 2025
8642901
Fix: List cluster stacks exclude ones with 'DELETE_COMPLETE' status (…
rsareddy0329 Aug 21, 2025
2f9b409
Update defaults to baseline example (#208)
nargokul Aug 21, 2025
274346b
Remove other jobs from template, change update-cluster verb to update…
mollyheamazon Aug 21, 2025
dda7925
filter help arguments depending on current template, fix minor integ …
mollyheamazon Aug 21, 2025
fa8cca4
Timeout for set_cluster_context (#211)
nargokul Aug 21, 2025
fdc2bdd
Fix: list-clusters to display all HP clusters including which have 0 …
rsareddy0329 Aug 21, 2025
d01d7d3
Update the enable_hp_inference_feature to be boolean . (#213)
nargokul Aug 21, 2025
a5eddab
Bug fixes to HypCLI Cluster Creation (#210)
aviruthen Aug 21, 2025
70307a6
Append UUID to resource name prefix to ensure uniqueness . (#216)
nargokul Aug 21, 2025
e155a05
Docs for cluster stack creation (#207)
papriwal Aug 21, 2025
933b13f
Rename Stack related commands to hyp-cluster-stack instead of hyp-clu…
nargokul Aug 21, 2025
477ad3d
Revert "Bug fixes to HypCLI Cluster Creation (#210)" (#217)
aviruthen Aug 21, 2025
7a87109
Task gov doc updates (#218)
papriwal Aug 21, 2025
7d2201d
update cloud formation template to 1.1, fix instance group setting fo…
mollyheamazon Aug 21, 2025
f34c546
Reorder and update description for each field in cluster creation (#221)
zhaoqizqwang Aug 22, 2025
572e960
fix: validation error for json format that accomadates both single an…
mollyheamazon Aug 22, 2025
383550c
Add --debug flag to docs (#225)
papriwal Aug 22, 2025
01448a9
Update the cluster stack command to be cluster-stack instead of hyp-…
nargokul Aug 22, 2025
402970a
Update CLI docs for validation and resource naming clarity (#226)
papriwal Aug 22, 2025
c292302
Update CHANGELOG.md for launch fast follow release (#228)
nargokul Aug 22, 2025
a1c1094
Add default availability zone (#229)
zhaoqizqwang Aug 22, 2025
395b88d
Enable Telemetry for Cluster creation (#230)
nargokul Aug 22, 2025
a852ea8
Implemented exec command with unit tests (#222)
aviruthen Aug 25, 2025
d4ab9ef
ABstract out some defaut values from the user . (#234)
nargokul Aug 25, 2025
e3c5e92
Cleanup and fix for notebooks (#236)
nargokul Aug 25, 2025
826d7e6
Add sphinx_click to requirements. (#231)
papriwal Aug 25, 2025
2601b15
Add integration tests for HP Cluster Creation (#227)
aviruthen Aug 25, 2025
0045816
Update setup.py (#237)
nargokul Aug 26, 2025
68b996d
Update pyproject.toml (#238)
nargokul Aug 26, 2025
68ee67e
Update CHANGELOG.md (#239)
nargokul Aug 26, 2025
58750ba
Test Fixes
nargokul Aug 26, 2025
17dfc07
Skip some invoke tests
nargokul Aug 26, 2025
88aa0ed
Skip some invoke tests
nargokul Aug 26, 2025
1 change: 1 addition & 0 deletions .gitignore
@@ -22,6 +22,7 @@ doc/_build/
/sagemaker-hyperpod/build
/sagemaker-hyperpod/.coverage
/sagemaker-hyperpod/.coverage.*
/hyperpod-cluster-stack-template/build

# Ignore all contents of result and results directories
/result/
15 changes: 14 additions & 1 deletion CHANGELOG.md
@@ -1,11 +1,23 @@
# Changelog

## v3.1.0 (2025-08-13)
## v3.2.0 (2025-08-25)

### Features

* Cluster management
* Creation of cluster stack
* Describing and listing a cluster stack
* Updating a cluster
* Init Experience
* Init, Validate, Create with local configurations


## v3.1.0 (2025-08-13)

### Features
* Task Governance feature for training jobs.


## v3.0.2 (2025-07-31)

### Features
@@ -34,3 +46,4 @@
### Features

- feature: Add support for SageMaker HyperPod CLI

7 changes: 7 additions & 0 deletions README.md
@@ -159,6 +159,13 @@ hyp create hyp-pytorch-job \
--queue-name "training-queue" \
--priority "high" \
--max-retry 3 \
--accelerators 8 \
--vcpu 96.0 \
--memory 1152.0 \
--accelerators-limit 8 \
--vcpu-limit 96.0 \
--memory-limit 1152.0 \
--preferred-topology "topology.kubernetes.io/zone=us-west-2a" \
--volume name=model-data,type=hostPath,mount_path=/data,path=/data \
--volume name=training-output,type=pvc,mount_path=/data2,claim_name=my-pvc,read_only=false
```
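Each `--volume` value in the hunk above is a comma-separated list of `key=value` pairs. The following is a minimal sketch, assuming only that format, of how such a value could be split into a dictionary. `parse_volume_spec` is a hypothetical helper written for illustration; it is not the CLI's own parser, and the CLI may validate fields differently.

```python
def parse_volume_spec(spec: str) -> dict[str, str]:
    """Split 'k1=v1,k2=v2' into a dict.

    Hypothetical helper for illustration only; not part of the hyp CLI.
    """
    fields: dict[str, str] = {}
    for part in spec.split(","):
        # partition keeps everything after the first '=' as the value
        key, sep, value = part.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {part!r}")
        fields[key.strip()] = value.strip()
    return fields


# Value copied from the README example above
volume = parse_volume_spec("name=model-data,type=hostPath,mount_path=/data,path=/data")
print(volume["type"], volume["mount_path"])
```

Splitting on the first `=` only means a value that itself contains `=` survives intact, while a value containing a comma would need a different delimiter or quoting scheme.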
123 changes: 123 additions & 0 deletions doc/_static/custom.css
@@ -59,3 +59,126 @@ html[data-theme="dark"] .navbar-brand .title {
html[data-theme="dark"] p {
color: #d1d5db !important;
}

.current.active>a {
background-color: aliceblue !important;
}

.bd-sidebar-primary li.has-children .caption,
.bd-sidebar-primary li.has-children>.reference {
margin-right: inherit;
}

nav.bd-links li>a {
margin-right: inherit;
}

.table tbody tr:hover {
background: none !important;
}

.wy-table-responsive table td,
.wy-table-responsive table th {
white-space: normal;
}

.wy-table-responsive {
margin-bottom: 24px;
max-width: 100%;
overflow: visible;
}

.pagination {
display: inline-block;
}

.pagination a {
color: black;
float: left;
padding: 8px 16px;
text-decoration: none;
}

.pagination a.active {
background-color: #2a80b9;
color: white;
}

.pagination a:hover:not(.active) {
background-color: #ddd;
}


dl.py.class.dt.sig.sig-object.py {
overflow: auto;
margin: 6px 0;
font-size: 90%;
line-height: normal;
background: #e7f2fa !important;
color: #2980b9 !important;
border-top: 3px solid #6ab0de !important;
padding: 6px;
position: relative;
}

.bd-article {
overflow: auto;
}

.sig-prename.descclassname {
color: #000;
}

.field-list {
display: grid !important;
grid-template-columns: 0.5fr 2fr !important;
}

.field-list dt {
background: transparent !important;
word-break: normal !important;
}

.py.class dl {
margin: 1rem 0 !important;
}

.page-toc.tocsection.onthispage svg {
margin-right: 0.5rem;
}

.sidebar-secondary-items {
display: block !important;
padding: 0.5rem 0 !important;
}

.table {
border-radius: 4px !important;
border: 1px solid #e1e5e9 !important;
border-collapse: separate !important;
border-spacing: 0 !important;
overflow: hidden !important;
}

.table tbody tr {
background: none !important;
}

.table tbody tr:hover {
background: none !important;
}

.table td,
.table th {
border: none !important;
border-bottom: 1px solid #e1e5e9 !important;
}

.table tr:last-child td {
border-bottom: none !important;
}

.bd-toc code {
background: transparent !important;
border: none;
}
38 changes: 38 additions & 0 deletions doc/cli/cli_index.rst
@@ -0,0 +1,38 @@
CLI Reference
=============

Complete reference for the SageMaker HyperPod Command Line Interface.

.. toctree::
:hidden:
:maxdepth: 2

cluster_management/cli_cluster_management
training/cli_training
inference/cli_inference

.. container::

.. grid:: 1 1 3 3
:gutter: 3

.. grid-item-card:: Cluster Management CLI
:link: cluster_management/cli_cluster_management
:link-type: doc
:class-card: sd-border-secondary

Cluster stack management commands, options and parameters.

.. grid-item-card:: Training CLI
:link: training/cli_training
:link-type: doc
:class-card: sd-border-secondary

Training CLI commands, options and parameters.

.. grid-item-card:: Inference CLI
:link: inference/cli_inference
:link-type: doc
:class-card: sd-border-secondary

Inference CLI commands, options and parameters.
9 changes: 9 additions & 0 deletions doc/cli_reference.md → doc/cli/cli_reference.md
@@ -8,6 +8,7 @@

cli_training
cli_inference
cli_cluster_management
```

Complete reference for the SageMaker HyperPod Command Line Interface.
@@ -32,5 +33,13 @@ Training CLI commands, options and parameters.
Inference CLI commands, options and parameters.
:::

:::{grid-item-card} Cluster Management CLI
:link: cli_cluster_management
:link-type: ref
:class-card: sd-border-secondary

Cluster stack management commands, options and parameters.
:::

::::
::::