diff --git a/sdk/python/jobs/pipelines/stepsequence/echo_component.yml b/sdk/python/jobs/pipelines/stepsequence/echo_component.yml new file mode 100644 index 0000000000..0c52a33ff4 --- /dev/null +++ b/sdk/python/jobs/pipelines/stepsequence/echo_component.yml @@ -0,0 +1,14 @@ +$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json +name: echo_component +display_name: Echo Hello +version: 1 +type: command +description: A component that just echoes hello + +outputs: + output_data: + type: uri_folder + +command: echo "hello" > ${{outputs.output_data}}/message.txt + +environment: azureml://registries/azureml/environments/sklearn-1.5/labels/latest \ No newline at end of file diff --git a/sdk/python/jobs/pipelines/stepsequence/images/Scenario_Pipeline_Flow.png b/sdk/python/jobs/pipelines/stepsequence/images/Scenario_Pipeline_Flow.png new file mode 100644 index 0000000000..b48407a1aa Binary files /dev/null and b/sdk/python/jobs/pipelines/stepsequence/images/Scenario_Pipeline_Flow.png differ diff --git a/sdk/python/jobs/pipelines/stepsequence/pipeline_with_step_sequence_dummy_dependencies.ipynb b/sdk/python/jobs/pipelines/stepsequence/pipeline_with_step_sequence_dummy_dependencies.ipynb new file mode 100644 index 0000000000..4cf75f861a --- /dev/null +++ b/sdk/python/jobs/pipelines/stepsequence/pipeline_with_step_sequence_dummy_dependencies.ipynb @@ -0,0 +1,423 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "3e959973-6fbc-481c-99ba-6ed8f057b195", + "metadata": {}, + "source": [ + "# Enabling Step Sequencing in AzureML SDK v2" + ] + }, + { + "cell_type": "markdown", + "id": "b7351744-08a7-4e5e-9727-7745871c904a", + "metadata": {}, + "source": [ + "AzureML SDK v2 does not include a direct equivalent of the `StepSequence` feature from SDK v1. This change means pipelines no longer support explicit ordering through a `StepSequence` object. 
Instead, SDK v2 relies heavily on data dependencies to define the execution order of steps as mentioned [here](https://learn.microsoft.com/en-us/azure/machine-learning/migrate-to-v2-execution-pipeline?view=azureml-api-2#mapping-of-key-functionality-in-sdk-v1-and-sdk-v2).\n", + "\n", + "**Recommended Workaround** - Dummy Data Dependencies" + ] + }, + { + "cell_type": "markdown", + "id": "126938da-385d-4c5e-a311-5f0ff1426d5e", + "metadata": {}, + "source": [ + "**SDK v1 ↔ SDK v2: Feature Mapping**\n", + "\n", + "The migration documentation highlights how core pipeline concepts translate between versions:\n", + "\n", + "| **SDK v1 Functionality** | **SDK v2 Equivalent** |\n", + "|---------------------------------|-------------------------------------------|\n", + "| `azureml.pipeline.core.Pipeline` | `azure.ai.ml.dsl.pipeline` |\n", + "| `OutputDatasetConfig` | `Output` |\n", + "| `Dataset .as_mount()` | `Input` |\n", + "| `StepSequence` | Data dependency via dummy inputs/outputs |\n", + "\n", + "This reinforces that, in SDK v2, data dependencies are the default mechanism to enforce step order." + ] + }, + { + "cell_type": "markdown", + "id": "8e669bb2-4f4d-4063-a2c9-a55b0cb02277", + "metadata": {}, + "source": [ + "### Scenario: Pipeline Step Execution Flow\n", + "\n", + "The diagram below represents the scenario we are creating with AzureML pipelines. \n", + "\n", + ">- **Step 1** runs first. \n", + ">- It then branches into **Step 2, Step 3, and Step 4** (which run in parallel). \n", + ">- The outputs of **Step 2, Step 3, and Step 4** converge into **Step 5**. \n", + ">- Finally, **Step 6** executes after Step 5 completes.\n", + "\n", + "\n", + "![Scenario Pipeline Flow](images/Scenario_Pipeline_Flow.png)" + ] + }, + { + "cell_type": "markdown", + "id": "7b8b1cbf-8961-4545-9033-a22f936d323c", + "metadata": {}, + "source": [ + "# 1. 
Connect to Azure Machine Learning Workspace" + ] + }, + { + "cell_type": "markdown", + "id": "9e5f8df8-090d-433a-9aa4-e022b8e80988", + "metadata": {}, + "source": [ + "The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the job will be run." + ] + }, + { + "cell_type": "markdown", + "id": "113b552e-1a68-4062-91b5-e7b1716f7d2f", + "metadata": {}, + "source": [ + "## 1.1 Import the required libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5395a1e7-6d3b-4915-b5a0-d4408ac1adfe", + "metadata": {}, + "outputs": [], + "source": [ + "from azure.ai.ml.entities import CommandComponent\n", + "from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential\n", + "from azure.ai.ml.entities import AmlCompute\n", + "from azure.ai.ml import load_component\n", + "from azure.ai.ml import MLClient\n", + "from azure.ai.ml.dsl import pipeline" + ] + }, + { + "cell_type": "markdown", + "id": "14b61903-cfbf-4319-bf2f-4d945b8735c0", + "metadata": {}, + "source": [ + "## 1.2 Configure credential" + ] + }, + { + "cell_type": "markdown", + "id": "0027b717-3822-4ad7-8fac-9be670d0dfd0", + "metadata": {}, + "source": [ + "We use `DefaultAzureCredential` to get access to the workspace. `DefaultAzureCredential` should be capable of handling most Azure SDK authentication scenarios.\n", + "\n", + "If it does not work for you, see these references for other available credentials: [configure credential example](https://github.com/Azure/azureml-examples/blob/902929725e8d713447c99f80e4530c83075ecd9b/sdk/python/jobs/configuration.ipynb), [azure-identity reference doc](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity?view=azure-python)."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b78708e3-7c6d-4bd2-9b23-18b289242ffd", + "metadata": {}, + "outputs": [], + "source": [ + "try:\n", + " credential = DefaultAzureCredential()\n", + " # Check whether the given credential can get a token successfully.\n", + " credential.get_token(\"https://management.azure.com/.default\")\n", + "except Exception as ex:\n", + " # Fall back to InteractiveBrowserCredential if DefaultAzureCredential does not work\n", + " credential = InteractiveBrowserCredential()" + ] + }, + { + "cell_type": "markdown", + "id": "fd240f09-4a83-462b-b0ee-b520d30d0d52", + "metadata": {}, + "source": [ + "## 1.3 Get a handle to the workspace" + ] + }, + { + "cell_type": "markdown", + "id": "73d2a54f-296d-40d2-8e87-f2103acf40c4", + "metadata": {}, + "source": [ + "We use a config file to connect to the workspace. The Azure ML workspace should be configured with a compute cluster. [See this notebook to configure a workspace](https://github.com/Azure/azureml-examples/blob/902929725e8d713447c99f80e4530c83075ecd9b/sdk/python/jobs/configuration.ipynb)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5cff28bc-1bbf-4631-ba40-946ac2313408", + "metadata": {}, + "outputs": [], + "source": [ + "# Get a handle to the workspace\n", + "ml_client = MLClient.from_config(credential=credential)\n", + "\n", + "# Retrieve an already attached Azure Machine Learning Compute.\n", + "cluster_name = \"cpu-cluster\"\n", + "print(ml_client.compute.get(cluster_name))" + ] + }, + { + "cell_type": "markdown", + "id": "dd611068-e6a9-43e3-90d4-34f45c2b3352", + "metadata": {}, + "source": [ + "# 2. 
Define and create components into workspace" + ] + }, + { + "cell_type": "markdown", + "id": "84b8e0d0-1c21-4430-9fb8-982cf6739dda", + "metadata": {}, + "source": [ + "## 2.1 Load components from YAML" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4ab169f6-ba10-444f-b840-7acf16bf0422", + "metadata": {}, + "outputs": [], + "source": [ + "parent_dir = \".\"\n", + "\n", + "def test_function():\n", + " return load_component(source=parent_dir + \"/echo_component.yml\")" + ] + }, + { + "cell_type": "markdown", + "id": "9b874901-f4ca-4c06-90ed-3aad2dff9a44", + "metadata": {}, + "source": [ + "## 2.2 Inspect loaded component" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7123ee3d-281e-4d89-aa89-ca7034c4739e", + "metadata": {}, + "outputs": [], + "source": [ + "# Print the component as yaml\n", + "print(test_function())" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5b40dc7b-3d47-4871-9892-acfeb96bc4a9", + "metadata": {}, + "outputs": [], + "source": [ + "# Inspect more information\n", + "print(type(test_function()))\n", + "help(test_function()._func)" + ] + }, + { + "cell_type": "markdown", + "id": "3137e9e4-3efc-4ee8-acc3-86028d150a7b", + "metadata": {}, + "source": [ + "## 2.3 Define a component inline" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ce450afe-a0f5-4661-b206-b1497191118d", + "metadata": {}, + "outputs": [], + "source": [ + "# defining a component inline in Python using the SDK\n", + "component_dummy = CommandComponent(\n", + " name=\"dummy_component\",\n", + " display_name=\"Dummy Component\",\n", + " description=\"A dummy component for pipeline steps\",\n", + " command=\"echo hello\",\n", + " environment=\"azureml://registries/azureml/environments/sklearn-1.5/labels/latest\",\n", + " outputs={\"output_data\": {\"type\": \"uri_folder\"}}\n", + ")\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": 
"99d6d698-0c5e-4f42-be71-3f6c6f70e8d5", + "metadata": {}, + "outputs": [], + "source": [ + "# Step 1\n", + "@pipeline(name=\"step1_pipeline\", description=\"Step 1\")\n", + "def pipeline_step1():\n", + " step1 = test_function()()\n", + " step1_out = component_dummy()\n", + " return {\"step1_output\": step1_out.outputs.output_data}\n", + "\n", + "# Step 2\n", + "@pipeline(name=\"step2_pipeline\", description=\"Step 2\")\n", + "def pipeline_step2(step_input:str):\n", + " step2 = test_function()()\n", + " dummy_step2 = component_dummy()\n", + " return {\"step2_output\": dummy_step2.outputs.output_data}\n", + "\n", + "# Step 3\n", + "@pipeline(name=\"step3_pipeline\", description=\"Step 3\")\n", + "def pipeline_step3(step_input:str):\n", + " step3 = test_function()()\n", + " dummy_step3 = component_dummy()\n", + " return {\"step3_output\": dummy_step3.outputs.output_data}\n", + "\n", + "# Step 4\n", + "@pipeline(name=\"step4_pipeline\", description=\"Step 4\")\n", + "def pipeline_step4(step_input:str):\n", + " step4 = test_function()()\n", + " dummy_step4 = component_dummy()\n", + " return {\"step4_output\": dummy_step4.outputs.output_data}\n", + "\n", + "# Step 5 (converge)\n", + "@pipeline(name=\"step5_pipeline\", description=\"Step 5\")\n", + "def pipeline_step5(step2_in:str, step3_in:str, step4_in:str):\n", + " step5 = test_function()()\n", + " dummy_step5 = component_dummy()\n", + " return {\"step5_output\": dummy_step5.outputs.output_data}\n", + "\n", + "# Step 6 (final)\n", + "@pipeline(name=\"step6_pipeline\", description=\"Step 6\")\n", + "def pipeline_step6(step5_in:str):\n", + " step6 = test_function()()\n", + " dummy_step6 = component_dummy()\n", + " return {\"final_output\": dummy_step6.outputs.output_data}" + ] + }, + { + "cell_type": "markdown", + "id": "53b0af07-9c3d-409f-a491-9e5f66849eb4", + "metadata": {}, + "source": [ + "# 3. 
Sample pipeline job" + ] + }, + { + "cell_type": "markdown", + "id": "7918ac4d-ebe1-4d0a-80da-402060e3fc7f", + "metadata": {}, + "source": [ + "## 3.1 Build pipeline" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0a91ed96-909e-4859-9780-cdf7d3b02bb0", + "metadata": {}, + "outputs": [], + "source": [ + "# define a pipeline\n", + "@pipeline(name=\"step_sequence_pipeline\", description=\"Step sequence pipeline with branching\")\n", + "def pipeline_with_step_sequence():\n", + " s1 = pipeline_step1()\n", + " s2 = pipeline_step2(s1.outputs.step1_output)\n", + " s3 = pipeline_step3(s1.outputs.step1_output)\n", + " s4 = pipeline_step4(s1.outputs.step1_output)\n", + " \n", + " s5 = pipeline_step5(s2.outputs.step2_output, s3.outputs.step3_output, s4.outputs.step4_output)\n", + " s6 = pipeline_step6(s5.outputs.step5_output)\n", + " \n", + " return {\"pipeline_output\": s6.outputs.final_output}\n", + "\n", + "\n", + "\n", + "pipeline_job = pipeline_with_step_sequence()\n", + "\n", + "# set pipeline level compute\n", + "pipeline_job.settings.default_compute = \"cpu-cluster\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "204cbd25-7e9c-4032-9e41-4dad583439df", + "metadata": {}, + "outputs": [], + "source": [ + "print(pipeline_job)" + ] + }, + { + "cell_type": "markdown", + "id": "af39698f-0259-4786-ae98-34d67f90b447", + "metadata": {}, + "source": [ + "## 3.2 Submit pipeline job" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b48711cd-003c-4157-bc10-c6d1470f6d01", + "metadata": {}, + "outputs": [], + "source": [ + "# submit job to workspace\n", + "pipeline_job = ml_client.jobs.create_or_update(\n", + " pipeline_job, experiment_name=\"pipeline_samples_branching\"\n", + ")\n", + "pipeline_job" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bb7e616e-5074-43a5-be04-b22bfbcfa0e6", + "metadata": {}, + "outputs": [], + "source": [ + "# Wait until the job completes\n", + 
"ml_client.jobs.stream(pipeline_job.name)" + ] + }, + { + "cell_type": "markdown", + "id": "e8296316-862a-4ea7-abb9-13cd545bb15a", + "metadata": {}, + "source": [ + "# Next Steps" + ] + }, + { + "cell_type": "markdown", + "id": "fe39ba20-d988-495e-bbb8-d081ae2fed1d", + "metadata": {}, + "source": [ + "For further examples of running a pipeline job, see the [pipeline samples](https://github.com/Azure/azureml-examples/blob/902929725e8d713447c99f80e4530c83075ecd9b/sdk/python/jobs/pipelines/)." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.10 - SDK v2", + "language": "python", + "name": "python310-sdkv2" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.18" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}