Cheat Sheet

Basic setup#

Connect to workspace#

from azureml.core import Workspace
ws = Workspace.from_config()

The workspace object is the fundamental handle on your Azure ML assets and is used throughout this cheat sheet (typically referred to as ws).
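If you don't have a config.json on disk, you can connect explicitly with Workspace.get; a minimal sketch, where the subscription ID, resource group, and workspace name are placeholders for your own values:

from azureml.core import Workspace

ws = Workspace.get(
    name='<workspace-name>',
    subscription_id='<subscription-id>',
    resource_group='<resource-group>',
)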

For more details: Workspaces

Connect to compute target#

compute_target = ws.compute_targets['<compute-target-name>']

Sample usage.

from azureml.core import ScriptRunConfig

compute_target = ws.compute_targets['powerful-gpu']

config = ScriptRunConfig(
    compute_target=compute_target,  # compute target used to run the train.py script
    source_directory='.',
    script='train.py',
)
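If the compute target does not exist yet, you can create one from the SDK; a minimal sketch, assuming you want an autoscaling GPU cluster (the cluster name and VM size below are illustrative):

from azureml.core.compute import AmlCompute, ComputeTarget

cluster_config = AmlCompute.provisioning_configuration(
    vm_size='STANDARD_NC6',  # GPU VM size (illustrative)
    max_nodes=4,             # autoscale up to 4 nodes
)
compute_target = ComputeTarget.create(ws, 'powerful-gpu', cluster_config)
compute_target.wait_for_completion(show_output=True)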

For more details: Compute Target

Prepare Python environment#

You can use a pip requirements.txt file or a Conda env.yml file to define a Python environment on your compute.

from azureml.core import Environment
# Option 1. From pip
environment = Environment.from_pip_requirements('<env-name>', '<path/to/requirements.txt>')
# Option 2. From Conda
environment = Environment.from_conda_specification('<env-name>', '<path/to/env.yml>')

You can also use Docker images to prepare your environments.
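For example, a minimal sketch that builds on a custom Docker image; the image reference is illustrative, and setting user_managed_dependencies assumes your Python packages are already baked into the image:

from azureml.core import Environment

environment = Environment('<env-name>')
environment.docker.base_image = '<registry>/<image>:<tag>'  # illustrative image reference
environment.python.user_managed_dependencies = True  # use the Python packages inside the image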

Sample usage.

from azureml.core import Environment, ScriptRunConfig

environment = Environment.from_pip_requirements('<env-name>', '<path/to/requirements.txt>')

config = ScriptRunConfig(
    environment=environment,  # set the Python environment
    source_directory='.',
    script='train.py',
)

For more details: Environment

Submit code#

To run code in Azure ML you need to:

  1. Configure: Configuration includes specifying the code to run, the compute target to run on and the Python environment to run in.
  2. Submit: Create or reuse an Azure ML Experiment and submit the run.

ScriptRunConfig#

A typical directory may have the following structure:

source_directory/
    script.py   # entry point to your code
    module1.py  # modules called by script.py
    ...

To run $ (env) python <path/to/code>/script.py [arguments] on a remote compute cluster target: ComputeTarget with an environment env: Environment, use the ScriptRunConfig class.

from azureml.core import ScriptRunConfig

config = ScriptRunConfig(
    source_directory='<path/to/code>',  # relative paths okay
    script='script.py',
    compute_target=compute_target,
    environment=environment,
    arguments=arguments,
)

For more details on arguments: Command line arguments

info
  • compute_target: If not provided the script will run on your local machine.
  • environment: If not provided, uses a default Python environment managed by Azure ML. See Environment for more details.
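To make the argument mapping concrete: if you submit arguments=['--learning_rate', 0.001], the entry script receives it as an ordinary command-line argument and can parse it with argparse. A minimal sketch, where the argument name is illustrative:

# script.py (sketch)
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--learning_rate', type=float, default=0.01)  # illustrative argument
args = parser.parse_args()
print(f'learning rate: {args.learning_rate}')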

Commands#

You can also provide an explicit command to run.

command = 'echo cool && python script.py'.split()

config = ScriptRunConfig(
    source_directory='<path/to/code>',  # relative paths okay
    command=command,  # the command already invokes the script, so script/arguments are not set
    compute_target=compute_target,
    environment=environment,
)

For more details: Commands

Experiment#

To submit this code, create an Experiment: a lightweight container that helps organize your submissions and keeps track of your code (see Run History).

from azureml.core import Experiment

exp = Experiment(ws, '<experiment-name>')
run = exp.submit(config)
print(run.get_portal_url())

This link will take you to the Azure ML Studio where you can monitor your run.
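To block until the run finishes and stream its logs to your terminal:

run.wait_for_completion(show_output=True)  # streams logs and returns when the run completes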

For more details: ScriptRunConfig

Sample usage#

Here is a fairly typical example that uses a Conda environment to run a training script train.py on your local machine from the command line.

$ conda env create -f env.yml # create environment called pytorch
$ conda activate pytorch
(pytorch) $ cd <path/to/code>
(pytorch) $ python train.py --learning_rate 0.001 --momentum 0.9

Suppose you want to run this on a GPU in Azure.

from azureml.core import Workspace, Environment, ScriptRunConfig, Experiment

ws = Workspace.from_config()
compute_target = ws.compute_targets['powerful-gpu']
environment = Environment.from_conda_specification('pytorch', 'env.yml')

config = ScriptRunConfig(
    source_directory='<path/to/code>',
    script='train.py',
    compute_target=compute_target,  # run on the GPU cluster rather than locally
    environment=environment,
    arguments=['--learning_rate', 0.001, '--momentum', 0.9],
)

run = Experiment(ws, 'PyTorch model training').submit(config)

Distributed GPU Training#

Adapt your ScriptRunConfig to enable distributed GPU training.

from azureml.core import Workspace, Experiment, ScriptRunConfig
from azureml.core import Environment
from azureml.core.runconfig import MpiConfiguration

ws = Workspace.from_config()
compute_target = ws.compute_targets['powerful-gpu']

environment = Environment.from_conda_specification('pytorch', 'env.yml')
environment.docker.enabled = True
environment.docker.base_image = 'mcr.microsoft.com/azureml/openmpi3.1.2-cuda10.1-cudnn7-ubuntu18.04'

# train on 2 nodes, each with 4 GPUs
mpiconfig = MpiConfiguration(process_count_per_node=4, node_count=2)

config = ScriptRunConfig(
    source_directory='<path/to/code>',  # directory containing train.py
    script='train.py',
    compute_target=compute_target,      # the GPU cluster defined above
    environment=environment,
    arguments=['--learning_rate', 0.001, '--momentum', 0.9],
    distributed_job_config=mpiconfig,   # add the distributed configuration
)

run = Experiment(ws, 'PyTorch model training').submit(config)
info
  • mcr.microsoft.com/azureml/openmpi3.1.2-cuda10.1-cudnn7-ubuntu18.04 is a Docker image with OpenMPI, which is required for distributed training on Azure ML.
  • MpiConfiguration is where you specify the number of nodes and GPUs (per node) you want to train on.
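With this configuration, 2 × 4 = 8 processes are launched, and each one can read its rank from the environment variables OpenMPI sets. A minimal sketch of what train.py might do with them; how your script actually uses the rank depends on your framework:

import os

# set by OpenMPI for each launched process
rank = int(os.environ.get('OMPI_COMM_WORLD_RANK', 0))              # global index: 0..7
local_rank = int(os.environ.get('OMPI_COMM_WORLD_LOCAL_RANK', 0))  # index on this node: 0..3
world_size = int(os.environ.get('OMPI_COMM_WORLD_SIZE', 1))        # total processes: 8

print(f'process {rank}/{world_size} (local rank {local_rank})')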

For more details: Distributed GPU Training

Connect to data#

To work with data in your training scripts using your workspace ws and its default datastore:

from azureml.core import Dataset

datastore = ws.get_default_datastore()
dataset = Dataset.File.from_files(path=(datastore, '<path/on/datastore>'))

For more details: Data

Pass this to your training script as a command line argument.

arguments=['--data', dataset.as_mount()]
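
On the script side, the mount arrives as an ordinary path string. A minimal sketch of train.py consuming it; the argparse setup is an assumption about your script:

# train.py (sketch)
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--data')        # receives the dataset mount point as a path
args = parser.parse_args()

print('data directory:', args.data)  # read your training files from this directory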