
Debugging

Azure ML Log Files

Azure ML's log files are an essential resource for debugging your Azure ML workloads.

| Log file | Description |
| --- | --- |
| 20_image_build_log*.txt | Docker build logs. Only applicable when updating your Environment. Otherwise Azure ML will reuse cached image. If successful, contains image registry details for the corresponding image. |
| 55_azureml-execution*.txt | Pulls image to compute target. Note, this log only appears once you have secured compute resources. |
| 65_job_prep*.txt | Job preparation: Download your code to compute target and datastores (if requested). |
| 70_driver_log.txt | The standard output from your script. This is where your code's logs (e.g. print statements) show up. In the majority of cases you will monitor the logs here. |
| 75_job_post*.txt | Job release: Send logs, release the compute resources back to Azure. |
Note: You will not necessarily see every file for every run. For example, 20_image_build_log*.txt only appears when a new image is built (e.g. when you change your environment).
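Because the numeric prefixes follow the order of the stages, the set of log files present gives a rough idea of how far a run progressed. The following is a minimal sketch of that idea, not part of the Azure ML SDK: the patterns come from the table above, while the stage labels and the helper names `log_stage` and `last_stage_reached` are hypothetical and chosen for illustration.

```python
from fnmatch import fnmatch

# (pattern, stage) pairs in execution order. Patterns are taken from the
# log file table; the stage labels are illustrative, not official names.
LOG_STAGES = [
    ("20_image_build_log*.txt", "image build"),
    ("55_azureml-execution*.txt", "image pull"),
    ("65_job_prep*.txt", "job preparation"),
    ("70_driver_log.txt", "script output"),
    ("75_job_post*.txt", "job release"),
]


def log_stage(filename):
    """Return the stage a log file belongs to, or None if no pattern matches."""
    for pattern, stage in LOG_STAGES:
        if fnmatch(filename, pattern):
            return stage
    return None


def last_stage_reached(filenames):
    """Return the latest stage for which at least one log file is present."""
    reached = None
    for pattern, stage in LOG_STAGES:
        if any(fnmatch(name, pattern) for name in filenames):
            reached = stage
    return reached
```

For example, a run that shows only execution and job preparation logs most likely stopped before your script produced any output, so the problem is probably not in your code.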

Find logs in the Studio

These log files are available via the Studio UI at https://ml.azure.com under Workspace > Experiment > Run > "Outputs and logs".

Streaming logs

It is also possible to stream these logs directly to your local terminal using a Run object, for example:

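from azureml.core import Workspace, Experiment, ScriptRunConfig

ws = Workspace.from_config()  # load workspace details from a local config file
config = ScriptRunConfig(...)  # fill in your own script configuration
run = Experiment(ws, 'my-amazing-experiment').submit(config)
run.wait_for_completion(show_output=True)  # stream logs to the terminal until the run finishes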
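Once a run has finished, you may prefer a local copy of the driver log over streaming it. The sketch below assumes a `Run` object like the one above and relies on its `get_file_names()` and `download_file()` methods. The helper name `download_driver_logs` and the `logs` output directory are hypothetical, and the exact folder the log lives in within the run's files is not assumed, so the code matches on the file name only.

```python
import os


def download_driver_logs(run, output_dir="logs"):
    """Download every run file named 70_driver_log.txt into output_dir.

    `run` is expected to behave like azureml.core.Run, i.e. to provide
    get_file_names() and download_file(name, output_file_path).
    """
    os.makedirs(output_dir, exist_ok=True)
    saved = []
    for name in run.get_file_names():
        # Match on the base name so the containing folder does not matter.
        if os.path.basename(name) == "70_driver_log.txt":
            target = os.path.join(output_dir, os.path.basename(name))
            run.download_file(name=name, output_file_path=target)
            saved.append(target)
    return saved
```

Calling `download_driver_logs(run)` returns the list of local paths that were written, which is empty if the run never reached the point of producing a driver log.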

SSH

SSHing into your compute can be useful for a variety of reasons, including debugging.

Enable SSH at compute creation

SSH needs to be enabled when you create the compute instance / target - see Compute Targets for details.

  1. Get the public IP and port number for your compute.

    Visit ml.azure.com > select "Compute" tab > Locate the desired compute instance / target.

    Note: The compute needs to be running in order to connect.

    • For a compute instance, this just requires turning it on.
    • For a compute target, something must be running on the cluster. You can then select the "Nodes" tab of the cluster (ml.azure.com > Compute > your compute target > Nodes) to get the public IP and port number for each node.
  2. Open your favorite shell and run:

    ssh azureuser@<public-ip> -p <port-number>
SSH key pair using RSA

We recommend setting up an SSH public-private key pair: see here for more details.
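If you connect to several nodes, it can help to assemble the ssh command from the IP and port shown in the Studio rather than typing it each time. This is a minimal sketch: `build_ssh_command` is a hypothetical helper, the address and port in the usage example are placeholders, and the optional `key_path` argument assumes you have set up a key pair and want to pass the private key with ssh's standard `-i` option.

```python
import shlex


def build_ssh_command(public_ip, port, user="azureuser", key_path=None):
    """Return the ssh argument list for connecting to a compute node."""
    cmd = ["ssh"]
    if key_path is not None:
        # -i selects the private key file of your SSH key pair.
        cmd += ["-i", key_path]
    cmd += [f"{user}@{public_ip}", "-p", str(port)]
    return cmd


if __name__ == "__main__":
    # Placeholder values: replace with the public IP and port from the Studio.
    print(shlex.join(build_ssh_command("203.0.113.10", 50000)))
```

The function returns a list rather than a string so it can be passed directly to `subprocess.run` without shell quoting concerns.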