# Developing on Azure ML
This guide gives some pointers for developing your code on Azure ML. A typical scenario might be testing your distributed training code, or some other aspect of your code that isn't well represented on your local devbox.
A common pain-point in these scenarios is that iteration on Azure ML can feel slow - especially when compared to developing on a VM.
**Learning objective:** To improve the development experience on Azure ML to match - or even exceed - that of a "bare" VM.
## 🔧 The hurdles

Two main reasons developing on Azure ML can feel slow as compared to a VM are:
1. Any change to your Python environment forces a Docker image rebuild, which can take more than 5 minutes.
2. Compute resources are released between iterations, forcing you to wait for new compute to warm up (e.g. pulling Docker images).
Below we provide some techniques to address these issues, as well as some advantages of working with Azure ML compute directly. We also provide an example applying these techniques.
## 🛰️ Prepare compute for development

When creating your compute instance / cluster, there are a few things you can do to prepare for development:
- **Enable SSH on compute.** Supported on both compute instances and compute targets. This will allow you to use your compute just like you would a VM.
- **Use the VS Code Remote extension.** VS Code's remote extension allows you to connect to your Azure ML compute resources via SSH. This way you can develop directly in the cloud.
- **Increase "Idle seconds before scale down".** For compute targets you can increase this parameter, e.g. to 30 minutes. This means the cluster won't be released between runs while you iterate.
> **Warning:** Don't forget to roll this back when you're done iterating.
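As a sketch, the idle timeout can also be set when provisioning a cluster via the Python SDK. This assumes a workspace `config.json` in your working directory; the cluster name and VM size below are hypothetical:

```python
from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

ws = Workspace.from_config()

# 1800 seconds = 30 minutes before idle nodes are released
config = AmlCompute.provisioning_configuration(
    vm_size="STANDARD_D2_V2",
    max_nodes=4,
    idle_seconds_before_scaledown=1800,
)

cluster = ComputeTarget.create(ws, "cpu-cluster", config)
cluster.wait_for_completion(show_output=True)
```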
## 🏃‍♀️ Commands

Typically you will submit your code to Azure ML via a `ScriptRunConfig`, a little like this:
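A minimal sketch, assuming a workspace `config.json` and an existing compute target (the names `src`, `train.py`, `cpu-cluster`, and `dev-experiment` are placeholders):

```python
from azureml.core import Workspace, Experiment, ScriptRunConfig

ws = Workspace.from_config()

config = ScriptRunConfig(
    source_directory="src",        # directory containing your training code
    script="train.py",
    compute_target="cpu-cluster",  # hypothetical cluster name
)

run = Experiment(ws, "dev-experiment").submit(config)
run.wait_for_completion(show_output=True)
```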
> **Info:** For more details on using `ScriptRunConfig` to submit your code, see Running Code in the cloud.
By using the `command` argument you can improve your agility.
Commands allow you to chain together several steps in one, e.g.:
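For instance, a single command string can install a dependency and then launch training (the package and script arguments here are illustrative):

```python
# chain an environment tweak and the training run in one command
command = "pip install torch && python train.py --learning_rate 2e-5"
```

This string can then be passed to `ScriptRunConfig` via its `command` argument in place of `script`.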
Another example would be to include a setup script:
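A `setup.sh` placed in your source directory might look like this (contents illustrative):

```bash
echo "Running setup script"
pip install numpy
```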
and then call it in your command:
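For example (script names illustrative):

```python
# run the setup script before training in a single command
command = "bash setup.sh && python train.py --learning_rate 2e-5"
```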
This way Azure ML doesn't have to rebuild the Docker image for incremental changes.
## Advantages

In addition to matching the development experience on a VM, there are certain benefits to developing on Azure ML compute directly.
- Production-ready. By developing directly in Azure ML you avoid the additional step of porting your VM-developed code to Azure ML later. This is particularly relevant if you intend to run your production code on Azure ML.
- Data access. If your training script makes use of data in Azure you can use the Azure ML Python SDK to read it (see Data for examples). The alternative is that you might have to find some way of getting your data onto the VM you are developing on.
- Notebooks. Azure ML's compute instances come with Jupyter notebooks which can help with quick debugging. Moreover, these notebooks can easily be run against different compute infrastructure and can be a great way to collaborate.
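As a sketch of the data-access point above, assuming you have a registered tabular dataset (the name `my-dataset` is a placeholder):

```python
from azureml.core import Workspace, Dataset

ws = Workspace.from_config()
dataset = Dataset.get_by_name(ws, name="my-dataset")  # hypothetical dataset name
df = dataset.to_pandas_dataframe()  # read the data directly on the compute
```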
## Example

We provide a simple example demonstrating the mechanics of the above steps. Consider the following setup:
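For concreteness, suppose your working directory looks like this, with a minimal `setup.sh` and `train.py` (names and contents illustrative):

```
src/
    train.py    # your training script
    setup.sh    # environment setup, e.g. "pip install numpy"
```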
Now from your local machine you can use the Azure ML Python SDK to execute your command in the cloud:
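A sketch of the submission script, assuming a workspace `config.json` and the placeholder names used above:

```python
from azureml.core import Workspace, Experiment, ScriptRunConfig

ws = Workspace.from_config()

config = ScriptRunConfig(
    source_directory="src",
    command="bash setup.sh && python train.py",  # setup, then train
    compute_target="cpu-cluster",                # hypothetical cluster name
)

run = Experiment(ws, "dev-experiment").submit(config)
```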
Now if you need to update your Python environment, for example, you can simply add commands to `setup.sh`:
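For instance (the added dependency is illustrative):

```bash
echo "Running setup script"
pip install numpy
pip install pandas  # newly added dependency
```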
without having to rebuild any Docker images.