Plato Toolkit documentation

Learn how to train and deploy RL agents at scale with Ray RLlib and Azure Machine Learning (AML).

Overview

This toolkit provides samples and guidance for training and deploying reinforcement learning (RL) agents at scale with Ray RLlib on Azure Machine Learning (AML). The sections below cover creating the required Azure resources, setting up an AML environment, and making a custom simulation environment compatible with RLlib.

Create Azure Resources

To use this toolkit, you'll need the following:

  • An Azure subscription
  • An Azure Machine Learning (AML) workspace
  • A compute cluster in that workspace

You can also create these using the AML Python SDK; see the compute-cluster sketch after the sizing guidelines below.

Selecting a Compute Cluster Size

There is no one-size-fits-all compute cluster size for RL; the right choice depends on many factors (e.g., your project budget, simulation environment, and model architecture). Some general guidelines:

  • Unless you have a compute-intensive RL model (e.g., a large deep residual network), we recommend selecting a general-purpose CPU VM.
  • Choose a minimum number of nodes that defines how many nodes are always running and ready for your jobs.

    We recommend selecting 0 as your minimum to de-allocate the nodes when they aren't in use. Any value larger than 0 will keep that number of nodes running and incur cost.

  • Choose a maximum number of nodes that defines how many nodes can be added to scale up your training when needed.
  • Avoid large unexpected Azure costs by familiarizing yourself with the size and cost of Azure VMs.
  • If you are still unsure which VM to select, a cluster whose VMs have 6 CPU cores and 64 GB of RAM should be a good starting point for most RL workloads that use a Python simulation environment. You can also monitor your job's resource utilization in AML studio during experiment runs and adjust your VM size accordingly.
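
If you'd rather create the cluster programmatically, below is a rough sketch using the azure-ai-ml (AML Python SDK v2) package. The cluster name, VM size, and node counts are illustrative assumptions, not toolkit defaults; substitute values that match the guidelines above.

# create_compute.py — hypothetical sketch using the azure-ai-ml (SDK v2) package
from azure.ai.ml import MLClient
from azure.ai.ml.entities import AmlCompute
from azure.identity import DefaultAzureCredential

# Connect to your workspace (placeholders are yours to fill in).
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<YOUR_SUBSCRIPTION_ID>",
    resource_group_name="<YOUR_RESOURCE_GROUP>",
    workspace_name="<YOUR_WORKSPACE>",
)

cluster = AmlCompute(
    name="rl-cpu-cluster",            # illustrative name
    size="Standard_E8s_v3",           # example size: 8 vCPUs, 64 GB RAM
    min_instances=0,                  # de-allocate all nodes when idle
    max_instances=4,                  # scale out during training as needed
    idle_time_before_scale_down=300,  # seconds of idle time before scale-down
)
ml_client.compute.begin_create_or_update(cluster).result()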

Once you have an AML workspace that contains a compute cluster ready to go, the next step is to set up an AML environment to add your Python package dependencies.

AML Environment Setup

A user-managed AML environment specifies the Python packages required to run your simulation and Ray RLlib code. You can follow the how-to guide on configuring AML environments or try our preferred method below using a conda file.

We've provided a conda.yml file and Azure CLI command that you can use to create an environment for the Simple Adder sample within this toolkit. Simply save the file as conda.yml and run the CLI command from the same directory. For more detailed instructions, you can follow the guide to create an environment from a conda file in AML studio or with the AML Python SDK.

# conda.yml
channels:
- anaconda
- conda-forge
dependencies:
- python=3.8.5
- pip=22.3.1
- pip:
   # Dependencies for Ray on AML
   - azureml-mlflow
   - azureml-defaults
   - ray-on-aml
   - ray[data]==2.3.0
   - ray[rllib]==2.3.0
   # Dependencies for RLlib
   - tensorflow==2.11.1
   # Dependencies for the Simulator
   - gymnasium
   - numpy==1.24.2

Azure CLI command:

az ml environment create --name aml-environment --conda-file conda.yml --image mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04 --resource-group $YOUR_RESOURCE_GROUP --workspace-name $YOUR_WORKSPACE
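
If you prefer the AML Python SDK over the CLI, here is a sketch of the equivalent call; it assumes an authenticated MLClient like the one in the compute-cluster sketch above.

# create_environment.py — sketch using the azure-ai-ml (SDK v2) package;
# assumes ml_client is the authenticated MLClient from the earlier sketch.
from azure.ai.ml.entities import Environment

env = Environment(
    name="aml-environment",
    conda_file="conda.yml",  # path to the conda file shown above
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",
)
ml_client.environments.create_or_update(env)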

Custom Simulation Environment with Gymnasium

Before you can train an RL agent on AML, your simulation environment needs to be compatible with Ray RLlib. For Python simulation environments, we recommend modifying your code to create a custom Gymnasium environment by following this tutorial and using the samples in this repository for reference. The basic steps are listed below, followed by a minimal sketch:

  • Implement the gymnasium.Env interface and define methods for reset() and step().
  • Specify the action_space and observation_space attributes during initialization using gymnasium.spaces.
  • Ensure that the observations returned by reset() and step() have the same shape and dtype as the defined observation_space, and that the actions consumed by step() conform to the defined action_space.
    • For example, if your observation_space is a gymnasium.spaces.Box space with shape=(1,) and dtype=np.float32, you should make sure that your observation is a numpy array of shape (1,) and dtype np.float32.
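
To make these steps concrete, here is a minimal, hypothetical environment. It is not the toolkit's Simple Adder sample; the class name and dynamics are purely illustrative.

# custom_env.py — minimal hypothetical Gymnasium environment in which the
# agent nudges a scalar state toward zero.
import gymnasium as gym
import numpy as np
from gymnasium import spaces

class HypotheticalEnv(gym.Env):
    def __init__(self, env_config=None):  # RLlib passes an env_config dict
        # The spaces must match what reset() and step() actually produce.
        self.observation_space = spaces.Box(
            low=-10.0, high=10.0, shape=(1,), dtype=np.float32
        )
        self.action_space = spaces.Discrete(2)  # 0: subtract 1, 1: add 1
        self._state = np.zeros(1, dtype=np.float32)
        self._steps = 0

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)  # seeds self.np_random
        self._state = self.np_random.uniform(-5.0, 5.0, size=(1,)).astype(np.float32)
        self._steps = 0
        return self._state, {}  # observation matches observation_space

    def step(self, action):
        delta = 1.0 if action == 1 else -1.0
        self._state = np.clip(self._state + delta, -10.0, 10.0).astype(np.float32)
        self._steps += 1
        reward = -float(abs(self._state[0]))    # closer to zero is better
        terminated = bool(abs(self._state[0]) < 0.5)
        truncated = self._steps >= 50           # episode time limit
        return self._state, reward, terminated, truncated, {}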

After you complete the integration, we suggest confirming that the environment runs on your local machine before scaling up on AML (see the sketch below). Our Simple Adder sample provides the code to run both locally and on AML.
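
For example, a local sanity check might look like the following sketch; it assumes the hypothetical environment above and the Ray 2.3.0 pins from conda.yml.

# local_check.py — quick local sanity check before scaling on AML
from gymnasium.utils.env_checker import check_env
from ray.rllib.algorithms.ppo import PPOConfig

check_env(HypotheticalEnv())  # raises if the env violates the Gymnasium API

algo = (
    PPOConfig()
    .environment(HypotheticalEnv)
    .rollouts(num_rollout_workers=0)  # keep sampling in the local process
    .build()
)
for _ in range(3):
    result = algo.train()
    print(result["episode_reward_mean"])
algo.stop()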