Glossary

RLlib Terms

  • Action: A decision made by the agent to change the state of the environment.
  • Agent: The learner and decision maker that interacts with an environment and receives a reward signal based on its actions.
  • Algorithm: A set of instructions that an agent follows to learn how to behave in an environment by performing actions and receiving feedback (reward/penalty) based on those actions.
  • Batch: A collection of steps that are used to update a policy.
  • Environment (Simulation): A simulation of a real-world scenario that an agent interacts with.
  • Episode: A sequence of steps taken by an agent from an initial state until the environment reaches a terminal state (for example, a “success” or a “failure”). At each step, the agent receives an observation (the observable part of the environment’s state), takes an action, and receives a reward.
  • Gymnasium: An open source Python library for developing and comparing reinforcement learning algorithms. It provides a standard API for communication between learning algorithms and environments, as well as a standard set of example environments compliant with that API.
  • Iteration: A single training call for an RLlib Trainer (one call to Trainer.train()). An iteration may contain one or more episodes (collecting data for the train batch or for a replay buffer) and one or more SGD update steps, depending on the particular Trainer being used. Note: In RLlib, an iteration should not be confused with a step.
  • Observation: The part of a state that the agent can observe.
  • Policy: A function mapping the environment’s observed states to an action to take, usually written π(s(t)) -> a(t).
  • Ray: A distributed computing framework that makes it easy to scale your applications and to leverage state-of-the-art machine learning libraries such as RLlib.
  • Reward: A scalar value that indicates how well the agent is doing at a given step. For each good action the agent receives positive feedback (a reward), and for each bad action it receives negative feedback (a penalty).
  • RLlib: An open source Python library that provides scalable and easy-to-use reinforcement learning solutions.
  • Rollout worker (Ray Actor): A process that interacts with an environment and collects trajectories for training.
  • State: The information an agent has about the environment at a given time. States should have the Markov property: the current state alone determines how the environment responds to any given action, with no dependence on earlier history.
  • Step: A single interaction between an agent and an environment, which consists of an observation (i.e., the state of the environment), an action, a reward, and a new observation.
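Taken together, the agent, environment, step, episode, policy, and reward terms above describe a single interaction loop. The following is a minimal sketch in plain Python; the ToyEnv class and policy function are illustrative inventions for this glossary, not RLlib or Gymnasium APIs, although the reset()/step() shape loosely mirrors the Gymnasium convention:

```python
class ToyEnv:
    """A toy environment: the agent walks along a number line toward a goal."""

    def __init__(self, goal=3, max_steps=10):
        self.goal = goal            # reaching this state ends the episode ("success")
        self.max_steps = max_steps  # step limit ends the episode ("failure")
        self.state = 0
        self.steps = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.state = 0
        self.steps = 0
        return self.state

    def step(self, action):
        """One step: apply the action, return (observation, reward, done)."""
        self.state += action
        self.steps += 1
        terminated = self.state == self.goal
        truncated = self.steps >= self.max_steps
        reward = 1.0 if terminated else -0.1  # scalar feedback for this step
        return self.state, reward, terminated or truncated


def policy(observation):
    """pi(s(t)) -> a(t): a fixed policy that always moves toward the goal."""
    return 1


# One episode: a sequence of steps from the initial state to a terminal state.
env = ToyEnv()
obs = env.reset()
done = False
total_reward = 0.0
while not done:
    action = policy(obs)                  # the agent's decision
    obs, reward, done = env.step(action)  # the environment's response
    total_reward += reward
```

In an actual RLlib setup, the rollout workers drive loops like this against real environments and collect the resulting trajectories into batches for training.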

Azure Machine Learning Terms

  • Compute cluster: A managed-compute infrastructure that allows you to easily create a single or multi-node resource for training or inference.
  • Environment (AML): A collection of software dependencies and configurations that are needed to run your reinforcement learning code on AML.
  • Workspace: A top-level resource for your machine learning activities, providing a centralized place to view and manage the artifacts you create when you use AML. A workspace contains your experiments, models, datastores, compute targets, environments, and other resources.