An Estimator wraps the run configuration that specifies how an R script is executed. Submitting an Estimator with submit_experiment() executes your training script on the specified compute target and returns a ScriptRun object.

To define the environment to use for training, you can either directly provide the environment-related parameters (e.g. cran_packages, custom_docker_image) to estimator(), or you can provide an Environment object to the environment parameter. For more information on the predefined Docker images that are used for training if custom_docker_image is not specified, see the documentation here.

estimator(
  source_directory,
  compute_target = NULL,
  vm_size = NULL,
  vm_priority = NULL,
  entry_script = NULL,
  script_params = NULL,
  cran_packages = NULL,
  github_packages = NULL,
  custom_url_packages = NULL,
  custom_docker_image = NULL,
  image_registry_details = NULL,
  use_gpu = FALSE,
  environment_variables = NULL,
  shm_size = NULL,
  max_run_duration_seconds = NULL,
  environment = NULL,
  inputs = NULL
)

Arguments

source_directory

A string of the local directory containing experiment configuration and code files needed for the training job.

compute_target

The AmlCompute object for the compute target where training will happen.

vm_size

A string of the VM size of the compute target that will be created for the training job. Available VM sizes are listed here. Provide this parameter if you want to create AmlCompute as the compute target at run time, instead of providing an existing cluster to the compute_target parameter. If vm_size is specified, a single-node cluster is automatically created for your run and is deleted automatically once the run completes.

vm_priority

A string of either 'dedicated' or 'lowpriority' to specify the VM priority of the compute target that will be created for the training job. Defaults to 'dedicated'. This takes effect only when the vm_size parameter is specified.

entry_script

A string representing the relative path to the file used to start training.

script_params

A named list of the command-line arguments to pass to the training script specified in entry_script.

cran_packages

A list of cran_package objects to be installed.

github_packages

A list of github_package objects to be installed.

custom_url_packages

A character vector of packages to be installed from a local directory or a custom URL.

custom_docker_image

A string of the name of the Docker image from which the image to use for training will be built. If not set, a predefined image will be used as the base image. To use an image from a private Docker repository, you will also have to specify the image_registry_details parameter.

image_registry_details

A ContainerRegistry object of the details of the Docker image registry for the custom Docker image.

use_gpu

Indicates whether the environment to run the experiment should support GPUs. If TRUE, a predefined GPU-based Docker image will be used in the environment. If FALSE, a predefined CPU-based image will be used. Predefined Docker images (CPU or GPU) will only be used if the custom_docker_image parameter is not set.

environment_variables

A named list of environment variable names and values. These environment variables are set on the process where the user script is executed.

shm_size

A string for the size of the Docker container's shared memory block. For more information, see the Docker run reference. If not set, a default value of '2g' is used.

max_run_duration_seconds

An integer of the maximum allowed time for the run, in seconds. Azure ML will attempt to automatically cancel the run if it takes longer than this value.

environment

The Environment object that configures the R environment where the experiment is executed. This parameter is mutually exclusive with the other environment-related parameters: custom_docker_image, image_registry_details, use_gpu, environment_variables, shm_size, cran_packages, github_packages, and custom_url_packages. If set, it takes precedence over those parameters.

inputs

A list of DataReference objects or DatasetConsumptionConfig objects to use as input.
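
As an illustration of how several of the arguments above combine, the following is a minimal sketch that passes the environment-related parameters directly to estimator() rather than through an Environment object, and lets Azure ML create a single-node cluster at run time via vm_size. The script name "train.R", the "--epochs" argument, the LOG_LEVEL variable, and the choice of VM size are assumptions made for the example, not values required by the API.

```r
library(azuremlsdk)

# Sketch only: "train.R", "--epochs" and LOG_LEVEL are hypothetical names.
est <- estimator(
  source_directory = ".",
  entry_script = "train.R",
  # Named list: each name is passed to train.R as a command-line flag
  script_params = list("--epochs" = 10),
  # No compute_target given: a single-node cluster of this size is
  # created for the run and deleted once the run completes
  vm_size = "STANDARD_D2_V2",
  vm_priority = "lowpriority",
  # Environment-related parameters supplied directly
  # (do not combine these with the environment argument)
  cran_packages = list(cran_package("optparse")),
  environment_variables = list(LOG_LEVEL = "debug"),
  # Ask Azure ML to cancel the run after one hour
  max_run_duration_seconds = 3600
)
```

Because vm_priority only takes effect when vm_size is specified, the two are set together here.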

Value

The Estimator object.

Examples

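library(azuremlsdk)

# Assumes a workspace config file is available and that a cluster already
# exists; "r-cluster" is a placeholder name.
ws <- load_workspace_from_config()
compute_target <- get_compute(ws, cluster_name = "r-cluster")

# Define the R environment with the CRAN packages the training script needs
r_env <- r_environment(name = "r-env",
                       cran_packages = list(cran_package("dplyr"),
                                            cran_package("ggplot2")))
est <- estimator(source_directory = ".",
                 entry_script = "train.R",
                 compute_target = compute_target,
                 environment = r_env)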
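
Creating the estimator does not start a run by itself. The sketch below shows one way the est object from the example above might be submitted, assuming a workspace config file can be found from the working directory; the experiment name "estimator-example" is a hypothetical placeholder.

```r
library(azuremlsdk)

# Assumes config.json for the workspace is discoverable from the working directory
ws <- load_workspace_from_config()

# "estimator-example" is a hypothetical experiment name
exp <- experiment(ws, name = "estimator-example")

# Submitting the estimator returns a ScriptRun object
run <- submit_experiment(exp, est)

# Block until the run finishes, streaming its logs to the console
wait_for_run_completion(run, show_output = TRUE)
```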

See also

r_environment(), container_registry(), submit_experiment(), dataset_consumption_config(), cran_package()