This guide covers the details for deploying models as web services on Azure Machine Learning.

Register your model

A registered model is a logical container for one or more files that make up your model. For example, if you have a model that’s stored in multiple files, you can register them as a single model in the workspace. After you register the files, you can then download or deploy the registered model and receive all the files that you registered.

Machine learning models are registered in your Azure ML workspace. The model can come from Azure ML or from somewhere else. When registering a model, you can optionally provide metadata about the model. The tags and properties that you apply to a model registration can then be used to filter models.

Register a model from an Azure ML run

When you use the SDK to train a model, you will have a corresponding Run object. If your training script wrote the model file(s) to the 'outputs' folder, then you can register a model directly from that training run.

For more information, see the documentation for register_model_from_run()
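A minimal sketch of registering from a run, assuming `run` is a completed Run object and the training script wrote the model file to 'outputs' (the model name and file path are illustrative):

```r
library(azuremlsdk)

model <- register_model_from_run(
  run = run,
  model_name = "my-model",           # illustrative name
  model_path = "outputs/model.rds",  # path the training script wrote to
  tags = list(area = "demo")         # optional metadata for filtering
)
```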

Register a model from local file(s)

You can register a model by providing the local path of the model. You can provide the path of either a folder or a single file. You can use this method to register models that were trained with Azure ML and then downloaded locally. You can also use this method to register models trained outside of Azure ML.

For more information, see the documentation for register_model()
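A sketch of registering a local model file, assuming a workspace `config.json` in the working directory (the file and model names are illustrative):

```r
library(azuremlsdk)

ws <- load_workspace_from_config()

model <- register_model(
  workspace = ws,
  model_path = "model.rds",   # a local file or folder
  model_name = "my-model",
  description = "Model registered from local files"
)
```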

Choose a compute target

You can use the following compute targets, or compute resources, to host your web service deployment:

Compute target | Used for | GPU support | Description
Local web service | Testing/debugging | | Use for limited testing and troubleshooting. Hardware acceleration depends on use of libraries in the local system.
Azure ML compute instance web service | Testing/debugging | | Use for limited testing and troubleshooting.
Azure Container Instances (ACI) | Testing or development | | Use for low-scale CPU-based workloads that require less than 48 GB of RAM.
Azure Kubernetes Service (AKS) | Real-time inference | Yes | Use for high-scale production deployments. Provides fast response time and autoscaling of the deployed service. Cluster autoscaling isn't supported through the Azure Machine Learning SDK. To change the nodes in the AKS cluster, use the UI for your AKS cluster in the Azure portal.

Although compute targets like local and Azure Machine Learning compute instance support GPU for training and experimentation, using GPU for inference when deployed as a web service is supported only on Azure Kubernetes Service.

Prepare deployment artifacts

To deploy a model, you need the following:

  • Entry script and source code dependencies: This script accepts requests, scores the requests by using the model, and returns the results.
  • Inference environment: The Azure ML environment, which includes the package dependencies required to run the model.
  • Deployment configuration: The configuration for the compute target that hosts the deployed model. It describes things like memory and CPU requirements needed to run the model.

1. Define your entry script and dependencies

To deploy a model, you must provide an entry script (also referred to as the scoring script) that accepts requests, scores the requests by using the model, and returns the results. The entry script is specific to your model. It must understand the format of the incoming request data, the format of the data expected by your model, and the format of the data returned to clients. If the request data is in a format that is not usable by your model, the script can transform it into an acceptable format. It can also transform the response before returning it to the client.

The entry script must contain an init() method that loads your model and then returns a function that uses the model to make a prediction based on the input data passed to the function. Azure ML runs the init() method once, when the Docker container for your web service is started. The prediction function returned by init() will be run every time the service is invoked to make a prediction on some input data. The inputs and outputs of this prediction function typically use JSON for serialization and deserialization.

Locate model files in your entry script

To locate the registered model(s) in your entry script, use the AZUREML_MODEL_DIR environment variable that is created during the service deployment. This environment variable contains the path to the directory that contains the deployed model(s).

The following table describes the value of AZUREML_MODEL_DIR depending on the number of models deployed:

Deployment | Environment variable value
Single model | The path to the folder containing the model.
Multiple models | The path to the folder containing all models. Models are located by name and version in this folder ($MODEL_NAME/$VERSION).

To get the path to the model file in your entry script, combine the environment variable with the file path you’re looking for.

Single model example
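With a single deployed model, the variable points directly at the model folder (the file name below is an assumption):

```r
# AZUREML_MODEL_DIR points at the folder containing the model files.
model_path <- file.path(Sys.getenv("AZUREML_MODEL_DIR"), "model.rds")
```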

Multiple model example
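With multiple deployed models, each model sits under its name and version (the model name, version, and file name below are assumptions):

```r
# Models live under $AZUREML_MODEL_DIR/$MODEL_NAME/$VERSION.
model_path <- file.path(Sys.getenv("AZUREML_MODEL_DIR"),
                        "my-model", "1", "model.rds")
```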

Example entry script

The following is an example entry script. You can see the full tutorial here.
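A minimal sketch of the init()-returns-a-function pattern described above; the model file name and the shape of the input data are assumptions:

```r
# score.R -- entry script sketch
library(jsonlite)

init <- function() {
  # Runs once when the container starts: load the registered model.
  model_path <- file.path(Sys.getenv("AZUREML_MODEL_DIR"), "model.rds")
  model <- readRDS(model_path)

  # Return the prediction function invoked on each scoring request.
  function(data) {
    input <- as.data.frame(fromJSON(data))   # deserialize the JSON request
    prediction <- predict(model, input)
    toJSON(prediction)                       # serialize the response as JSON
  }
}
```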

2. Define your inference environment

You will also need to provide an Azure ML environment (r_environment()) that defines all the dependencies required to execute your scoring script. You can create a new environment for deployment, or use a previously instantiated environment or registered environment.

Then define the inference configuration, which consists of the entry script, the environment, and optionally the directory that contains all the files needed to package and deploy your model (such as helper files for the entry script). See the reference documentation for inference_config().

Note that if you specify the source_directory parameter, the entry script file must be located in that directory, and the value to entry_script should be the relative path of the file inside that directory.
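A sketch of defining the environment and inference configuration; the environment name and package list are illustrative:

```r
library(azuremlsdk)

# Environment with the packages the entry script needs.
env <- r_environment(
  name = "deploy-env",
  cran_packages = list(cran_package("jsonlite"))
)

# entry_script is relative to source_directory.
inference_config <- inference_config(
  entry_script = "score.R",
  source_directory = ".",
  environment = env
)
```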

3. Define your deployment configuration

Before deploying your model, you must define the deployment configuration. The deployment configuration is specific to the compute target that will host the web service. For example, when you deploy a model locally, you must specify the port where the service accepts requests.

The following table provides examples for creating the deployment configuration for each compute target:

Compute target | Deployment configuration | Example
Local | local_webservice_deployment_config() | deployment_config <- local_webservice_deployment_config(port = 8890)
Azure Container Instances (ACI) | aci_webservice_deployment_config() | deployment_config <- aci_webservice_deployment_config(cpu_cores = 1, memory_gb = 1)
Azure Kubernetes Service (AKS) | aks_webservice_deployment_config() | deployment_config <- aks_webservice_deployment_config(cpu_cores = 1, memory_gb = 1)

Deploy to target

Finally, deploy your model(s) as a web service to the target of your choice. To deploy the model(s), you will provide the inference configuration and deployment configuration you created in the above steps, in addition to the models you want to deploy, to deploy_model(). If you are deploying to AKS, you will also have to provide the AKS compute target.
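A sketch of the deployment call, assuming `ws`, `model`, `inference_config`, and `deployment_config` come from the previous steps (the service name is illustrative):

```r
service <- deploy_model(
  workspace = ws,
  name = "my-service",
  models = list(model),
  inference_config = inference_config,
  deployment_config = deployment_config
  # deployment_target = aks_target   # also required when deploying to AKS
)
wait_for_deployment(service, show_output = TRUE)
```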

1. Local deployment

To deploy a model locally, you need to have Docker installed on your local machine. If you are deploying locally from a compute instance, Docker will already be installed.

For an example of local deployment, see the deploy-to-local sample.

2. ACI deployment

For an example of deploying to ACI, see the train-and-deploy-to-aci vignette.

3. AKS deployment

To deploy a model to AKS, you will first need an AKS cluster for the deployment compute target. You can either create a new AKS cluster with create_aks_compute() or attach an existing AKS cluster with attach_aks_compute().

You can instead also create or attach an AKS cluster via the CLI or studio UI.
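A sketch of provisioning the cluster from the SDK; the cluster and resource group names are assumptions:

```r
# Create a new AKS cluster as the deployment target.
aks_target <- create_aks_compute(ws, cluster_name = "my-aks")
wait_for_provisioning_completion(aks_target, show_output = TRUE)

# Or attach an existing AKS cluster instead:
# aks_target <- attach_aks_compute(ws,
#                                  resource_group = "my-rg",
#                                  cluster_name = "my-aks")
```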

For an example of deploying to AKS, see the deploy-to-aks vignette.

Troubleshooting deployment

If your service deployment fails, you can use get_webservice_logs() to inspect the detailed Docker engine log messages from your web service deployment. Note that if your initial deployment fails and you want to redeploy under the same web service name, you must first delete the original web service, using the delete_webservice() method.
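For example, assuming `service` is the web service object returned by deploy_model():

```r
# Inspect the Docker engine logs from the failed deployment.
logs <- get_webservice_logs(service)
cat(logs)

# Remove the failed service so the same name can be reused.
delete_webservice(service)
```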

For a more detailed guide on working around or solving common deployment errors, see Troubleshooting AKS and ACI deployments.

Web service authentication

The easiest way to authenticate to deployed web services is to use key-based authentication, which generates static bearer-type authentication keys that do not need to be refreshed.

AKS deployments additionally support token-based auth.

The primary difference is that keys are static and can be regenerated manually, whereas tokens expire and must be refreshed.

Authentication method | ACI | AKS
Key | Disabled by default | Enabled by default
Token | Not available | Disabled by default

Key-based authentication

Web services deployed on AKS have key-based auth enabled by default. ACI-deployed services have key-based auth disabled by default, but you can enable it by setting auth_enabled = TRUE when creating the ACI web service. The following is an example of creating an ACI deployment configuration with key-based auth enabled.
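A sketch of such a configuration (the resource values are illustrative):

```r
deployment_config <- aci_webservice_deployment_config(
  cpu_cores = 1,
  memory_gb = 1,
  auth_enabled = TRUE   # turn on key-based auth for ACI
)
```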

To fetch the auth keys, use get_webservice_keys(). To regenerate a key, use the generate_new_webservice_key() function:
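A sketch, assuming `service` is the deployed web service object:

```r
# Retrieve the current authentication keys.
keys <- get_webservice_keys(service)

# Regenerate the primary key.
generate_new_webservice_key(service, key_type = "Primary")
```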

Token-based authentication

When you enable token authentication for a web service, users must present an Azure Machine Learning JSON Web Token (JWT) to the web service to access it. The token expires after a specified timeframe and needs to be refreshed to continue making calls.

Token auth is disabled by default when you deploy to AKS. To control token auth, use the token_auth_enabled parameter when you create or update a deployment:
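A sketch of an AKS deployment configuration with token auth enabled (the resource values are illustrative):

```r
deployment_config <- aks_webservice_deployment_config(
  cpu_cores = 1,
  memory_gb = 1,
  token_auth_enabled = TRUE   # turn on token-based auth for AKS
)
```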

If token authentication is enabled, you can use the get_webservice_token() method to retrieve a JWT. You will need to request a new token by the token’s refresh_after time.
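For example, assuming `service` is an AKS web service with token auth enabled:

```r
# Returns an access token, along with the time after which it should be refreshed.
token <- get_webservice_token(service)
```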

We strongly recommend that you create your Azure ML workspace in the same region as your AKS cluster. To authenticate with a token, the web service will make a call to the region in which your workspace is created. If your workspace’s region is unavailable, then you will not be able to fetch a token for your web service, even if your cluster is in a different region than your workspace. This effectively results in token-based auth being unavailable until your workspace’s region is available again. In addition, the greater the distance between your cluster’s region and your workspace’s region, the longer it will take to fetch a token.

For more information on authentication in Azure ML, see Set up authentication for Azure Machine Learning resources and workflows.

Consume web service

Every deployed web service provides a REST endpoint, so you can create client applications in any programming language. If you’ve enabled key-based authentication for your service, you need to provide a service key as a token in your request header. If you’ve enabled token-based authentication for your service, you need to provide an Azure Machine Learning JSON Web Token (JWT) as a bearer token in your request header.

To get the endpoint for the deployed web service, use the scoring_uri property:
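Assuming `service` is the deployed web service object:

```r
service$scoring_uri
```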


You can also retrieve the schema JSON document after you deploy the service. Use the swagger_uri property from the deployed web service to get the URI to the local web service’s Swagger file:
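Assuming `service` is the deployed web service object:

```r
service$swagger_uri
```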


You can then use the scoring URI and a package such as httr to invoke the web service via request-response consumption.
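A sketch of a raw REST call with httr; the input shape and the `scoring_uri` and `key` variables depend on your service and are assumptions here:

```r
library(httr)
library(jsonlite)

response <- POST(
  url = scoring_uri,
  add_headers(Authorization = paste("Bearer", key)),  # service key or JWT
  body = toJSON(list(data = list(c(1.0, 2.0, 3.0)))),
  content_type_json()
)
content(response)
```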

Optionally, you can use the invoke_webservice() method from azuremlsdk to directly invoke the web service if you have the web service object:
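A sketch, assuming `service` is the web service object and that the entry script expects this input shape:

```r
library(jsonlite)

input <- toJSON(list(data = list(c(1.0, 2.0, 3.0))))
prediction <- invoke_webservice(service, input)
```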

Update web service

To update a web service, use the corresponding update_*() method. You can update the web service to use a new model, a new entry script, or new dependencies that can be specified in an inference configuration.
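For example, a sketch of updating an ACI-deployed service with a new model and inference configuration (`new_model` and `inference_config` are assumed to exist):

```r
update_aci_webservice(
  service,
  models = list(new_model),
  inference_config = inference_config
)
```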

Clean up resources