
Example 3 - Image classification

This example is built upon an AML pipeline demo, which does not use shrike.

Problem statement

In this example, we will train a model that classifies objects belonging to a list of categories. We will build our own dataset, train a model on it, and deploy the result as a webservice. The trained pipeline endpoint can then be used for inference. More specifically, the pipeline is split into the following steps.

Step 1: Data Ingestion

Input: Blob datastore reference.

Output: Reference to directory containing the raw data.

This step will leverage Bing Image Search REST API to search the web for images to create our dataset. This replicates the real-world scenario of data being ingested from a constantly changing source. For this demo, we will use the same 10 classes in the CIFAR-10 dataset (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck). All images will be saved into a directory in the input datastore reference.
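The ingestion step can be sketched as follows. This is a minimal illustration assuming the Bing Image Search v7 REST endpoint and the `requests` library; the helper names, parameter values, and count are illustrative and not the component's actual code.

```python
import requests  # third-party: pip install requests

# Bing Image Search v7 REST endpoint.
BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/images/search"

# The same 10 classes as CIFAR-10.
CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
           "dog", "frog", "horse", "ship", "truck"]

def build_search_request(query, api_key, count=150, offset=0):
    """Build the headers and query parameters for one image-search call."""
    headers = {"Ocp-Apim-Subscription-Key": api_key}
    params = {"q": query, "count": count, "offset": offset, "imageType": "photo"}
    return headers, params

def fetch_image_urls(query, api_key, count=150):
    """Return a list of image URLs for one class query."""
    headers, params = build_search_request(query, api_key, count)
    response = requests.get(BING_ENDPOINT, headers=headers, params=params)
    response.raise_for_status()
    return [item["contentUrl"] for item in response.json()["value"]]
```

In the actual component, each returned URL would be downloaded and saved under a per-class subdirectory of the output datastore reference.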

Step 2: Preprocess Data

Input: Reference to directory containing the raw data.

Outputs: Reference to training data directory, reference to validation data directory, reference to testing data directory.

This step will take the raw data downloaded from the previous step and preprocess it by cropping it to a consistent size, shuffling the data, and splitting it into train, valid, and test directories.
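The shuffle-and-split part of this step can be sketched with a small helper. The 70/15/15 ratio and the function name are assumptions for illustration; the actual component may use different ratios.

```python
import random

def split_dataset(paths, train_frac=0.7, valid_frac=0.15, seed=0):
    """Shuffle a list of image paths and split it into train/valid/test lists.

    The remainder after the train and valid fractions goes to test.
    A fixed seed keeps the split reproducible across runs.
    """
    paths = list(paths)
    random.Random(seed).shuffle(paths)
    n_train = int(len(paths) * train_frac)
    n_valid = int(len(paths) * valid_frac)
    train = paths[:n_train]
    valid = paths[n_train:n_train + n_valid]
    test = paths[n_train + n_valid:]
    return train, valid, test
```

Cropping to a consistent size would happen before or after this split, e.g. with PIL's `Image.resize`, writing the results into the three output directories.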

Step 3: Train Model

Inputs: Reference to training data directory, reference to validation data directory.

Output: Reference to the directory that trained model is saved to.

This step will fine-tune a ResNet-18 model on our dataset using PyTorch. It will use the corresponding input image directories as training and validation data.

Step 4: Evaluate Model

Inputs: Reference to the directory that trained model was saved to, reference to testing data directory.

Output: Reference to a file storing the testing accuracy of the model.

This step evaluates the trained model on the testing data and outputs the accuracy.
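The accuracy computation at the heart of this step reduces to a small helper; the function names are illustrative, not the component's actual API.

```python
def accuracy(predictions, labels):
    """Fraction of predicted class labels that match the true labels."""
    if not labels:
        raise ValueError("labels must be non-empty")
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)

def write_accuracy(path, acc):
    """Persist the test accuracy so the deploy step can compare against it."""
    with open(path, "w") as f:
        f.write(f"{acc:.4f}\n")
```

The written file is exactly the "reference to a file storing the testing accuracy" consumed by Step 5.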

Step 5: Deploy Model

Inputs: Reference to the directory that trained model was saved to, reference to the file storing the testing accuracy of the model, reference to testing data directory.

Output: Reference to a file storing the endpoint url for the deployed model.

This step registers and deploys a new model on its first run. In subsequent runs, it registers and deploys a new model only if the training dataset has changed, or if the dataset is unchanged but the accuracy improved.
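The redeployment rule described above can be sketched as a single decision function (an illustrative sketch; the actual component may derive these inputs differently):

```python
def should_deploy(dataset_changed, new_accuracy, previous_accuracy):
    """Decide whether to register and deploy a new model.

    - First run (no previous accuracy recorded): always deploy.
    - Training dataset changed: deploy the retrained model.
    - Dataset unchanged: deploy only if accuracy improved.
    """
    if previous_accuracy is None:
        return True
    if dataset_changed:
        return True
    return new_accuracy > previous_accuracy
```

This keeps the deployed endpoint from churning on runs that neither see new data nor improve on the registered model.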

Configure the example

Create a Bing Image Search API key for data ingestion. This can be done by following these instructions.

Then, fill in the API key in the ./examples/components/CIFAR_10_data_ingestion/bing_subscription.yaml file of the data ingestion component.

Submit the example

To submit the experiment with the parameter values defined in the config file, just run the command shown at the top of the experiment's Python file:

python pipelines/experiments/demo_image_classification.py \
  --config-dir pipelines/config \
  --config-name experiments/demo_image_classification \
  run.submit=True

A successful run of the experiment can be found here. (This is mostly for internal use, as you likely will not have access to that workspace.)

The final pipeline should look something like this.

Online inference using the trained model

Once the pipeline run finishes successfully, you can perform inference by running the script ./examples/inference/demo_test_image_classification_endpoint.py in the inference folder.

Here is a sample call using an image of an airplane:

python pipelines/experiments/demo_test_image_classification_endpoint.py \
  --image_url https://compote.slate.com/images/222e0b84-f164-4fb1-90e7-d20bc27acd8c.jpg

If you want to use a custom image:

python pipelines/experiments/demo_test_image_classification_endpoint.py \
  --image_url <URL OF IMAGE>