The Team Data Science Process using Azure Machine Learning

This hands-on lab guides you through the Team Data Science Process (TDSP) using Azure Machine Learning. We will use a Customer Churn Analysis example throughout this workshop (you will download it in a lab below).

In this lab, you will:

You’ll focus on the objectives above rather than on Data Science, Machine Learning, or a complex scenario.

NOTE: There are several prerequisites for this course, including an understanding and implementation of:

There is a comprehensive Learning Path you can use to prepare for this course located here.

Introduction and setup

The Primary Concepts for this lab are here. We’ll refer to these throughout the lab.

Please see this file for additional prerequisites.

Introduction to the TDSP


1. Business Understanding

In the Business Understanding phase of the TDSP, you discover the questions that the organization would like answered from data. This is a group effort, involving the organization, the Data Science team, and the DevOps team along with other stakeholders.

Your scenario is as follows:

The Orange Telecom company in France is one of the largest operators of mobile and internet services in Europe and Africa and a global leader in corporate telecommunication services. They have 256 million customers worldwide. They have significant coverage in France, Spain, Belgium, Poland, Romania, Slovakia, and Moldova, and a large presence in Africa and the Middle East. Customer churn is an issue for every company. Orange would like to predict the propensity of customers to switch providers (churn), buy new products or services (appetency), or buy upgrades or add-ons proposed to them to make the sale more profitable (up-selling). For this effort, they would like to focus on churn first.

In this lab, you will use Azure Machine Learning (AML) to create a solution. The general configuration for working with Azure Machine Learning has these components:

[Figure: Azure Machine Learning Components]

Lab: Set up a generic TDSP Structure using Azure Machine Learning

In this section you’ll set up your project’s structure, conforming to the Team Data Science Process, using Azure Machine Learning.
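As a minimal sketch of what setting up the structure can look like, the Python below creates one folder per TDSP phase plus folders for project documents and sample data. The folder names are an assumption loosely modeled on the TDSP project template, not necessarily the exact layout the lab produces, and `create_tdsp_structure` is a hypothetical helper.

```python
from pathlib import Path

# Hypothetical folder layout loosely modeled on the TDSP project template;
# rename the entries to match the template your team actually uses.
TDSP_FOLDERS = [
    "Code/01_Data_Acquisition_and_Understanding",
    "Code/02_Modeling",
    "Code/03_Deployment",
    "Docs/Project",
    "Docs/Data_Report",
    "Docs/Model",
    "Sample_Data/Raw",
    "Sample_Data/Processed",
]


def create_tdsp_structure(root: str) -> list[Path]:
    """Create the TDSP-style folders under `root` and return their paths."""
    created = []
    for folder in TDSP_FOLDERS:
        path = Path(root) / folder
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

For example, `create_tdsp_structure("customer-churn")` builds the folders under a `customer-churn` directory; running it again leaves existing folders untouched.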

Lab: Use-case evaluation for Data Science questions

In this section you’ll evaluate a business scenario, and detail possible predictions, classifications, or other data science questions that you can begin to explore.
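One way to record the outcome of the use-case evaluation is as structured data the team can review together. In this sketch, the three outcomes come from the Orange scenario, each framed as a binary classification; the label names, the `CandidateQuestion` type, and the priorities for appetency and up-selling are hypothetical (the scenario only states that churn comes first).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateQuestion:
    business_question: str
    label: str      # hypothetical name of the target column
    task_type: str  # the kind of data science question
    priority: int   # 1 = explore first


# The three outcomes from the Orange scenario; priorities 2 and 3 are assumptions.
CANDIDATES = [
    CandidateQuestion("Will this customer switch providers?", "churn", "binary classification", 1),
    CandidateQuestion("Will this customer buy new products or services?", "appetency", "binary classification", 2),
    CandidateQuestion("Will this customer buy a proposed upgrade or add-on?", "upselling", "binary classification", 3),
]


def first_question(candidates: list[CandidateQuestion]) -> CandidateQuestion:
    """Pick the question with the lowest priority number."""
    return min(candidates, key=lambda c: c.priority)


print(first_question(CANDIDATES).business_question)  # Will this customer switch providers?
```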

2. Data Acquisition and Understanding

In the Data Acquisition and Understanding phase of the TDSP, you ingest or access data from various locations to answer the questions the organization has asked. In most cases, this data will be in multiple locations. Once the data is ingested into the system, you’ll need to examine it to see what it holds. Almost all data needs cleaning, so after the inspection phase, you’ll replace missing values and add or change columns. You’ll cover more extensive Data Wrangling tasks in other labs.

In this section, we’ll use a single file-based dataset to train our model.

Lab: Ingest data from a local source

In this lab you will load the dataset, inspect it, make a few changes, and then save the Data Wrangling steps as a Python package.
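The load, inspect, and clean steps can be sketched with the Python standard library. This is not the Azure Machine Learning data-preparation tooling the lab uses; it assumes a CSV with hypothetical columns `tenure_months`, `total_charges`, and `churn`, fills missing numeric values with the column median, and adds one hypothetical derived column. Keeping each step in its own function is what makes the wrangling logic importable and reusable from a module, which is the idea behind saving the steps as a package.

```python
import csv
import io
from statistics import median


def is_missing(value) -> bool:
    """Treat absent or blank CSV fields as missing."""
    return value is None or value.strip() == ""


def load_rows(source) -> list[dict]:
    """Read a CSV file-like object into a list of dicts, one per row."""
    return list(csv.DictReader(source))


def inspect_missing(rows: list[dict]) -> dict:
    """Count missing values per column."""
    missing = {col: 0 for col in rows[0]} if rows else {}
    for row in rows:
        for col, value in row.items():
            if is_missing(value):
                missing[col] += 1
    return missing


def wrangle(rows: list[dict], numeric_cols: list[str]) -> list[dict]:
    """Fill missing numeric values with the column median, then add a derived column."""
    cleaned = [dict(row) for row in rows]  # work on copies so the raw rows stay intact
    for col in numeric_cols:
        present = [float(row[col]) for row in cleaned if not is_missing(row[col])]
        fill = median(present) if present else 0.0
        for row in cleaned:
            row[col] = fill if is_missing(row[col]) else float(row[col])
    for row in cleaned:
        # Hypothetical derived column; max(..., 1) avoids dividing by a zero tenure.
        row["avg_monthly_charge"] = row["total_charges"] / max(row["tenure_months"], 1)
    return cleaned


SAMPLE_CSV = """customer_id,tenure_months,total_charges,churn
1,12,600,0
2,,240,1
3,24,,0
"""

rows = load_rows(io.StringIO(SAMPLE_CSV))
print(inspect_missing(rows))  # {'customer_id': 0, 'tenure_months': 1, 'total_charges': 1, 'churn': 0}
cleaned = wrangle(rows, ["tenure_months", "total_charges"])
```

With a real file, `load_rows(open("churn.csv", newline=""))` would replace the in-memory sample; the file name is illustrative.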

3. Modeling

The Modeling phase of the Team Data Science Process involves creating experiments using one or more algorithms and base data to create a repeatable prediction or classification.
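Repeatability starts with controlling randomness. A minimal sketch, assuming the records are already loaded into a list: the split below uses a seeded `random.Random` instance, so every run with the same seed produces the same train and test sets. The function name and the 25% default are illustrative choices, not part of the lab.

```python
import random


def train_test_split(rows: list, test_fraction: float = 0.25, seed: int = 42):
    """Shuffle a copy of `rows` with a fixed seed and split it into (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A dedicated Random instance with a fixed seed makes the split reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

Recording the seed alongside each experiment run lets anyone rebuild exactly the same split later.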

A view of this process is here, shown on the right side of the Docker graphic:

[Figure: Docker graphic]

Lab: Feature Engineering, Modeling, and Scoring

In this lab we’ll use the same project you just created. You’ll create your feature engineering file, run the model training, and create the final scores.
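The feature engineering, training, and scoring steps can be sketched end to end in plain Python. This sketch swaps in a hand-written logistic regression trained by batch gradient descent rather than whatever algorithm the lab’s experiment uses; the columns `tenure_months`, `avg_monthly_charge`, and `contract`, and the scaling constants, are hypothetical.

```python
import math


def engineer_features(record: dict) -> list[float]:
    """Map one cleaned customer record to a numeric feature vector (hypothetical columns)."""
    return [
        1.0,                                              # bias term
        record["tenure_months"] / 72.0,                   # tenure scaled by an assumed 72-month maximum
        record["avg_monthly_charge"] / 100.0,             # monthly charge scaled to roughly 0-1
        1.0 if record["contract"] == "monthly" else 0.0,  # indicator for a categorical column
    ]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(X: list[list[float]], y: list[int],
                              lr: float = 0.5, epochs: int = 500) -> list[float]:
    """Fit logistic-regression weights with batch gradient descent on the mean log-loss."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += error * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def score(w: list[float], x: list[float]) -> float:
    """Predicted churn probability for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
```

Passing each record through `engineer_features`, then `train_logistic_regression`, and finally `score` yields a churn probability per customer; a threshold such as 0.5 turns that probability into a churn or no-churn label.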

4. Deployment

A view of this process is here, shown on the left side of the Docker graphic:

[Figure: Docker graphic]

The Deployment phase of the TDSP entails writing the results to a data location, or creating an Application Programming Interface (API) or another mechanism through which the classification or prediction model can be consumed.
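Azure Machine Learning scoring scripts are commonly organized around an `init()` function, called once when the service starts, and a `run()` function, called for each request. The sketch below follows that shape but replaces model loading with hard-coded, hypothetical weights so that it runs on its own; the feature names and the JSON layout are assumptions as well.

```python
import json
import math

model_weights = None  # populated once by init()


def init():
    """Load the model once when the service starts.

    A real service would load a trained model file here; these weights are
    hypothetical placeholders so the sketch is self-contained.
    """
    global model_weights
    model_weights = {"bias": -1.0, "tenure_scaled": -3.0, "monthly_contract": 2.5}


def run(raw_data: str) -> str:
    """Score a JSON list of feature dicts and return churn probabilities as JSON."""
    records = json.loads(raw_data)
    probabilities = []
    for rec in records:
        z = model_weights["bias"] + sum(
            model_weights[name] * rec[name] for name in ("tenure_scaled", "monthly_contract")
        )
        probabilities.append(1.0 / (1.0 + math.exp(-z)))
    return json.dumps({"churn_probability": probabilities})
```

Separating one-time loading from per-request scoring keeps each request cheap, because the model is not reloaded every time `run()` is called.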

Optional Lab: Deploy the solution using Containers, consume the results

In this lab you will deploy the solution locally, and optionally to Docker. NOTE: This section takes quite some time, so it’s included here for completeness. The instructor will go over it with you.
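To see what consuming a deployed model involves, the sketch below serves a stand-in scoring function over HTTP on localhost using only the Python standard library, then calls it as a client would. It covers only the local case, not the Docker deployment; the scoring rule, the request format, and the `/score` path used in the example are hypothetical, and the handler accepts a POST on any path.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def score_records(records: list[dict]) -> list[float]:
    """Hypothetical stand-in for the trained model: flags short-tenure customers."""
    return [1.0 if r["tenure_months"] < 12 else 0.0 for r in records]


class ScoreHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body, score it, and reply with JSON.
        length = int(self.headers.get("Content-Length", 0))
        records = json.loads(self.rfile.read(length))
        body = json.dumps({"scores": score_records(records)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet during the demo


def start_local_service(port: int = 0) -> HTTPServer:
    """Start the service on localhost in a background thread (port 0 picks a free port)."""
    server = HTTPServer(("127.0.0.1", port), ScoreHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def consume(url: str, records: list[dict]) -> list[float]:
    """Client side: POST records as JSON and return the scores."""
    request = urllib.request.Request(
        url,
        data=json.dumps(records).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.loads(response.read())["scores"]
```

For example, after `server = start_local_service()`, a client can call `consume(f"http://127.0.0.1:{server.server_address[1]}/score", [{"tenure_months": 3}])`; `server.shutdown()` stops the service.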

5. Customer Acceptance

The final step in the Team Data Science Process is Customer Acceptance. Here you focus on ensuring that the model performs within acceptable limits for time and accuracy, and you present your findings in a comprehensive project document.
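Acceptance criteria are easiest to check when they are written down as numbers. This sketch times a prediction function over a labeled test set and compares accuracy and per-row latency against thresholds; the default limits (80% accuracy, 10 ms per row) and the function names are hypothetical placeholders for whatever the customer agrees to.

```python
import time


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("labels and predictions must be non-empty and the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def check_acceptance(predict, X: list, y_true: list[int],
                     min_accuracy: float = 0.8, max_seconds_per_row: float = 0.01) -> dict:
    """Time `predict` over the test set and compare accuracy and latency with the thresholds."""
    if not X:
        raise ValueError("the test set must not be empty")
    start = time.perf_counter()
    y_pred = [predict(x) for x in X]
    seconds_per_row = (time.perf_counter() - start) / len(X)
    acc = accuracy(y_true, y_pred)
    return {
        "accuracy": acc,
        "seconds_per_row": seconds_per_row,
        "accepted": acc >= min_accuracy and seconds_per_row <= max_seconds_per_row,
    }
```

The returned dictionary can be copied directly into the closeout document as evidence that the agreed thresholds were met.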

Lab: Review Customer Acceptance and Closeout Documentation

In this lab you will examine the final project closeout document. In production implementations, you and your team will create this document.

Lab completion

In this lab you learned how to:

You may now delete and decommission the following resources if you wish: