Application and Workload Update #

Overview #

The purpose of this document is to provide the steps to migrate workloads and applications from Gen1 to Gen2 after data migration is completed.

This is applicable to the following migration patterns:

  1. Incremental Copy pattern

  2. Lift and Shift copy pattern

  3. Dual Pipeline pattern

As part of this, we will configure the services used in the workloads and update the applications to point to the Gen2 mount.

NOTE: We will be covering the following Azure services: Azure Databricks, Azure Data Factory, HDInsight, and Azure Synapse Analytics.

Prerequisites #

The migration of data from Gen1 to Gen2 should be completed before proceeding.

How to Configure and Update Azure Databricks #

This applies where Azure Databricks is used for data ingestion into ADLS Gen1.

Before the migration:

  1. Mount configured to Gen1 path

    Sample code showing the mount path configured for ADLS Gen1 using a service principal (an illustrative sketch appears after this list):

    image

  2. Set up the Databricks cluster for the scheduled job run

    Sample snapshot of working code:

    image

    Note: Refer to Application\IncrementalSampleLoad.py script for more details.
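For reference, a minimal sketch of a Gen1 mount with a service principal in a Databricks notebook is shown below. The application ID, secret scope, tenant ID, account name, folder, and mount name are placeholders, not values from this guide.

```python
# Minimal sketch: mount ADLS Gen1 in a Databricks notebook using a service
# principal. All values in angle brackets are placeholders.
configs = {
    "fs.adl.oauth2.access.token.provider.type": "ClientCredential",
    "fs.adl.oauth2.client.id": "<application-id>",
    "fs.adl.oauth2.credential": dbutils.secrets.get(scope="<scope-name>", key="<key-name>"),
    "fs.adl.oauth2.refresh.url": "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="adl://<gen1-account-name>.azuredatalakestore.net/<folder>",
    mount_point="/mnt/<mount-name>",
    extra_configs=configs,
)
```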

After the migration:

  1. Change the mount configuration to Gen2 container

    image

    Note: Stop the job scheduler and change the mount configuration to point to Gen2 with the same mount name (see the remount sketch after this list).

    image

    Note: Refer to Application\MountConfiguration.py script for more details.

  2. Reschedule the job scheduler

  3. Check for the new files getting generated at the Gen2 root folder path
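A minimal sketch of the remount is shown below, using the same placeholder names as the Gen1 example. The key point is that the mount name stays the same while the source switches to the Gen2 abfss endpoint, so downstream jobs need no path changes.

```python
# Minimal sketch: after stopping the job scheduler, unmount the Gen1 path and
# remount the same mount point against the Gen2 container (placeholders as above).
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>", key="<key-name>"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.unmount("/mnt/<mount-name>")  # remove the old Gen1 mount
dbutils.fs.mount(
    source="abfss://<container>@<gen2-account-name>.dfs.core.windows.net/<folder>",
    mount_point="/mnt/<mount-name>",     # keep the same mount name
    extra_configs=configs,
)
```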

How to Configure and Update Azure Data Factory #

Once the data migration from ADLS Gen1 to Gen2 using ADF is completed, follow the steps below:

  1. Stop the Gen1 trigger used as part of the Incremental copy pattern.

  2. Modify the existing factory by creating a new linked service pointing to the Gen2 storage.

    Go to Azure Data Factory –> Click on Author –> Connections –> Linked Service –> Click on New –> Choose Azure Data Lake Storage Gen2 –> Click on the Continue button

    image

    Provide the details to create the new linked service pointing to the Gen2 storage account (a scripted alternative is sketched after this list).

    image

  3. Modify the existing factory by creating a new dataset pointing to the Gen2 storage.

    Go to Azure Data Factory –> Click on Author –> Click on Pipelines –> Select the pipeline –> Click on the activity –> Click on the Sink tab –> Choose the dataset that points to Gen2

    image

  4. Click on Publish all

    image

  5. Go to Triggers and activate the trigger.

    image

  6. Check for the new files getting generated at the Gen2 root folder path
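If you prefer to script the change rather than use the portal, the sketch below shows how the new Gen2 linked service could be created with the azure-mgmt-datafactory Python SDK. The subscription, resource group, factory, credential values, and the linked service name are placeholders and not part of this guide.

```python
# Sketch only: create the new ADLS Gen2 linked service with the
# azure-mgmt-datafactory SDK instead of the portal. All names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureBlobFSLinkedService,
    LinkedServiceResource,
    SecureString,
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

gen2_linked_service = AzureBlobFSLinkedService(
    url="https://<gen2-account-name>.dfs.core.windows.net",
    service_principal_id="<application-id>",
    service_principal_key=SecureString(value="<client-secret>"),
    tenant="<tenant-id>",
)

adf_client.linked_services.create_or_update(
    "<resource-group>",
    "<data-factory-name>",
    "AzureDataLakeStorageGen2LinkedService",  # hypothetical linked service name
    LinkedServiceResource(properties=gen2_linked_service),
)
```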

How to Configure and Update HDInsight #

This applies where HDInsight is used as the workload to process the raw data and execute the transformations. Below is the step-by-step process used as part of the Dual pipeline pattern.

Prerequisite

Two HDInsight clusters need to be created, one for the Gen1 storage and one for the Gen2 storage.

Before Migration

The Hive script points to the Gen1 endpoint, as shown below:

image

After Migration

The Hive script points to the Gen2 endpoint, as shown below:

image

Once all the existing data is moved from Gen1 to Gen2, start running the workloads against the Gen2 endpoint.
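The change the scripts go through is the storage URI scheme. The sketch below illustrates the two forms with placeholder account, container, and folder names; the parameter names are hypothetical and only for illustration.

```python
# Illustration of the endpoint change: the account, container, and folder
# names are placeholders, and the parameter names are hypothetical.
GEN1_ROOT = "adl://<gen1-account-name>.azuredatalakestore.net/<raw-folder>"
GEN2_ROOT = "abfss://<container>@<gen2-account-name>.dfs.core.windows.net/<raw-folder>"

# Example: pass the Gen2 root to the Hive script as parameters instead of the Gen1 root.
hive_params = {"INPUT_DIR": f"{GEN2_ROOT}/incoming", "OUTPUT_DIR": f"{GEN2_ROOT}/processed"}
```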

How to Configure and Update Azure Synapse Analytics #

This applies to data pipelines that have Azure Synapse Analytics (formerly Azure SQL Data Warehouse) as one of the workloads. Below is the step-by-step process used as part of the Dual pipeline pattern.

Before Migration

The stored procedure activity points to the Gen1 mount path.

image

After Migration

The stored procedure activity points to the Gen2 endpoint.

image

Run the trigger

image

Check the SQL table in the data warehouse for the new data load.
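One simple way to verify the load is to count the rows in the target table before and after the trigger run. The sketch below uses pyodbc with placeholder server, database, credential, and table names.

```python
# Sketch: verify that the new load landed in the Synapse (SQL DW) table.
# Server, database, credentials, and table name are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<synapse-server>.database.windows.net;"
    "DATABASE=<sql-pool-name>;UID=<user>;PWD=<password>"
)
count = conn.cursor().execute("SELECT COUNT(*) FROM dbo.TargetTable").fetchone()[0]
print(f"Rows in target table after the trigger run: {count}")
```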

Cutover from Gen1 to Gen2 #

After you’re confident that your applications and workloads are stable on Gen2, you can begin using Gen2 to satisfy your business scenarios. Turn off any remaining pipelines that are running on Gen1 and decommission your Gen1 account.