
Tutorial: MPI Benchmarks on RHEL

This tutorial is a step-by-step guide on how to run MPI benchmarks on Azure Batch using RHEL 8 compute nodes. It is a good starting point for anyone looking to run MPI applications on Azure Batch.

This tutorial uses the configuration files from the `examples/mpi-benchmarks-rhel` folder. `deployment.bicep` is the entry point for this deployment, and `config.jsonc` is the configuration file that contains all the resource configuration parameters for it.
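If you have not already done so, clone the repository so these files are available locally. A minimal sketch (assuming the repository is hosted at github.com/Azure/bacc; adjust the URL if you obtained it from elsewhere):

# clone the repository and inspect the tutorial's configuration files
git clone https://github.com/Azure/bacc.git
cd bacc
ls examples/mpi-benchmarks-rhel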


Step 1: Prerequisites and environment setup

Follow the environment setup instructions to set up your environment. Since this tutorial uses the User Subscription pool allocation mode, make sure to also follow the steps for that mode described in that document.
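In particular, the prerequisites include looking up the object id of the Microsoft Azure Batch service principal, which is needed for the deployment in Step 2. A minimal sketch using the Azure CLI (the application id shown is the documented well-known id for the Batch service; verify it against the prerequisites document):

# sign in to Azure
az login

# look up the object id of the "Microsoft Azure Batch" service principal
az ad sp show --id "ddbf3205-c6bd-46ae-8127-60eb93363864" --query id -o tsv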

Step 2: Deploy resources to Azure

For this step, you have two options: use the Azure CLI to deploy the resources with the provided bicep template, or simply click the following link to deploy through the Azure Portal.

Deploy to Azure

Use the following steps to deploy using Azure CLI.

#!/bin/bash

# create deployment
AZ_LOCATION=eastus
AZ_DEPLOYMENT_NAME=mpi-benchmarks0
AZ_RESOURCE_GROUP=mpi-benchmarks0
BATCH_SERVICE_OBJECT_ID=....  # set to the object id obtained in the prerequisites step

# change directory to bacc (or where you cloned/downloaded the repository)
cd bacc

az deployment sub create                                      \
  --name $AZ_DEPLOYMENT_NAME                                  \
  --location $AZ_LOCATION                                     \
  --template-file examples/mpi-benchmarks-rhel/deployment.bicep \
  --parameters                                                \
      resourceGroupName=$AZ_RESOURCE_GROUP                    \
      batchServiceObjectId=$BATCH_SERVICE_OBJECT_ID

On success, a new resource group with the specified name will be created, containing all the resources deployed by this template.
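To double-check from the command line, you can use the standard Azure CLI deployment commands, for example:

# check the deployment's provisioning state
az deployment sub show --name $AZ_DEPLOYMENT_NAME \
    --query properties.provisioningState -o tsv

# list the resources created in the new resource group
az resource list --resource-group $AZ_RESOURCE_GROUP -o table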

Step 3: Install CLI

Next, we install the CLI tool provided by this repository; it is used to submit jobs and tasks to the Batch account. We recommend installing it in a Python virtual environment to avoid polluting the global Python environment.

Follow the steps described here to install the CLI tool.
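As a rough sketch of what that typically looks like (the exact package location may differ; follow the linked instructions for the authoritative steps):

# create and activate a Python virtual environment
python3 -m venv .venv
source .venv/bin/activate

# install the CLI from your local checkout of the repository;
# the `cli` subdirectory used here is an assumption -- use the path
# given in the linked installation instructions
pip install -e ./cli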

Step 4: Verify deployment

Once the deployment is complete, you can verify it by inspecting the deployed Batch account and listing its pools.

#!/bin/bash

# fetch subscription id from the portal or using the following command
# if you have already logged in to Azure CLI using `az login` command
AZ_SUBSCRIPTION_ID=$(az account show --query id -o tsv)

# set resource group name to the one used in Step 2
AZ_RESOURCE_GROUP=mpi-benchmarks0 # or whatever you used in Step 2

# use the `bacc show` command
bacc show -s $AZ_SUBSCRIPTION_ID -g $AZ_RESOURCE_GROUP
# on success, you'll get the following output
{
  "acr_name": null,
  "batch_endpoint": "https://......batch.azure.com"
}

# To list all available pools, use the `bacc pool list` command
bacc pool list -s $AZ_SUBSCRIPTION_ID -g $AZ_RESOURCE_GROUP \
    --query "[].{pool_id:id, vm_size:vmSize, state:allocationState}"
# expected output
[
  {
    "pool_id": "almalinux",
    "state": "steady",
    "vm_size": "standard_hb120rs_v3"
  },
  {
    "pool_id": "rhel8",
    "state": "steady",
    "vm_size": "standard_hb120rs_v3"
  }
]
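Since `bacc pool list` accepts JMESPath-style `--query` expressions (as used above), you can also filter the output down to a single pool. A sketch, assuming `--query` supports filter expressions the same way the Azure CLI does:

# show just the allocation state of the rhel8 pool
bacc pool list -s $AZ_SUBSCRIPTION_ID -g $AZ_RESOURCE_GROUP \
    --query "[?id=='rhel8'].allocationState"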

Step 5: Submit jobs

Before we submit jobs, we need to resize the pool to the desired size, since the default pool size is 0. This MPI application demo needs at least 2 nodes, so we resize the pool to 2 dedicated nodes.


AZ_POOL_ID=rhel8 # or whichever pool id you want to use

# resize pool
bacc pool resize -s $AZ_SUBSCRIPTION_ID -g $AZ_RESOURCE_GROUP -p $AZ_POOL_ID --target-dedicated-nodes 2
# this will block until the pool is resized and then print the following:
{
  "current_dedicated_nodes": 2,
  "current_low_priority_nodes": 0
}

The resize can take anywhere from 5 to 10 minutes. During this time the nodes are provisioned and the startup script, which installs the required software, runs on each node. Once the pool is resized, we can submit jobs to it.


# submit job
bacc mpi-bm imb -s $AZ_SUBSCRIPTION_ID -g $AZ_RESOURCE_GROUP \
    --num-nodes 2 --num-ranks 2 \
    -p $AZ_POOL_ID -e IMB-MPI1 -a PingPong
# expected output
{
    "job_id": "imb-mpi1-pingpong-[...]"
}

You should now be able to browse to the listed job using the Azure Portal or Batch Explorer. Once the job has completed, you can inspect (or download) the stdout/stderr files of its tasks to see the benchmark output. The following is example output from one of the tasks.

# stdout.txt

#----------------------------------------------------------------
#    Intel(R) MPI Benchmarks 2021.3, MPI-1 part
#----------------------------------------------------------------
# Date                  : Mon Sep 25 12:11:23 2023
# Machine               : x86_64
# System                : Linux
# Release               : 4.18.0-305.88.1.el8_4.x86_64
# Version               : #1 SMP Thu Apr 6 10:22:46 EDT 2023
# MPI Version           : 3.1
# MPI Thread Environment: 


# Calling sequence was: 

# /mnt/intel_benchmarks/hpcx/IMB-MPI1 PingPong 

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE 
# MPI_Datatype for reductions    :   MPI_FLOAT 
# MPI_Op                         :   MPI_SUM  
# 
# 

# List of Benchmarks to run:

# PingPong

#---------------------------------------------------
# Benchmarking PingPong 
# #processes = 2 
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         2.24         0.00
            1         1000         2.24         0.45
            2         1000         2.21         0.90
            4         1000         2.23         1.79
            8         1000         2.22         3.60
           16         1000         2.24         7.14
           32         1000         2.26        14.18
           64         1000         2.18        29.38
          128         1000         2.29        55.89
          256         1000         2.76        92.60
          512         1000         2.83       180.93
         1024         1000         2.94       348.49
         2048         1000         3.07       666.66
         4096         1000         3.77      1086.75
         8192         1000         4.28      1912.20
        16384         1000         5.82      2816.48
        32768         1000         7.35      4459.98
        65536          640         9.84      6663.39
       131072          320        14.87      8813.43
       262144          160       127.18      2061.29
       524288           80       136.18      3850.07
      1048576           40       265.94      3942.86
      2097152           20       309.60      6773.80
      4194304           10       398.46     10526.36


# All processes entering MPI_Finalize
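If you prefer the command line over the Portal for fetching these files, the Azure CLI's Batch commands can be used. A minimal sketch (replace the account name, job id, and task id placeholders with the values from your deployment and job submission):

# authenticate the Azure CLI against the Batch account
az batch account login --name <batch-account-name> --resource-group $AZ_RESOURCE_GROUP

# download stdout.txt for one of the job's tasks
az batch task file download --job-id <job-id> --task-id <task-id> \
    --file-path stdout.txt --destination ./stdout.txt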

Likewise, you can run OSU benchmarks using the following command.


# submit job
bacc mpi-bm osu -s $AZ_SUBSCRIPTION_ID -g $AZ_RESOURCE_GROUP \
  -p $AZ_POOL_ID -e osu_bcast -n 2 -r 64
{
  "job_id": "osu_bcast-[...]"
}

And here's a sample output from one of these tasks.

# stdout.txt

# OSU MPI Broadcast Latency Test v7.0
# Size       Avg Latency(us)
1                       3.22
2                       3.23
4                       3.21
8                       3.25
16                      3.27
32                      3.37
64                      3.37
128                     3.69
256                     3.84
512                     4.18
1024                    4.43
2048                    4.86
4096                    6.42
8192                    8.57
16384                  21.99
32768                  13.94
65536                  20.38
131072                 32.46
262144                 64.00
524288                128.05
1048576               253.97
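Once you are done experimenting, consider scaling the pool back down so the nodes stop accruing compute charges; this reuses the same resize command from earlier in this step:

# scale the pool back to zero dedicated nodes
bacc pool resize -s $AZ_SUBSCRIPTION_ID -g $AZ_RESOURCE_GROUP -p $AZ_POOL_ID --target-dedicated-nodes 0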