Prerequisites

This section details the prerequisites for deploying an AKS cluster with support for Remote Direct Memory Access (RDMA) over InfiniBand, including optional configurations for GPUDirect RDMA.

AKS Cluster

An active AKS cluster is required as the foundation for deploying RDMA over InfiniBand capabilities. The cluster serves as the Kubernetes environment where Network Operator and GPU Operator (if using GPUDirect RDMA) will be installed.

  • Requirement: Create an AKS cluster using the Azure Portal or Azure CLI. Ensure the cluster runs a Kubernetes version supported by the Network Operator and/or GPU Operator.
  • Configuration: The cluster must be deployed in a region that supports the required VM sizes with RDMA over InfiniBand capabilities.
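
To verify that a VM size is available in your target region before creating the cluster, you can query the available SKUs (eastus and the ND-series size filter below are example values):

az vm list-skus \
  --location eastus \
  --size Standard_ND96 \
  --output table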

To create an AKS cluster, use the following Azure CLI command as a starting point:

export AZURE_RESOURCE_GROUP="myResourceGroup"
export AZURE_REGION="eastus"
export CLUSTER_NAME="myAKSCluster"
export NODEPOOL_NAME="ibnodepool"
export NODEPOOL_NODE_COUNT="2"
export NODEPOOL_VM_SIZE="Standard_ND96asr_v4"

# Create the resource group
az group create \
  --name "${AZURE_RESOURCE_GROUP}" \
  --location "${AZURE_REGION}"

# Create the AKS cluster with a single-node system nodepool
az aks create \
  --resource-group "${AZURE_RESOURCE_GROUP}" \
  --name "${CLUSTER_NAME}" \
  --node-count 1 \
  --generate-ssh-keys
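
After the cluster is created, you can fetch credentials and confirm API access (assumes kubectl is installed locally):

az aks get-credentials \
  --resource-group "${AZURE_RESOURCE_GROUP}" \
  --name "${CLUSTER_NAME}"

kubectl get nodes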

Additional nodepools will be added in the next step to meet specific hardware requirements.

AKS Nodepools

The AKS cluster requires a dedicated nodepool configured to support RDMA over InfiniBand. For AI workloads leveraging GPUDirect RDMA, GPU support is also necessary.

Requirement | Recommended Configuration | Description
Minimum Nodes | At least 2 nodes | Enables cross-node communication for RDMA over InfiniBand; more nodes for scaling
Operating System | Ubuntu | Well-supported by NVIDIA drivers and software stack; other OS options may be available
Hardware | Mellanox ConnectX NICs | High-performance network interface cards (NICs) with RDMA over InfiniBand support
VM Size (with GPUs) | ND-series | NVIDIA GPU-enabled VMs with InfiniBand support; e.g., Standard_ND96asr_v4 or Standard_ND96isr_H100_v5
VM Size (without GPUs) | HBv2, HBv3, or HBv4 series | RDMA-capable high-performance compute VMs with InfiniBand support; e.g., Standard_HB120rs_v3
GPUDirect RDMA | Optional; requires GPU-enabled VMs (e.g., ND-series with A100 or H100 GPUs) | Enables direct GPU-to-GPU communication; omit GPUs for non-GPUDirect RDMA use cases
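
To confirm that a specific VM size is RDMA-capable, you can inspect its SKU capabilities (the size and region below are examples; RdmaEnabled should report True):

az vm list-skus \
  --location eastus \
  --size Standard_HB120rs_v3 \
  --query "[0].capabilities[?name=='RdmaEnabled']" \
  --output table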

Register the AKS InfiniBand Support Feature

To ensure that the machines in the nodepool land on the same physical InfiniBand network, register the AKSInfinibandSupport feature flag:

az feature register --name AKSInfinibandSupport --namespace Microsoft.ContainerService

az feature show \
  --namespace "Microsoft.ContainerService" \
  --name AKSInfinibandSupport
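
Feature registration can take several minutes. Once the state shows Registered, refresh the Microsoft.ContainerService resource provider registration so the change propagates:

az provider register --namespace Microsoft.ContainerService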

Nodepool with GPUs

GPU Operator Managed GPU Driver

To create an AKS nodepool whose GPU driver will be installed and managed by the GPU Operator (skipping the AKS-managed driver installation), use the following command:

# The aks-preview extension is required for the --skip-gpu-driver-install flag
az extension add -n aks-preview

az aks nodepool add \
  --resource-group "${AZURE_RESOURCE_GROUP}" \
  --cluster-name "${CLUSTER_NAME}" \
  --name "${NODEPOOL_NAME}" \
  --node-count "${NODEPOOL_NODE_COUNT}" \
  --node-vm-size "${NODEPOOL_VM_SIZE}" \
  --os-sku Ubuntu \
  --skip-gpu-driver-install
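
Once the nodepool is provisioned, a quick way to confirm its nodes joined the cluster is to filter by the agent pool label that AKS applies to each node:

kubectl get nodes -l kubernetes.azure.com/agentpool="${NODEPOOL_NAME}"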

AKS Managed GPU Driver

To create an AKS nodepool with the GPU driver installed by AKS (no GPU Operator driver management), use the following command:

az aks nodepool add \
  --resource-group "${AZURE_RESOURCE_GROUP}" \
  --cluster-name "${CLUSTER_NAME}" \
  --name "${NODEPOOL_NAME}" \
  --node-count "${NODEPOOL_NODE_COUNT}" \
  --node-vm-size "${NODEPOOL_VM_SIZE}" \
  --os-sku Ubuntu
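
After a device plugin is running on the nodes (deployed by AKS or by the GPU Operator, depending on your setup), the GPUs should appear as allocatable nvidia.com/gpu resources. A quick check:

kubectl get nodes -o custom-columns='NAME:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'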

Nodepool without GPUs

To create an AKS nodepool backed by non-GPU VMs, use the following command (first set NODEPOOL_VM_SIZE to an RDMA-capable non-GPU size such as Standard_HB120rs_v3):

az aks nodepool add \
  --resource-group "${AZURE_RESOURCE_GROUP}" \
  --cluster-name "${CLUSTER_NAME}" \
  --name "${NODEPOOL_NAME}" \
  --node-count "${NODEPOOL_NODE_COUNT}" \
  --node-vm-size "${NODEPOOL_VM_SIZE}" \
  --os-sku Ubuntu
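
To verify that the InfiniBand devices are visible on a node once the drivers are in place, one option is a node debug session (a sketch; the node name is a placeholder and the ubuntu image is an example — the host filesystem is mounted at /host in the debug pod):

kubectl debug node/<node-name> -it --image=ubuntu -- ls /host/dev/infiniband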

Appendix

Understanding the ND-series VM Size Naming Convention

Azure VM sizes use a naming convention to indicate their hardware capabilities. The table below explains the components of VM size names relevant to RDMA over InfiniBand and GPUDirect RDMA support in AKS, with examples from the ND-series.

Component | Meaning
N | NVIDIA GPU-enabled
D | Training and inference capable
i | Isolated size (dedicated hardware)
r | RDMA capable
a | AMD CPUs
s | Premium storage capable
vX | Version/generation (e.g., v4, v5)
Number | vCPUs (e.g., 96)
GPU | Specific GPU model (e.g., H100)

Examples

  • Standard_ND96asr_v4: NVIDIA GPUs (N), Training and inference (D), AMD CPUs (a), premium storage (s), RDMA (r), A100 GPUs, 96 vCPUs, version 4 (v4).
  • Standard_ND96isr_H100_v5: NVIDIA GPUs (N), Training and inference (D), isolated size (i), premium storage (s), RDMA (r), H100 GPUs, 96 vCPUs, version 5 (v5).