Azure Monitor Baseline Alerts

High Performance Compute

Overview

High Performance Compute (HPC) supports a variety of workloads. Seismic modeling, fluid dynamics, and Artificial Intelligence workloads all require more powerful compute, networking, and storage than traditional workloads. Monitoring these environments is critical to ensure business continuity. You cannot manage what you do not measure. Monitoring HPC workload infrastructure involves implementing alerts and monitoring for Virtual Machines, Storage, and Networking across the stack. Alerting for these resources involves monitoring CPU/GPU utilization, throughput/availability, and stability. In this section we provide alert recommendations for the following HPC-centric resources:

  • Virtual Machines
  • Azure Batch Service
  • Azure NetApp Files
  • Azure Blob Storage
  • Azure Managed Lustre Filesystem
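
As a rough illustration of the kind of threshold alerting described above, the following Python sketch evaluates a window of metric samples against a static threshold. It is not an Azure Monitor API: the `AlertRule` class, the `evaluate` function, the rule names, and the threshold values are all hypothetical and chosen only for illustration. They are not recommended baselines.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical static-threshold rule; not an Azure Monitor type."""
    name: str
    threshold: float       # illustrative value, not a recommended baseline
    fire_when_above: bool  # True: fire when average > threshold; False: fire when average < threshold


def evaluate(rule: AlertRule, samples: Sequence[float]) -> bool:
    """Return True when the window average breaches the rule's threshold."""
    if not samples:
        return False  # no data in the window: this sketch does not fire
    average = mean(samples)
    if rule.fire_when_above:
        return average > rule.threshold
    return average < rule.threshold


# Illustrative rules: one utilization signal and one availability signal.
cpu_high = AlertRule("vm-cpu-percent-high", threshold=90.0, fire_when_above=True)
storage_availability_low = AlertRule(
    "storage-availability-low", threshold=99.0, fire_when_above=False
)

cpu_fired = evaluate(cpu_high, [95.0, 97.0, 93.0])
availability_fired = evaluate(storage_availability_low, [100.0, 100.0, 99.9])
print(cpu_fired, availability_fired)
```

Utilization signals typically fire when a value rises above a threshold, while availability signals fire when it drops below one, which is why the sketch carries a direction flag on each rule.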

Please note that an HPC Landing Zone is built on top of the best practices of the Azure Landing Zone. The approach for broader monitoring and alerting in the context of the Azure Landing Zone can be found here.

Azure CycleCloud Workspace for Slurm

Azure CycleCloud Workspace for Slurm is our HPC Landing Zone accelerator.