Azure.Pillar.Reliability

v1.35.0

Microsoft Azure Well-Architected Framework - Reliability pillar specific baseline.

Rules

The following rules are included within the Azure.Pillar.Reliability baseline.

This baseline includes a total of 101 rules.

Name Synopsis Severity Maturity
Azure.ACR.GeoReplica Applications or infrastructure relying on a container image may fail if the registry is not available at the time they start. Important -
Azure.ACR.MinSku The Basic SKU provides limited performance and features for production container registry workloads. Important -
Azure.ADX.SLA Use SKUs that include an SLA when configuring Azure Data Explorer (ADX) clusters. Important -
Azure.AKS.AvailabilityZone AKS clusters deployed with virtual machine scale sets should use availability zones in supported regions for high availability. Important -
Azure.AKS.CNISubnetSize AKS clusters using Azure CNI should use large subnets to reduce IP exhaustion issues. Important -
Azure.AKS.MaintenanceWindow Configure customer-controlled maintenance windows for AKS clusters. Important -
Azure.AKS.MinNodeCount AKS clusters should have a minimum number of system nodes for failover and updates. Important -
Azure.AKS.MinUserPoolNodes User node pools in an AKS cluster should have a minimum number of nodes for failover and updates. Important -
Azure.AKS.PoolVersion AKS node pools should match the Kubernetes control plane version. Important -
Azure.AKS.UptimeSLA AKS clusters should have Uptime SLA enabled for a financially backed SLA. Important -
Azure.AKS.Version Older versions of Kubernetes may have known bugs or security vulnerabilities, and may have limited support. Important -
Azure.APIM.AvailabilityZone API Management instances should use availability zones in supported regions for high availability. Important -
Azure.APIM.CertificateExpiry Renew certificates used for custom domain bindings. Important -
Azure.APIM.MultiRegion Enhance service availability and resilience by deploying API Management instances across multiple regions. Important -
Azure.APIM.MultiRegionGateway API Management instances should have multi-region deployment gateways enabled. Important -
Azure.AppConfig.GeoReplica Replicate app configuration store across all points of presence for an application. Important -
Azure.AppConfig.PurgeProtect Consider purge protection for app configuration stores to ensure a store cannot be purged during the retention period. Important -
Azure.AppConfig.SKU App Configuration should use a minimum size of Standard. Important -
Azure.AppGw.AvailabilityZone Application Gateway (App Gateway) should use availability zones in supported regions for improved resiliency. Important -
Azure.AppGw.MigrateWAFPolicy Migrate to Application Gateway WAF policy. Critical -
Azure.AppGw.MinInstance Application Gateways should use a minimum of two instances. Important -
Azure.AppService.AlwaysOn Configure Always On for App Service apps. Important -
Azure.AppService.AvailabilityZone Deploy app service plan instances using availability zones in supported regions to ensure high availability and resilience. Important -
Azure.AppService.PlanInstanceCount App Service Plan should use a minimum number of instances for failover. Important -
Azure.AppService.WebProbe Configure and enable instance health probes. Important -
Azure.AppService.WebProbePath Configure a dedicated path for health probe requests. Important -
Azure.ASE.AvailabilityZone Deploy app service environments using availability zones in supported regions to ensure high availability and resilience. Important -
Azure.AVD.ScheduleAgentUpdate Define a window for agent updates to minimize disruptions to users. Important -
Azure.ContainerApp.AvailabilityZone Use Container Apps environments that are zone redundant to improve reliability. Important -
Azure.ContainerApp.MinReplicas Use multiple replicas to remove a single point of failure. Important -
Azure.ContainerApp.Storage Use Azure Files volume mounts to persist container data. Awareness -
Azure.Cosmos.AvailabilityZone Use zone redundant Cosmos DB accounts in supported regions to improve reliability. Important L1
Azure.Cosmos.ContinuousBackup Enable continuous backup on Cosmos DB accounts. Important -
Azure.Cosmos.MongoAvailabilityZone Use zone redundant Cosmos DB vCore clusters in supported regions to improve reliability. Important L1
Azure.Cosmos.SLA Use a paid tier to qualify for a Service Level Agreement (SLA). Important -
Azure.DataFactory.Version Consider migrating to DataFactory v2. Awareness -
Azure.EntraDS.MinReplicas Applications or infrastructure relying on a managed domain may fail if the domain is not available. Important -
Azure.EntraDS.SKU The default SKU for Microsoft Entra Domain Services supports resiliency in a single region. Important -
Azure.EventHub.AvailabilityZone Use zone redundant Event Hub namespaces in supported regions to improve reliability. Important L1
Azure.Firewall.AvailabilityZone Deploy firewall instances using availability zones in supported regions to ensure high availability and resilience. Important -
Azure.FrontDoor.Probe Use health probes to check the health of each backend. Important -
Azure.FrontDoor.ProbeMethod Configure health probes to use HEAD requests to reduce performance overhead. Important -
Azure.FrontDoor.ProbePath Configure a dedicated path for health probe requests. Important -
Azure.Grafana.AvailabilityZone Use zone redundant Grafana workspaces in supported regions to improve reliability. Important L1
Azure.Grafana.Version Grafana workspaces should be on Grafana version 10. Important -
Azure.KeyVault.PurgeProtect Enable Purge Protection on Key Vaults to prevent early purge of vaults and vault items. Important -
Azure.KeyVault.SoftDelete Enable Soft Delete on Key Vaults to protect vaults and vault items from accidental deletion. Important -
Azure.LB.AvailabilityZone Load balancers deployed with Standard SKU should be zone-redundant for high availability. Important -
Azure.LB.Probe Use a specific probe for web protocols. Important -
Azure.LB.StandardSKU Load balancers should be deployed with Standard SKU for production workloads. Important -
Azure.Log.Replication Log Analytics workspaces should have workspace replication enabled to improve service availability. Important -
Azure.MariaDB.GeoRedundantBackup Azure Database for MariaDB should store backups in geo-redundant storage. Important -
Azure.MICassandra.AvailabilityZone Use zone redundant Managed Instance for Apache Cassandra clusters in supported regions to improve reliability. Important L1
Azure.Monitor.ServiceHealth Configure Service Health alerts to notify administrators. Important -
Azure.MySQL.GeoRedundantBackup Azure Database for MySQL should store backups in geo-redundant storage. Important -
Azure.MySQL.MaintenanceWindow Configure a customer-controlled maintenance window for Azure Database for MySQL servers. Important -
Azure.MySQL.UseFlexible Use the Azure Database for MySQL Flexible Server deployment model. Important -
Azure.MySQL.ZoneRedundantHA Deploy Azure Database for MySQL servers using zone-redundant high availability (HA) in supported regions to ensure high availability and resilience. Important -
Azure.NIC.UniqueDns Network interfaces (NICs) should inherit DNS from virtual networks. Awareness -
Azure.NSG.DenyAllInbound When all inbound traffic is denied, some functions that affect the reliability of your service may not work as expected. Important -
Azure.PostgreSQL.GeoRedundantBackup Azure Database for PostgreSQL should store backups in geo-redundant storage. Important -
Azure.PostgreSQL.MaintenanceWindow Configure a customer-controlled maintenance window for Azure Database for PostgreSQL servers. Important -
Azure.PostgreSQL.ZoneRedundantHA Deploy Azure Database for PostgreSQL servers using zone-redundant high availability (HA) in supported regions to ensure high availability and resilience. Important -
Azure.PublicIP.AvailabilityZone Public IP addresses deployed with Standard SKU should use availability zones in supported regions for high availability. Important -
Azure.PublicIP.StandardSKU The basic SKU is being retired on 30 September 2025, and does not include several reliability and security features. Important -
Azure.Redis.AvailabilityZone Premium Redis cache should be deployed with availability zones for high availability. Important -
Azure.Redis.Version Azure Cache for Redis should use the latest supported version of Redis. Important -
Azure.RedisEnterprise.Zones Enterprise Redis cache should be zone-redundant for high availability. Important -
Azure.RSV.ReplicationAlert Recovery Services Vaults (RSV) without replication alerts configured may be at risk. Important -
Azure.RSV.StorageType Recovery Services Vaults (RSV) not using geo-replicated storage (GRS) may be at risk. Important -
Azure.Search.IndexSLA Use a minimum of 3 replicas to receive an SLA for query and index updates. Important -
Azure.Search.QuerySLA Use a minimum of 2 replicas to receive an SLA for index queries. Important -
Azure.SignalR.SLA Use SKUs that include an SLA when configuring SignalR Services. Important -
Azure.SQL.MaintenanceWindow Configure a customer-controlled maintenance window for Azure SQL databases. Important -
Azure.SQLMI.MaintenanceWindow Configure a customer-controlled maintenance window for Azure SQL Managed Instances. Important -
Azure.Storage.ContainerSoftDelete Enable container soft delete on Storage Accounts. Important -
Azure.Storage.FileShareSoftDelete Enable soft delete on Storage Accounts file shares. Important -
Azure.Storage.SoftDelete Enable blob soft delete on Storage Accounts. Important -
Azure.Storage.UseReplication Storage Accounts using the LRS SKU are only replicated within a single zone. Important -
Azure.Template.LocationDefault Set the default value for the location parameter within an ARM template to resource group location. Awareness -
Azure.TrafficManager.Endpoints Traffic Manager should use at least two enabled endpoints. Important -
Azure.VM.ASAlignment Use availability sets aligned with managed disks fault domains. Important -
Azure.VM.ASDistributeTraffic Ensure high availability by distributing traffic among members in an availability set. Important -
Azure.VM.ASMinMembers Availability sets should be deployed with at least two virtual machines (VMs). Important -
Azure.VM.BasicSku Virtual machines (VMs) should not use Basic sizes. Important -
Azure.VM.MaintenanceConfig Use a maintenance configuration for virtual machines. Important -
Azure.VM.Standalone Single instance VMs are a single point of failure; however, reliability can be improved by using premium storage. Important -
Azure.VMSS.AutoInstanceRepairs Applications or infrastructure relying on a virtual machine scale set may fail if VM instances are unhealthy. Important -
Azure.VMSS.AvailabilityZone Deploy virtual machine scale set instances using availability zones in supported regions to ensure high availability and resilience. Important -
Azure.VMSS.ZoneBalance Deploy virtual machine scale set instances using the best-effort zone balance in supported regions. Important -
Azure.VNET.BastionSubnet VNETs with a GatewaySubnet should have an AzureBastionSubnet to allow for out-of-band remote access to VMs. Important -
Azure.VNET.FirewallSubnetNAT Zonal-deployed Azure Firewalls should consider using an Azure NAT Gateway for outbound access. Awareness -
Azure.VNET.LocalDNS Virtual networks (VNETs) should use DNS servers deployed within the same Azure region. Important -
Azure.VNET.SingleDNS Virtual networks (VNETs) should have at least two DNS servers assigned. Important -
Azure.VNG.ERAvailabilityZoneSKU Use availability zone SKU for virtual network gateways deployed with ExpressRoute gateway type. Important -
Azure.VNG.ERLegacySKU Migrate from legacy SKUs to improve reliability and performance of ExpressRoute (ER) gateways. Critical -
Azure.VNG.MaintenanceConfig Use a customer-controlled maintenance configuration for virtual network gateways. Important -
Azure.VNG.VPNActiveActive Use VPN gateways configured to operate in an Active-Active configuration to reduce connectivity downtime. Important -
Azure.VNG.VPNAvailabilityZoneSKU Use availability zone SKU for virtual network gateways deployed with VPN gateway type. Important -
Azure.VNG.VPNLegacySKU Migrate from legacy SKUs to improve reliability and performance of VPN gateways. Critical -
Azure.WebPubSub.SLA Use SKUs that include an SLA when configuring Web PubSub Services. Important -
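
Each row above follows the column order Name, Synopsis, Severity, Maturity. As a minimal sketch, assuming rows are kept as single space-separated lines as they appear here, the following Python splits a row into its columns and tallies rules by severity. The names `Rule`, `parse_row`, and `tally` are hypothetical helpers for illustration and are not part of any rule tooling.

```python
from collections import Counter
from typing import Iterable, NamedTuple

# Severity values that appear in the table above.
SEVERITIES = {"Critical", "Important", "Awareness"}


class Rule(NamedTuple):
    name: str
    synopsis: str
    severity: str
    maturity: str  # "-" when no maturity level is listed


def parse_row(line: str) -> Rule:
    """Split one table row into its four columns.

    The name is the first token and severity/maturity are the last two
    tokens, so everything in between is treated as the synopsis.
    """
    name, rest = line.strip().split(" ", 1)
    synopsis, severity, maturity = rest.rsplit(" ", 2)
    if severity not in SEVERITIES:
        raise ValueError(f"unexpected severity in row: {line!r}")
    return Rule(name, synopsis, severity, maturity)


def tally(lines: Iterable[str]) -> Counter:
    """Count rules per severity, skipping blank lines."""
    return Counter(parse_row(line).severity for line in lines if line.strip())


# Three rows copied from the table above.
rows = [
    "Azure.ACR.MinSku The Basic SKU provides limited performance and features for production container registry workloads. Important -",
    "Azure.AppGw.MigrateWAFPolicy Migrate to Application Gateway WAF policy. Critical -",
    "Azure.Cosmos.AvailabilityZone Use zone redundant Cosmos DB accounts in supported regions to improve reliability. Important L1",
]
counts = tally(rows)
```

Splitting from both ends keeps the synopsis intact even though it contains spaces. The approach relies on the severity and maturity columns each being a single token, which holds for every row listed here.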