Virtual Machine Scale Sets
The resiliency recommendations in this guidance cover Virtual Machine Scale Sets and their dependent resources and settings.
Summary of Recommendations
Recommendations Details
VMSS-1 - Deploy VMSS with Flex orchestration mode instead of Uniform
Category: System Efficiency
Impact: Medium
Guidance
Even single-instance VMs should be deployed into a scale set using the Flexible orchestration mode to future-proof your application for scaling and availability. Flexible orchestration offers high availability guarantees (up to 1000 VMs) by spreading VMs across fault domains in a region or within an Availability Zone.
Resources
- When to use VMSS instead of VMs
- Azure Well-Architected Framework review - Virtual Machines and Scale Sets
Resource Graph Query
// Azure Resource Graph Query
// Find all VMSS that are NOT deployed with Flexible orchestration mode
resources
| where type == "microsoft.compute/virtualmachinescalesets"
| where properties.orchestrationMode != "Flexible"
| project recommendationId = "vmss-1", name, id, tags, param1 = strcat("orchestrationMode: ", tostring(properties.orchestrationMode))
VMSS-2 - Enable VMSS application health monitoring
Category: Monitoring
Impact: Medium
Guidance
Monitoring your application health is an important signal for managing and upgrading your deployment. Azure Virtual Machine Scale Sets provide support for Rolling Upgrades including Automatic OS-Image Upgrades and Automatic VM Guest Patching, which rely on health monitoring of the individual instances to upgrade your deployment. You can also use Application Health Extension to monitor the application health of each instance in your scale set and perform instance repairs using Automatic Instance Repairs.
Resources
Resource Graph Query
// Azure Resource Graph Query
// Find all VMSS that do NOT have the Application Health extension configured
resources
| where type == "microsoft.compute/virtualmachinescalesets"
| join kind=leftouter (
resources
| where type == "microsoft.compute/virtualmachinescalesets"
| mv-expand extension=properties.virtualMachineProfile.extensionProfile.extensions
| where extension.properties.type in ( "ApplicationHealthWindows", "ApplicationHealthLinux" )
| project id
) on id
| where id1 == ""
| project recommendationId = "vmss-2", name, id, tags, param1 = "extension: null"
VMSS-3 - Enable Automatic Repair policy
Category: Automation
Impact: High
Guidance
Enabling automatic instance repairs for Azure Virtual Machine Scale Sets helps achieve high availability for applications by maintaining a set of healthy instances. The Application Health extension or Load balancer health probes may find that an instance is unhealthy. Automatic instance repairs will automatically perform instance repairs by deleting the unhealthy instance and creating a new one to replace it.
Grace period is specified in minutes in ISO 8601 format and can be set using the property automaticRepairsPolicy.gracePeriod. Grace period can range between 10 minutes and 90 minutes, and has a default value of 30 minutes.
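As a minimal sketch of the limits described above, the following Python helper parses an hours-and-minutes ISO 8601 duration such as PT30M and checks it against the 10 to 90 minute range. The function name, the constants, and the restriction to hours and minutes are assumptions made for illustration; this is not part of any Azure SDK.

```python
import re

# Bounds and default as stated in the guidance above (in minutes).
MIN_GRACE_MINUTES = 10
MAX_GRACE_MINUTES = 90
DEFAULT_GRACE_MINUTES = 30

# Hypothetical helper pattern: ISO 8601 durations with hours and/or minutes only,
# for example "PT30M" or "PT1H30M".
_DURATION_RE = re.compile(r"PT(?:(\d+)H)?(?:(\d+)M)?")


def grace_period_minutes(value=None):
    """Return the grace period in minutes, or raise ValueError if it is invalid."""
    if value is None:
        # No value supplied: fall back to the default named in the guidance.
        return DEFAULT_GRACE_MINUTES
    match = _DURATION_RE.fullmatch(value)
    if not match or not any(match.groups()):
        raise ValueError(f"not an hours/minutes ISO 8601 duration: {value!r}")
    hours, minutes = (int(group) if group else 0 for group in match.groups())
    total = hours * 60 + minutes
    if not MIN_GRACE_MINUTES <= total <= MAX_GRACE_MINUTES:
        raise ValueError(f"grace period of {total} minutes is outside the 10-90 range")
    return total
```

A check like this could run before a value is written to automaticRepairsPolicy.gracePeriod, so that out-of-range values are caught early rather than at deployment time.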
Resources
Resource Graph Query
// Azure Resource Graph Query
// Find all VMSS that have the automatic repairs policy disabled
resources
| where type == "microsoft.compute/virtualmachinescalesets"
| where properties.automaticRepairsPolicy.enabled == false
| project recommendationId = "vmss-3", name, id, tags, param1 = "automaticRepairsPolicy: Disabled"
VMSS-4 - Configure VMSS Autoscale to custom and configure the scaling metrics
Category: System Efficiency
Impact: High
Guidance
Use Custom autoscale based on metrics and schedules.
Autoscale is a built-in feature that helps applications perform their best when demand changes. You can scale your resource manually to a specific instance count, or use a custom autoscale policy that scales on metric thresholds or on a schedule that sets the instance count during designated time windows. Autoscale keeps your resource performant and cost effective by adding and removing instances based on demand.
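To make the idea of metric-threshold scaling concrete, here is a small Python sketch of the decision a threshold rule makes. The class, the function, and all threshold values are hypothetical and chosen only for illustration; they are not Azure defaults and do not reproduce the Azure autoscale engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalePolicy:
    # All values below are illustrative assumptions, not Azure defaults.
    minimum: int = 2             # never scale in below this instance count
    maximum: int = 10            # never scale out above this instance count
    scale_out_cpu: float = 70.0  # add instances above this average CPU %
    scale_in_cpu: float = 30.0   # remove instances below this average CPU %
    step: int = 1                # instances added or removed per decision


def next_instance_count(current: int, avg_cpu: float, policy: ScalePolicy) -> int:
    """Return the instance count a simple threshold rule would target next."""
    if avg_cpu > policy.scale_out_cpu:
        target = current + policy.step
    elif avg_cpu < policy.scale_in_cpu:
        target = current - policy.step
    else:
        target = current
    # Clamp so the result always stays within [minimum, maximum].
    return max(policy.minimum, min(policy.maximum, target))
```

Keeping a gap between the scale-out and scale-in thresholds, as in this sketch, is a common way to avoid repeatedly adding and removing the same instance when the metric hovers near a single threshold.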
Resources
Resource Graph Query
// Azure Resource Graph Query
// Find VMSS that have no autoscale settings or whose autoscale settings are disabled
resources
| where type == "microsoft.compute/virtualmachinescalesets"
| project name, id, tags
| join kind=leftouter (
resources
| where type == "microsoft.insights/autoscalesettings"
| where tostring(properties.targetResourceUri) contains "Microsoft.Compute/virtualMachineScaleSets"
| project id = tostring(properties.targetResourceUri), autoscalesettings = properties
) on id
| where isnull(autoscalesettings) or autoscalesettings.enabled == "false"
| project recommendationId = "vmss-4", name, id, tags, param1 = "autoscalesettings: Disabled"
| order by id asc
VMSS-5 - Enable Predictive autoscale and configure at least for Forecast Only
Category: System Efficiency
Impact: Low
Guidance
Predictive autoscale uses machine learning to help manage and scale Azure Virtual Machine Scale Sets with cyclical workload patterns. It forecasts the overall CPU load on your scale set by observing and learning from historical CPU usage patterns, so that scale-out can occur in time to meet demand.
Resources
Resource Graph Query
// Azure Resource Graph Query
// Find VMSS whose autoscale settings are enabled but whose predictiveAutoscalePolicy scaleMode is Disabled
resources
| where type == "microsoft.compute/virtualmachinescalesets"
| project name, id, tags
| join kind=leftouter (
resources
| where type == "microsoft.insights/autoscalesettings"
| where tostring(properties.targetResourceUri) contains "Microsoft.Compute/virtualMachineScaleSets"
| project id = tostring(properties.targetResourceUri), autoscalesettings = properties
) on id
| where autoscalesettings.enabled == "true" and autoscalesettings.predictiveAutoscalePolicy.scaleMode == "Disabled"
| project recommendationId = "vmss-5", name, id, tags, param1 = "predictiveAutoscalePolicy_scaleMode: Disabled"
| order by id asc
VMSS-6 - Disable Force strictly even balance across zones to avoid failed scale-in and scale-out attempts
Category: Availability
Impact: High
Guidance
Microsoft recommends disabling the setting that enforces strictly even distribution of VM instances across Availability Zones within a region in your VMSS configuration. In other words, you should allow Azure to distribute VM instances unevenly across Availability Zones.
Force strictly even balance across zones: Azure provides the option to distribute VM instances in a VMSS evenly across Availability Zones within a region. An Availability Zone is a physically separate data center within an Azure region with independent power, cooling, and networking. This configuration enhances the availability and fault tolerance of your applications.
Scale in and out fail attempts: In the context of VMSS, “scaling in” refers to reducing the number of VM instances when demand decreases, while “scaling out” refers to increasing the number of instances when demand increases. Scaling can be automatic, based on scaling rules and metrics. When strictly even balance is enforced, the scale set fails any scale-in or scale-out attempt that would leave the zones unbalanced, for example when one zone cannot accept new VMs.
While Azure VMSS provides the option to enforce even distribution of VM instances across Availability Zones for increased resilience, there may be scenarios where disabling this option makes sense to better align with your application’s load distribution and scaling requirements.
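The following Python sketch models why strict balance can cause scaling to fail. It assumes the balance rule from the Azure documentation, where a scale set counts as balanced if every zone has the same number of VMs as the others, plus or minus one. The function names and the greedy placement are hypothetical simplifications, not Azure's actual placement algorithm.

```python
def is_balanced(vm_counts_per_zone):
    """True if all zones hold the same number of VMs, plus or minus one."""
    counts = list(vm_counts_per_zone.values())
    return max(counts) - min(counts) <= 1


def scale_out(vm_counts_per_zone, new_vms, available_zones, strict):
    """Place new VMs in the available zones and return the new per-zone counts.

    Simplified model: each new VM goes to the available zone with the fewest VMs.
    With strict=True the whole request is rejected if the result is unbalanced.
    """
    result = dict(vm_counts_per_zone)
    for _ in range(new_vms):
        # sorted() makes tie-breaking deterministic (lowest zone name wins).
        zone = min(sorted(available_zones), key=lambda z: result[z])
        result[zone] += 1
    if strict and not is_balanced(result):
        raise RuntimeError("scale-out rejected: zones would be unbalanced")
    return result
```

In this model, a scale set with two VMs in each of three zones can still add three VMs while zone 3 is unavailable if strict balance is off, ending with counts of 4, 3, and 2. With strict balance on, the same request is rejected.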
Resources
Resource Graph Query
// Azure Resource Graph Query
// Find Uniform VMSS where strict zone balance (zoneBalance) is set to true
resources
| where type == "microsoft.compute/virtualmachinescalesets"
| where properties.orchestrationMode == "Uniform" and properties.zoneBalance == true
| project recommendationId = "vmss-6", name, id, tags, param1 = "strictly zoneBalance: Enabled"
| order by id asc
VMSS-7 - Configure Allocation Policy Spreading algorithm to Max Spreading
Category: System Efficiency
Impact: Medium
Guidance
With max spreading, the scale set spreads your VMs across as many fault domains as possible within each zone. This spreading could be across greater or fewer than five fault domains per zone. With static fixed spreading, the scale set spreads your VMs across exactly five fault domains per zone. If the scale set cannot find five distinct fault domains per zone to satisfy the allocation request, the request fails.
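The difference between the two algorithms can be sketched in Python as follows. This is a simplified model for illustration only: the function name and the single "available fault domains" input are assumptions, and the code does not reproduce Azure's allocation logic.

```python
FIXED_SPREADING_FAULT_DOMAINS = 5  # fixed spreading always asks for exactly five


def fault_domains_for_allocation(available_in_zone, max_spreading):
    """Return how many fault domains the VMs would be spread across in one zone."""
    if available_in_zone < 1:
        raise RuntimeError("no fault domains available in this zone")
    if max_spreading:
        # Max spreading uses whatever is available, which may be more or fewer than five.
        return available_in_zone
    if available_in_zone < FIXED_SPREADING_FAULT_DOMAINS:
        # Fixed spreading cannot find five distinct fault domains, so the request fails.
        raise RuntimeError("fixed spreading needs five distinct fault domains")
    return FIXED_SPREADING_FAULT_DOMAINS
```

With three fault domains available in a zone, the max spreading call succeeds and uses all three, while the fixed spreading call raises an error, which mirrors the failed allocation request described above.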
Resources
Resource Graph Query
// Azure Resource Graph Query
// Find VMSS instances where Spreading algorithm is set to Static
resources
| where type == "microsoft.compute/virtualmachinescalesets"
| where properties.platformFaultDomainCount > 1
| project recommendationId = "vmss-7", name, id, tags, param1 = "platformFaultDomainCount: Static"
| order by id asc
VMSS-8 - Deploy VMSS across availability zones with VMSS Flex
Category: Availability
Impact: High
Guidance
When you create your VMSS, use availability zones to protect your applications and data against the unlikely event of a datacenter failure.
Resources
- Create a Virtual Machine Scale Set that uses Availability Zones
- Update scale set to add availability zones
Resource Graph Query
// Azure Resource Graph Query
// Find VMSS instances with one or no Zones selected
resources
| where type == "microsoft.compute/virtualmachinescalesets"
| where array_length(zones) <= 1 or isnull(zones)
| project recommendationId = "vmss-8", name, id, tags, param1 = "AvailabilityZones: Single Zone"
| order by id asc
VMSS-9 - Set Patch orchestration options to Azure-orchestrated
Category: Automation
Impact: Low
Guidance
Enabling automatic VM guest patching for your Azure VMs helps ease update management by safely and automatically patching virtual machines to maintain security compliance, while limiting the blast radius of VMs. Note that the KQL below will not return Virtual Machine Scale Sets using Uniform orchestration.
Resources
Resource Graph Query
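// Azure Resource Graph Query
// Find Flexible VMSS whose Linux or Windows instances do not use an automatic patch mode
resources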
| where type == "microsoft.compute/virtualmachinescalesets"
| join kind=inner (
resources
| where type == "microsoft.compute/virtualmachines"
| project id = tostring(properties.virtualMachineScaleSet.id), vmproperties = properties
) on id
| extend recommendationId = "vmss-9", param1 = "patchMode: Manual", vmproperties.osProfile.linuxConfiguration.patchSettings.patchMode
| where isnotnull(vmproperties.osProfile.linuxConfiguration) and vmproperties.osProfile.linuxConfiguration.patchSettings.patchMode !in ("AutomaticByPlatform", "AutomaticByOS")
| distinct recommendationId, name, id, param1
| union (resources
| where type == "microsoft.compute/virtualmachinescalesets"
| join kind=inner (
resources
| where type == "microsoft.compute/virtualmachines"
| project id = tostring(properties.virtualMachineScaleSet.id), vmproperties = properties
) on id
| extend recommendationId = "vmss-9", param1 = "patchMode: Manual", vmproperties.osProfile.windowsConfiguration.patchSettings.patchMode
| where isnotnull(vmproperties.osProfile.windowsConfiguration) and vmproperties.osProfile.windowsConfiguration.patchSettings.patchMode !in ("AutomaticByPlatform", "AutomaticByOS")
| distinct recommendationId, name, id, param1)
VMSS-10 - Upgrade VMSS Image versions scheduled to be deprecated or already retired
Category: Governance
Impact: High
Guidance
Ensure the publisher continues to support the OS image to avoid disruption or security gaps. Review the publisher, offer, and SKU information of the VM to confirm that you are running a supported image. Enable automatic guest patching or automatic OS image upgrades to get notifications about image deprecation.
Resources
Resource Graph Query
// Cannot be validated with Azure Resource Graph (ARG)
VMSS-11 - Production VMSS instances should be using SSD disks
Category: System Efficiency
Impact: High
Guidance
Use SSD disks for production workloads. HDD disks can degrade the performance of your resources and should be used only for non-critical resources and for resources that require infrequent access.
Resources
Resource Graph Query
// Azure Resource Graph Query
// Find all Uniform VMSS whose OS disk uses Standard HDD (Standard_LRS) storage
resources
| where type == "microsoft.compute/virtualmachinescalesets"
| where properties.orchestrationMode != "Flexible"
| where properties.virtualMachineProfile.storageProfile.osDisk.managedDisk.storageAccountType == 'Standard_LRS'
| project recommendationId = "vmss-11", name, id, tags