Virtual Machine Scale Sets


The resiliency recommendations in this guidance cover Virtual Machine Scale Sets (VMSS) and their dependent resources and settings.

Summary of Recommendations

Recommendations Details

VMSS-1 - Deploy VMSS with Flex orchestration mode instead of Uniform

Category: System Efficiency

Impact: Medium

Guidance

Even single-instance workloads should be deployed into a scale set that uses Flexible orchestration mode, to future-proof the application for scaling and availability. Flexible orchestration offers high-availability guarantees for up to 1,000 VMs by spreading VMs across fault domains in a region or within an Availability Zone.

Resources

Resource Graph Query

// Azure Resource Graph Query
// Find all VMSS that are NOT deployed with Flex orchestration mode
resources
| where type == "microsoft.compute/virtualmachinescalesets"
| where tostring(properties.orchestrationMode) != "Flexible"
| project recommendationId = "vmss-1", name, id, tags, param1 = strcat("orchestrationMode: ", tostring(properties.orchestrationMode))
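If you export scale-set resources (for example, the rows returned by Resource Graph) and want to apply the same Flex check in a script, a minimal Python sketch follows. It assumes each resource is a dict shaped like the ARM resource JSON. It also treats a missing `orchestrationMode` as Uniform, which is an assumption about older scale sets rather than documented behavior. The function name `non_flex_scale_sets` is hypothetical.

```python
from typing import Any, Iterable


def non_flex_scale_sets(resources: Iterable[dict[str, Any]]) -> list[str]:
    """Return the ids of scale sets that are not using Flexible orchestration."""
    flagged = []
    for res in resources:
        # Resource Graph lowercases types; ARM returns mixed case, so compare lowercased.
        if res.get("type", "").lower() != "microsoft.compute/virtualmachinescalesets":
            continue
        # Assumption: no orchestrationMode property means the scale set is Uniform.
        mode = res.get("properties", {}).get("orchestrationMode") or "Uniform"
        if mode != "Flexible":
            flagged.append(res["id"])
    return flagged
```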



VMSS-2 - Enable VMSS application health monitoring

Category: Monitoring

Impact: Medium

Guidance

Application health is an important signal for managing and upgrading your deployment. Azure Virtual Machine Scale Sets support rolling upgrades, including Automatic OS Image Upgrades and Automatic VM Guest Patching, which rely on health monitoring of the individual instances to upgrade your deployment safely. You can also use the Application Health extension to monitor the application health of each instance in your scale set and to perform instance repairs with Automatic Instance Repairs.
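The Application Health extension probes an endpoint on each instance. As a minimal sketch, assuming the extension is configured for HTTP on a port and request path of your choosing and uses the Rich Health States response format (a JSON body carrying `ApplicationHealthState`), the application could expose an endpoint like the one below. The `/health` path, port 8080, and the `check_dependencies` hook are hypothetical placeholders.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_dependencies() -> bool:
    # Hypothetical hook: replace with real checks (database, cache, disk space, ...).
    return True


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/health":
            self.send_error(404)
            return
        state = "Healthy" if check_dependencies() else "Unhealthy"
        body = json.dumps({"ApplicationHealthState": state}).encode()
        # Rich Health States: answer 200 and report the state in the JSON body.
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep frequent probe requests out of the console


def serve(port: int = 8080) -> None:
    """Block and serve health probes on all interfaces."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Call `serve()` from the instance's startup script; the port and path must match what the extension is configured to probe.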

Resources

Resource Graph Query

// Azure Resource Graph Query
// Find all VMSS that do NOT have the Application Health extension installed
resources
| where type == "microsoft.compute/virtualmachinescalesets"
| join kind=leftouter  (
    resources
    | where type == "microsoft.compute/virtualmachinescalesets"
    | mv-expand extension=properties.virtualMachineProfile.extensionProfile.extensions
    | where extension.properties.type in ( "ApplicationHealthWindows", "ApplicationHealthLinux" )
    | project id
) on id
| where id1 == ""
| project recommendationId = "vmss-2", name, id, tags, param1 = "extension: null"



VMSS-3 - Enable Automatic Repair policy

Category: Automation

Impact: High

Guidance

Enabling automatic instance repairs for Azure Virtual Machine Scale Sets helps achieve high availability for applications by maintaining a set of healthy instances. When the Application Health extension or a load balancer health probe reports that an instance is unhealthy, automatic instance repairs deletes that instance and creates a new one to replace it.

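The grace period is the amount of time an instance is given to return to a healthy state after a state-change operation on the scale set, before any repair action is taken. It is specified in minutes as an ISO 8601 duration (for example, PT30M) and is set with the property automaticRepairsPolicy.gracePeriod. The grace period can range from 10 to 90 minutes and defaults to 30 minutes.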
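When templating scale-set definitions, it can help to validate `gracePeriod` before deployment. The sketch below parses the minutes-only ISO 8601 form, applies the 30-minute default when the property is absent, and enforces the 10 to 90 minute range described above. Accepting only the `PT<n>M` form is an assumption of this sketch, and `grace_period_minutes` is a hypothetical helper.

```python
import re
from typing import Optional

# gracePeriod is an ISO 8601 duration in minutes, e.g. "PT30M".
# Assumption: this sketch accepts only that minutes-only form.
_MINUTES = re.compile(r"PT(\d+)M")

DEFAULT_MINUTES = 30
MIN_MINUTES, MAX_MINUTES = 10, 90


def grace_period_minutes(value: Optional[str]) -> int:
    """Return the effective grace period in minutes, validating the allowed range."""
    if value is None:
        return DEFAULT_MINUTES  # property not set: the 30-minute default applies
    match = _MINUTES.fullmatch(value)
    if match is None:
        raise ValueError(f"expected a duration like 'PT30M', got {value!r}")
    minutes = int(match.group(1))
    if not MIN_MINUTES <= minutes <= MAX_MINUTES:
        raise ValueError(f"grace period must be 10-90 minutes, got {minutes}")
    return minutes
```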

Resources

Resource Graph Query

// Azure Resource Graph Query
// Find all VMSS that do NOT have the automatic repairs policy enabled (including VMSS where the policy is not set)
resources
| where type == "microsoft.compute/virtualmachinescalesets"
| where coalesce(tobool(properties.automaticRepairsPolicy.enabled), false) == false
| project recommendationId = "vmss-3", name, id, tags, param1 = "automaticRepairsPolicy: Disabled"



VMSS-4 - Configure VMSS Autoscale to custom and configure the scaling metrics

Category: System Efficiency

Impact: High

Guidance

Use Custom autoscale based on metrics and schedules.

Autoscale is a built-in feature that helps applications perform their best when demand changes. You can scale a resource manually to a specific instance count, or use a custom autoscale policy that scales based on metric thresholds or sets a scheduled instance count for designated time windows. Autoscale keeps your resource performant and cost-effective by adding and removing instances based on demand.
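To make the metric-threshold idea concrete, here is a simplified Python model of one scale-out rule and one scale-in rule bounded by minimum and maximum instance counts. It is not Azure's autoscale engine: the `ScaleProfile` type, thresholds, and step size are hypothetical, and real rules also involve time grains, aggregation windows, and cooldowns.

```python
from dataclasses import dataclass


@dataclass
class ScaleProfile:
    minimum: int
    maximum: int
    scale_out_above: float  # average CPU % above which an instance is added
    scale_in_below: float   # average CPU % below which an instance is removed
    step: int = 1


def next_capacity(current: int, avg_cpu: float, profile: ScaleProfile) -> int:
    """Return the instance count after evaluating the two threshold rules."""
    if avg_cpu > profile.scale_out_above:
        target = current + profile.step
    elif avg_cpu < profile.scale_in_below:
        target = current - profile.step
    else:
        target = current  # between the thresholds: no change
    # The result is always clamped to the profile's instance limits.
    return max(profile.minimum, min(profile.maximum, target))
```

Keeping a gap between the two thresholds avoids flapping, where one rule's action immediately triggers the opposite rule.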

Resources

Resource Graph Query

// Azure Resource Graph Query
// Find VMSS that have no autoscale setting, or whose autoscale setting is disabled
// Resource IDs are lowercased for the join, because join keys are compared case-sensitively
resources
| where type == "microsoft.compute/virtualmachinescalesets"
| project name, id, tags, lowerId = tolower(id)
| join kind=leftouter (
    resources
    | where type == "microsoft.insights/autoscalesettings"
    | where tostring(properties.targetResourceUri) contains "Microsoft.Compute/virtualMachineScaleSets"
    | project lowerId = tolower(tostring(properties.targetResourceUri)), autoscalesettings = properties
) on lowerId
| where isnull(autoscalesettings) or tobool(autoscalesettings.enabled) == false
| project recommendationId = "vmss-4", name, id, tags, param1 = "autoscalesettings: Disabled"
| order by id asc



VMSS-5 - Enable Predictive autoscale and configure at least for Forecast Only

Category: System Efficiency

Impact: Low

Guidance

Predictive autoscale uses machine learning to help manage and scale Azure Virtual Machine Scale Sets that have cyclical workload patterns. It forecasts the overall CPU load on your scale set by learning from your historical CPU usage, which helps scale-out occur in time to meet the predicted demand. In Forecast Only mode, forecasts are produced without triggering any scale actions, so you can compare them with actual usage before letting predictive autoscale scale the set.
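The model behind predictive autoscale is not described here, so the sketch below uses a deliberately simple stand-in: a seasonal average (the same hour on each previous day) of aggregate CPU load. It only illustrates why a forecast lets capacity be added ahead of a cyclical peak. The 24-hour season, the 60 % target utilization, and both function names are assumptions.

```python
import math
from statistics import mean


def seasonal_forecast(hourly_load: list[float], season: int = 24) -> float:
    """Forecast the next hour as the mean of the same hour in each previous season."""
    next_index = len(hourly_load)
    # Same position within the season as the hour being forecast.
    same_hour = hourly_load[next_index % season : next_index : season]
    if not same_hour:
        raise ValueError("need at least one full season of history")
    return mean(same_hour)


def instances_needed(forecast_load: float, target_percent: float = 60.0) -> int:
    """Instances needed so the forecast load stays at or below the target CPU %.

    forecast_load is total load in percent-of-one-instance units
    (e.g. 240 means 2.4 instances' worth of CPU at 100 %).
    """
    return max(1, math.ceil(forecast_load / target_percent))
```

Scaling out to `instances_needed(seasonal_forecast(history))` before the hour begins is the essence of acting on a forecast rather than on a threshold that has already been crossed.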

Resources

Resource Graph Query

// Azure Resource Graph Query
// Find VMSS whose autoscale setting is enabled but has predictive autoscale disabled
// Resource IDs are lowercased for the join, because join keys are compared case-sensitively
resources
| where type == "microsoft.compute/virtualmachinescalesets"
| project name, id, tags, lowerId = tolower(id)
| join kind=leftouter (
    resources
    | where type == "microsoft.insights/autoscalesettings"
    | where tostring(properties.targetResourceUri) contains "Microsoft.Compute/virtualMachineScaleSets"
    | project lowerId = tolower(tostring(properties.targetResourceUri)), autoscalesettings = properties
) on lowerId
| where tobool(autoscalesettings.enabled) == true and tostring(autoscalesettings.predictiveAutoscalePolicy.scaleMode) == "Disabled"
| project recommendationId = "vmss-5", name, id, tags, param1 = "predictiveAutoscalePolicy_scaleMode: Disabled"
| order by id asc



VMSS-6 - Disable Force strictly even balance across zones to avoid scale in and out fail attempts

Category: Availability

Impact: High

Guidance

Microsoft recommends disabling the setting in your VMSS configuration that enforces a strictly even distribution of VM instances across Availability Zones within a region. In other words, allow Azure to balance instances across zones on a best-effort basis, even if that occasionally leaves the zones uneven.

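Force strictly even balance across zones: Azure provides the option to distribute VM instances in a VMSS evenly across Availability Zones within a region. An Availability Zone is a physically separate group of one or more datacenters within an Azure region, with independent power, cooling, and networking. Even distribution enhances the availability and fault tolerance of your applications, but when strict balance is enforced, a scale-out or scale-in operation that would leave the zones unbalanced (for example, because one zone temporarily lacks capacity) fails instead of proceeding.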

Scale in and out fail attempts: In the context of VMSS, “scaling in” refers to reducing the number of VM instances when demand decreases, while “scaling out” refers to increasing the number of instances when demand increases. Scaling is an important feature of VMSS, and it can be automatic based on various scaling rules and metrics.

While Azure VMSS provides the option to enforce even distribution of VM instances across Availability Zones for increased resilience, there may be scenarios where disabling this option makes sense to better align with your application’s load distribution and scaling requirements.
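The trade-off can be shown with a toy allocator. It places each new instance in the least-populated zone that still has spare capacity; in strict mode it rejects the whole request if any two zones would end up more than one instance apart. This is a model for illustration, not how Azure places VMs: the `scale_out` function, zone names, and spare-capacity figures are hypothetical.

```python
def scale_out(counts: dict[str, int], spare: dict[str, int], add: int, strict: bool) -> dict[str, int]:
    """Toy model: add `add` instances, one at a time, to the least-populated zone with spare capacity.

    strict=True models strict zone balance by rejecting the whole request if any two
    zones would end up more than one instance apart.
    """
    result = dict(counts)
    remaining = dict(spare)  # copy so the caller's dict is not modified
    for _ in range(add):
        candidates = [zone for zone in result if remaining.get(zone, 0) > 0]
        if not candidates:
            raise RuntimeError("scale-out failed: no zone has spare capacity")
        # Least-populated zone first; ties broken by zone name for determinism.
        zone = min(candidates, key=lambda z: (result[z], z))
        result[zone] += 1
        remaining[zone] -= 1
    if strict and max(result.values()) - min(result.values()) > 1:
        raise RuntimeError("scale-out rejected: zones would differ by more than one instance")
    return result
```

Starting from two instances per zone with no spare capacity in zone 3, adding three instances succeeds under best-effort balancing with a 4/3/2 split, while the same request fails under strict balancing.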

Resources

Resource Graph Query

// Azure Resource Graph Query
// Find Uniform VMSS where strict zone balance (zoneBalance) is enabled
resources
| where type == "microsoft.compute/virtualmachinescalesets"
| where properties.orchestrationMode == "Uniform" and properties.zoneBalance == true
| project recommendationId = "vmss-6", name, id, tags, param1 = "strictly zoneBalance: Enabled"
| order by id asc



VMSS-7 - Configure Allocation Policy Spreading algorithm to Max Spreading

Category: System Efficiency

Impact: Medium

Guidance

With max spreading, the scale set spreads your VMs across as many fault domains as possible within each zone; this could be more or fewer than five fault domains per zone. With static fixed spreading, the scale set spreads your VMs across exactly five fault domains per zone, and if it cannot find five distinct fault domains per zone to satisfy an allocation request, the request fails. Max spreading corresponds to a platformFaultDomainCount of 1, which is why the query below flags scale sets with a value greater than 1.
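The difference in failure behavior can be sketched with a round-robin placement over the fault domains available in one zone. Max spreading uses however many fault domains are available; fixed spreading insists on exactly five and fails otherwise. The round-robin placement and the `allocate_fault_domains` name are simplifying assumptions, not the platform's algorithm.

```python
def allocate_fault_domains(vm_count: int, available_fds: list[int], mode: str) -> dict[int, int]:
    """Round-robin VMs over fault domains in one zone.

    mode="max":   use every available fault domain (more or fewer than five).
    mode="fixed": require five distinct fault domains, otherwise fail.
    """
    if mode == "fixed":
        if len(available_fds) < 5:
            raise RuntimeError("allocation failed: fewer than five distinct fault domains available")
        fds = available_fds[:5]
    elif mode == "max":
        if not available_fds:
            raise RuntimeError("allocation failed: no fault domains available")
        fds = list(available_fds)
    else:
        raise ValueError(f"unknown mode {mode!r}")
    placement = {fd: 0 for fd in fds}
    for i in range(vm_count):
        placement[fds[i % len(fds)]] += 1
    return placement
```

With only three fault domains available, `allocate_fault_domains(6, [0, 1, 2], "max")` places two VMs in each, while the same call in `"fixed"` mode raises.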

Resources

Resource Graph Query

// Azure Resource Graph Query
// Find VMSS where the spreading algorithm is static fixed spreading (platformFaultDomainCount greater than 1)
resources
| where type == "microsoft.compute/virtualmachinescalesets"
| where properties.platformFaultDomainCount > 1
| project recommendationId = "vmss-7", name, id, tags, param1 = "platformFaultDomainCount: Static"
| order by id asc



VMSS-8 - Deploy VMSS across availability zones with VMSS Flex

Category: Availability

Impact: High

Guidance

When you create your VMSS, spread its instances across multiple availability zones to protect your applications and data from the unlikely failure of a datacenter.
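A quick way to reason about the benefit is to count how many instances remain if the most-populated zone is lost, before autoscale or repairs add replacements. The helper below is hypothetical and ignores replacement capacity in the surviving zones.

```python
def worst_case_after_zone_loss(instances_per_zone: dict[str, int]) -> int:
    """Instances still running if the most-populated zone is lost (before any replacement)."""
    total = sum(instances_per_zone.values())
    return total - max(instances_per_zone.values(), default=0)
```

Six instances in a single zone leave nothing running after a zone outage; the same six spread two per zone across three zones leave four serving traffic.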

Resources

Resource Graph Query

// Azure Resource Graph Query
// Find VMSS with zero or one availability zone selected
resources
| where type == "microsoft.compute/virtualmachinescalesets"
| where array_length(zones) <= 1 or isnull(zones)
| project recommendationId = "vmss-8", name, id, tags, param1 = "AvailabilityZones: Single Zone"
| order by id asc



VMSS-9 - Set Patch orchestration options to Azure-orchestrated

Category: Automation

Impact: Low

Guidance

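Enabling automatic VM guest patching for your Azure VMs eases update management by safely and automatically patching virtual machines to maintain security compliance, while limiting the blast radius of a problematic update. Note that the KQL below does not return Virtual Machine Scale Sets using Uniform orchestration, because it inspects the individual microsoft.compute/virtualmachines resources, which exist only for scale sets using Flexible orchestration.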

Resources

Resource Graph Query

// Azure Resource Graph Query
// Find Flex VMSS whose member VMs do NOT use an automatic patch mode (Linux and Windows)
resources
| where type == "microsoft.compute/virtualmachinescalesets"
| join kind=inner (
    resources
    | where type == "microsoft.compute/virtualmachines"
    | project id = tostring(properties.virtualMachineScaleSet.id), vmproperties = properties
) on id
| extend recommendationId = "vmss-9", param1 = "patchMode: Manual"
| where isnotnull(vmproperties.osProfile.linuxConfiguration) and tostring(vmproperties.osProfile.linuxConfiguration.patchSettings.patchMode) !in ("AutomaticByPlatform", "AutomaticByOS")
| distinct recommendationId, name, id, param1
| union (resources
| where type == "microsoft.compute/virtualmachinescalesets"
| join kind=inner (
    resources
    | where type == "microsoft.compute/virtualmachines"
    | project id = tostring(properties.virtualMachineScaleSet.id), vmproperties = properties
) on id
| extend recommendationId = "vmss-9", param1 = "patchMode: Manual"
| where isnotnull(vmproperties.osProfile.windowsConfiguration) and tostring(vmproperties.osProfile.windowsConfiguration.patchSettings.patchMode) !in ("AutomaticByPlatform", "AutomaticByOS")
| distinct recommendationId, name, id, param1)
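The condition used by the Resource Graph query in this section can also be applied to exported VM properties in a script. This sketch assumes the ARM JSON shape `osProfile.linuxConfiguration` / `osProfile.windowsConfiguration` with `patchSettings.patchMode`, as referenced in the query. As a conservative assumption, a missing `patchMode` is flagged for review. The function name is hypothetical.

```python
from typing import Any

AUTOMATIC_MODES = {"AutomaticByPlatform", "AutomaticByOS"}


def needs_patch_mode_change(vm_properties: dict[str, Any]) -> bool:
    """True if the VM's OS configuration uses a patch mode outside AUTOMATIC_MODES."""
    os_profile = vm_properties.get("osProfile", {})
    for key in ("linuxConfiguration", "windowsConfiguration"):
        config = os_profile.get(key)
        if config is not None:
            # Assumption: an absent patchSettings/patchMode is treated as not automatic.
            mode = (config.get("patchSettings") or {}).get("patchMode")
            return mode not in AUTOMATIC_MODES
    return False  # no OS configuration exported: nothing to evaluate
```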



VMSS-10 - Upgrade VMSS Image versions scheduled to be deprecated or already retired

Category: Governance

Impact: High

Guidance

Ensure the publisher continues to support the OS image, to avoid disruption or security gaps. Review the publisher, offer, and SKU of the scale set's image reference to confirm that you are running on a supported image. Enable automatic guest patching or automatic OS image upgrades to get notifications about image deprecation.
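Because deprecation status cannot be checked with Resource Graph, one workable approach is to compare each scale set's marketplace image reference against a retirement list you maintain from publisher announcements. The sketch assumes the ARM JSON path `virtualMachineProfile.storageProfile.imageReference` with `publisher`, `offer`, and `sku`. The retirement entries in any example data are placeholders, not real retirement information, and `images_to_review` is a hypothetical helper.

```python
from typing import Any, Iterable

# (publisher, offer, sku); compared case-insensitively below
ImageKey = tuple[str, str, str]


def images_to_review(scale_sets: Iterable[dict[str, Any]], retired: set[ImageKey]) -> list[str]:
    """Return ids of scale sets whose marketplace image matches an entry in `retired`."""
    retired_lower = {tuple(part.lower() for part in key) for key in retired}
    flagged = []
    for vmss in scale_sets:
        ref = (
            vmss.get("properties", {})
            .get("virtualMachineProfile", {})
            .get("storageProfile", {})
            .get("imageReference", {})
        )
        if not all(field in ref for field in ("publisher", "offer", "sku")):
            continue  # custom or gallery images referenced by id are out of scope here
        key = (ref["publisher"].lower(), ref["offer"].lower(), ref["sku"].lower())
        if key in retired_lower:
            flagged.append(vmss["id"])
    return flagged
```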

Resources

Resource Graph Query

// Cannot be validated with Azure Resource Graph (ARG)



VMSS-11 - Production VMSS instances should be using SSD disks

Category: System Efficiency

Impact: High

Guidance

Use SSD-backed disks for production workloads. Standard HDD disks are intended for non-critical workloads and infrequently accessed data, so using them in production can degrade performance and reliability.
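The Resource Graph query in this section looks only at the OS disk. To review every disk of an instance (for example, the OS disk plus the entries under `storageProfile.dataDisks`), a small classifier over managed disk `storageAccountType` values can help. The SKU list below covers the common managed disk types and treats anything else as unknown; the function names are hypothetical.

```python
# storageAccountType values for managed disks, grouped by the media behind them.
# Assumption: this covers the common SKUs; anything else is reported as "unknown".
SSD_TYPES = {
    "StandardSSD_LRS", "StandardSSD_ZRS",
    "Premium_LRS", "Premium_ZRS",
    "PremiumV2_LRS", "UltraSSD_LRS",
}
HDD_TYPES = {"Standard_LRS"}


def disk_media(storage_account_type: str) -> str:
    """Classify a managed disk SKU as 'SSD', 'HDD' or 'unknown'."""
    if storage_account_type in SSD_TYPES:
        return "SSD"
    if storage_account_type in HDD_TYPES:
        return "HDD"
    return "unknown"


def hdd_disks(disks: dict[str, str]) -> list[str]:
    """Given disk name -> storageAccountType, return the names of HDD-backed disks."""
    return sorted(name for name, sku in disks.items() if disk_media(sku) == "HDD")
```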

Resources

Resource Graph Query

// Azure Resource Graph Query
// Find all Uniform VMSS whose OS disk uses Standard HDD (Standard_LRS) instead of SSD storage
resources
| where type == "microsoft.compute/virtualmachinescalesets"
| where properties.orchestrationMode != "Flexible"
| where properties.virtualMachineProfile.storageProfile.osDisk.managedDisk.storageAccountType == 'Standard_LRS'
| project recommendationId = "vmss-11", name, id, tags, param1 = "storageAccountType: Standard_LRS"