Azure Proactive Resiliency Library v2

privateClouds

Summary

Recommendation | Impact | Category | Automation Available | In Azure Advisor
Configure Azure Service Health notifications and alerts for Azure VMware Solution | High | Monitoring and Alerting | Yes | No
Monitor when Azure VMware Solution Private Cloud is reaching the capacity limit | Medium | Monitoring and Alerting | No | No
Monitor when Azure VMware Solution Cluster Size is approaching the host limit | Medium | Monitoring and Alerting | No | No
Enable Stretched Clusters for Multi-AZ Availability of the vSAN Datastore | Low | High Availability | Yes | No
Configure Azure Monitor Alert warning thresholds for vSAN datastore utilization | High | Monitoring and Alerting | Yes | No
Configure Syslog in Diagnostic Settings for Azure VMware Solution | High | Monitoring and Alerting | No | No
Monitor CPU Utilization to ensure sufficient resources for workloads | High | Monitoring and Alerting | Yes | No
Monitor Memory Utilization to ensure sufficient resources for workloads | High | Monitoring and Alerting | Yes | No
Apply Resource delete lock on the resource group hosting the private cloud | High | Governance | No | No
Use key autorotation for vSAN datastore customer-managed keys | High | Security | No | No
Use multiple DNS servers per private FQDN zone | High | High Availability | No | No

Details


Configure Azure Service Health notifications and alerts for Azure VMware Solution

Impact:  High Category:  Monitoring and Alerting

APRL GUID:  74fcb9f2-9a25-49a6-8c42-d32851c4afb7

Description:

Ensure Azure Service Health notifications are configured for Azure VMware Solution across all regions and subscriptions in use. These notifications communicate service and security issues as well as maintenance activities such as host replacements and upgrades, which reduces the need to submit service requests.

Potential Benefits:

Prompt mitigation of issues.
Learn More:
Configure Azure Service Health alerts

ARG Query:

// Azure Resource Graph Query
// Provides a list of Azure VMware Solution resources that don't have one or more service health alerts covering AVS private clouds in the deployed subscription and region pairs.
//full list of private clouds
(resources
| where ['type'] == "microsoft.avs/privateclouds"
| extend locale = tolower(location)
| extend subscriptionId = tolower(subscriptionId)
| project id, name, tags, subscriptionId, locale)
| join kind=leftouter
//Alert IDs that include all incident types filtered by AVS Service Health alerts
((resources
| where type == "microsoft.insights/activitylogalerts"
| extend alertproperties = todynamic(properties)
| where alertproperties.condition.allOf[0].field == "category" and alertproperties.condition.allOf[0].equals == "ServiceHealth"
| where alertproperties.condition.allOf[1].field == "properties.impactedServices[*].ServiceName" and set_has_element(alertproperties.condition.allOf[1].containsAny, "Azure VMware Solution")
| extend locale = strcat_array(split(tolower(alertproperties.condition.allOf[2].containsAny),' '), '')
| mv-expand todynamic(locale)
| where locale != "global"
| project subscriptionId, tostring(locale) )
| union
//Alert IDs that include only some of the incident types after filtering by service health alerts covering AVS private clouds.
(resources
| where type == "microsoft.insights/activitylogalerts"
| extend subscriptionId = tolower(subscriptionId)
| extend alertproperties = todynamic(properties)
| where alertproperties.condition.allOf[0].field == "category" and alertproperties.condition.allOf[0].equals == "ServiceHealth"
| where alertproperties.condition.allOf[2].field == "properties.impactedServices[*].ServiceName" and set_has_element(alertproperties.condition.allOf[2].containsAny, "Azure VMware Solution")
| extend locale = strcat_array(split(tolower(alertproperties.condition.allOf[3].containsAny),' '), '')
| mv-expand todynamic(locale)
| mv-expand alertproperties.condition.allOf[1].anyOf
| extend incidentType = alertproperties_condition_allOf_1_anyOf.equals
| where locale != "global"
| project id, subscriptionId, locale, incidentType
| distinct subscriptionId, tostring(locale), tostring(incidentType)
| summarize incidentTypes=count() by subscriptionId, locale
| where incidentTypes == 5 //only include this subscription, region pair if it includes all the incident types.
| project subscriptionId, locale)) on subscriptionId, locale
| where subscriptionId1 == "" or locale1 == "" or isnull(subscriptionId1) or isnull(locale1)
| project recommendationId = "74fcb9f2-9a25-49a6-8c42-d32851c4afb7", name, id, tags, param1 = "avsServiceHealthAlertsAllIncidentTypesConfigured: False"
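
The coverage rule the query applies can be hard to follow in KQL, so here is a minimal Python sketch of the same idea. It assumes alerts have already been reduced to a subscription, a list of regions, and an optional list of incident types. The class names, field names, and the `uncovered_private_clouds` helper are hypothetical and are not part of any Azure SDK. It is looser than the query in one respect: it accepts five or more distinct incident types, where the query requires exactly five.

```python
from dataclasses import dataclass

# The ARG query above counts a (subscription, region) pair as covered only
# when five distinct incident types are alerted on.
REQUIRED_INCIDENT_TYPE_COUNT = 5


@dataclass(frozen=True)
class PrivateCloud:  # hypothetical shape, for illustration only
    name: str
    subscription_id: str
    region: str


@dataclass(frozen=True)
class ServiceHealthAlert:  # hypothetical shape, for illustration only
    subscription_id: str
    regions: tuple
    # An empty tuple models an alert with no incident-type filter (all types).
    incident_types: tuple = ()


def normalize_region(region):
    """Lowercase and drop spaces, as the query does ('UK South' -> 'uksouth')."""
    return region.lower().replace(" ", "")


def uncovered_private_clouds(clouds, alerts):
    """Return names of private clouds without full Service Health coverage."""
    all_type_pairs = set()
    types_by_pair = {}
    for alert in alerts:
        for region in alert.regions:
            pair = (alert.subscription_id.lower(), normalize_region(region))
            if pair[1] == "global":
                continue  # the query ignores the "global" locale
            if not alert.incident_types:
                all_type_pairs.add(pair)
            else:
                types_by_pair.setdefault(pair, set()).update(alert.incident_types)
    # Incident types are pooled across alerts for the same pair before counting.
    covered = all_type_pairs | {
        pair
        for pair, types in types_by_pair.items()
        if len(types) >= REQUIRED_INCIDENT_TYPE_COUNT
    }
    return [
        cloud.name
        for cloud in clouds
        if (cloud.subscription_id.lower(), normalize_region(cloud.region)) not in covered
    ]
```

As in the query, several partial alerts for the same subscription and region are pooled, so two alerts that together span all five incident types count as full coverage.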



Monitor when Azure VMware Solution Private Cloud is reaching the capacity limit

Impact:  Medium Category:  Monitoring and Alerting

APRL GUID:  29d7a115-dfb6-4df1-9205-04824109548f

Description:

Set an alert that fires when the node count in an Azure VMware Solution private cloud reaches or exceeds 90 hosts, so that a new private cloud can be planned in time.

Potential Benefits:

Proactive capacity planning
Learn More:
Configure and streamline alerts

ARG Query:

// cannot-be-validated-with-arg
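
Because this recommendation cannot be validated with Azure Resource Graph, the check has to run elsewhere. The following minimal sketch shows the threshold logic only, assuming the per-cluster host counts have already been collected by some other means. The function names and the input dictionary are hypothetical.

```python
# Threshold taken from the recommendation above: alert at 90 hosts or more.
PRIVATE_CLOUD_HOST_ALERT_THRESHOLD = 90


def total_hosts(cluster_sizes):
    """Sum host counts across all clusters in one private cloud."""
    return sum(cluster_sizes.values())


def private_cloud_needs_capacity_alert(cluster_sizes,
                                       threshold=PRIVATE_CLOUD_HOST_ALERT_THRESHOLD):
    """Return True when the private cloud reaches or exceeds the threshold."""
    return total_hosts(cluster_sizes) >= threshold
```

For example, a private cloud with six clusters of 15 hosts each totals exactly 90 hosts and would trigger the alert.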


Monitor when Azure VMware Solution Cluster Size is approaching the host limit

Impact:  Medium Category:  Monitoring and Alerting

APRL GUID:  f86355e3-de7c-4dad-8080-1b0b411e66c8

Description:

Alert when the cluster size reaches 14 hosts. Set up periodic alerts to plan for new clusters or datastores as the environment grows, especially when growth is driven by storage needs. Beyond 14 hosts, trigger an alert for each new host added so that resources are monitored proactively.

Potential Benefits:

Proactive resource management
Learn More:
Configure and streamline alerts

ARG Query:

// cannot-be-validated-with-arg
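
The alerting rule described for this recommendation (alert at 14 hosts, then again for every host added beyond that) can be sketched as below. This is an illustration under stated assumptions, not a built-in Azure Monitor feature. The function name and its inputs are hypothetical, and the host counts before and after a scale-out are assumed to be known.

```python
# Threshold taken from the recommendation above.
CLUSTER_HOST_ALERT_THRESHOLD = 14


def cluster_size_alerts(previous_hosts, current_hosts,
                        threshold=CLUSTER_HOST_ALERT_THRESHOLD):
    """Return the host counts that should each raise an alert after a scale-out.

    An alert is raised when the cluster reaches the threshold and again for
    every additional host beyond it. Scale-in raises no alerts.
    """
    if current_hosts <= previous_hosts:
        return []
    first_alerting_size = max(previous_hosts + 1, threshold)
    return list(range(first_alerting_size, current_hosts + 1))
```

Growing a cluster from 12 to 15 hosts would raise two alerts under this rule, one at 14 hosts and one at 15.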


Enable Stretched Clusters for Multi-AZ Availability of the vSAN Datastore

Impact:  Low Category:  High Availability

APRL GUID:  9ec5b4c8-3dd8-473a-86ee-3273290331b9

Description:

For Azure VMware Solution, enabling Stretched Clusters offers a 99.99% SLA and synchronous storage replication (RPO=0), and spreads the vSAN datastore across two AZs. Stretched Clusters must be enabled at initial setup and require double the quota because the cluster extends across two AZs.

Potential Benefits:

99.99% SLA, 0 RPO, Multi-AZ
Learn More:
Implement high availability
Stretched Clusters

ARG Query:

// Azure Resource Graph Query
// Provides a list of Azure VMware Solution resources that aren't configured as stretched clusters and in supported regions.
resources
| where ['type'] == "microsoft.avs/privateclouds"
| extend avsproperties = todynamic(properties)
| where avsproperties.availability.strategy != "DualZone"
| where location in ("uksouth", "westeurope", "germanywestcentral", "australiaeast")
| project recommendationId = "9ec5b4c8-3dd8-473a-86ee-3273290331b9", name, id, tags, param1 = "stretchClusters: Disabled"



Configure Azure Monitor Alert warning thresholds for vSAN datastore utilization

Impact:  High Category:  Monitoring and Alerting

APRL GUID:  4232eb32-3241-4049-9e14-9b8005817b56

Description:

Ensure VMware vSAN datastore slack space is maintained for the SLA by monitoring storage utilization and setting alerts at 70% and 75% utilization to allow time for capacity planning. To expand capacity, add hosts, or add external storage such as Azure Elastic SAN or Azure NetApp Files if CPU and RAM requirements are already met.

Potential Benefits:

Optimized capacity planning for vSAN
Learn More:
Supported metrics and activities

ARG Query:

// Azure Resource Graph Query
// Provides a list of Azure VMware Solution resources that don't have a vSAN capacity critical alert with a threshold of 75% or a vSAN capacity warning alert with a threshold of 70%.
(
resources
| where ['type'] == "microsoft.avs/privateclouds"
| extend scopeId = tolower(tostring(id))
| project ['scopeId'], name, id, tags
| join kind=leftouter (
resources
| where type == "microsoft.insights/metricalerts"
| extend alertProperties = todynamic(properties)
| mv-expand alertProperties.scopes
| mv-expand alertProperties.criteria.allOf
| extend scopeId = tolower(tostring(alertProperties_scopes))
| extend metric = alertProperties_criteria_allOf.metricName
| extend threshold = alertProperties_criteria_allOf.threshold
| project scopeId, tostring(metric), toint(['threshold'])
| where metric == "DiskUsedPercentage"
| where threshold == 75
) on scopeId
| where isnull(['threshold'])
| project recommendationId = "4232eb32-3241-4049-9e14-9b8005817b56", name, id, tags, param1 = "vsanCapacityCriticalAlert: isNull or threshold != 75"
)
| union (
resources
| where ['type'] == "microsoft.avs/privateclouds"
| extend scopeId = tolower(tostring(id))
| project ['scopeId'], name, id, tags
| join kind=leftouter (
resources
| where type == "microsoft.insights/metricalerts"
| extend alertProperties = todynamic(properties)
| mv-expand alertProperties.scopes
| mv-expand alertProperties.criteria.allOf
| extend scopeId = tolower(tostring(alertProperties_scopes))
| extend metric = alertProperties_criteria_allOf.metricName
| extend threshold = alertProperties_criteria_allOf.threshold
| project scopeId, tostring(metric), toint(['threshold'])
| where metric == "DiskUsedPercentage"
| where threshold == 70
) on scopeId
| where isnull(['threshold'])
| project recommendationId = "4232eb32-3241-4049-9e14-9b8005817b56", name, id, tags, param1 = "vsanCapacityWarningAlert: isNull or threshold != 70"
)
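
To make the two thresholds concrete, the sketch below maps a datastore utilization percentage to an alert level. It assumes the 70% warning and 75% critical values from the description above. The function name and the level labels are hypothetical and do not correspond to Azure Monitor severity values.

```python
# Thresholds taken from the recommendation above.
VSAN_WARNING_PERCENT = 70
VSAN_CRITICAL_PERCENT = 75


def vsan_alert_level(used_percent):
    """Classify vSAN datastore utilization as 'ok', 'warning', or 'critical'."""
    if not 0 <= used_percent <= 100:
        raise ValueError("used_percent must be between 0 and 100")
    if used_percent >= VSAN_CRITICAL_PERCENT:
        return "critical"
    if used_percent >= VSAN_WARNING_PERCENT:
        return "warning"
    return "ok"
```

The five-point gap between the two levels is what leaves time to add hosts or external storage before slack space is consumed.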



Configure Syslog in Diagnostic Settings for Azure VMware Solution

Impact:  High Category:  Monitoring and Alerting

APRL GUID:  fa4ab927-bced-429a-971a-53350de7f14b

Description:

Ensure Diagnostic Settings are configured for each private cloud to send syslogs to external sources for analysis and/or archiving. Azure VMware Solution Syslogs contain data for troubleshooting and performance, aiding quicker issue resolution and early detection of issues.

Potential Benefits:

Faster issue resolution, early detection
Learn More:
Manage logs and archives

ARG Query:

// cannot-be-validated-with-arg


Monitor CPU Utilization to ensure sufficient resources for workloads

Impact:  High Category:  Monitoring and Alerting

APRL GUID:  4ee5d535-c47b-470a-9557-4a3dd297d62f

Description:

Ensure sufficient compute resources to avoid host resource exhaustion in Azure VMware Solution. Although Azure VMware Solution uses vSphere DRS and vSphere HA for dynamic workload resource management, sustained CPU utilization over 95% may increase CPU Ready times and impact workloads.

Potential Benefits:

Avoids resource exhaustion, optimizes performance
Learn More:
Configure and streamline alerts

ARG Query:

// Azure Resource Graph Query
// Provides a list of Azure VMware Solution resources that don't have a Cluster CPU capacity critical alert with a threshold of 95%.
resources
| where ['type'] == "microsoft.avs/privateclouds"
| extend scopeId = tolower(tostring(id))
| project ['scopeId'], name, id, tags
| join kind=leftouter (
resources
| where type == "microsoft.insights/metricalerts"
| extend alertProperties = todynamic(properties)
| mv-expand alertProperties.scopes
| mv-expand alertProperties.criteria.allOf
| extend scopeId = tolower(tostring(alertProperties_scopes))
| extend metric = alertProperties_criteria_allOf.metricName
| extend threshold = alertProperties_criteria_allOf.threshold
| project scopeId, tostring(metric), toint(['threshold'])
| where metric == "EffectiveCpuAverage"
| where threshold == 95
) on scopeId
| where isnull(['threshold'])
| project recommendationId = "4ee5d535-c47b-470a-9557-4a3dd297d62f", name, id, tags, param1 = "hostCpuCriticalAlert: isNull or threshold != 95"
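
The description refers to sustained utilization over 95% rather than a single spike. One way to express "sustained" is a run of consecutive samples above the threshold, as in the sketch below. The function name and the `min_consecutive` parameter are assumptions made for illustration, and an actual alert rule would define its own evaluation window.

```python
def is_sustained_above(samples, threshold=95.0, min_consecutive=3):
    """Return True if at least min_consecutive samples in a row exceed threshold.

    samples: utilization percentages ordered by time.
    min_consecutive: hypothetical window length, not taken from the source.
    """
    run_length = 0
    for value in samples:
        # Extend the current run, or reset it on any sample at or below threshold.
        run_length = run_length + 1 if value > threshold else 0
        if run_length >= min_consecutive:
            return True
    return False
```

A brief dip below the threshold resets the run, so short bursts do not count as sustained pressure.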



Monitor Memory Utilization to ensure sufficient resources for workloads

Impact:  High Category:  Monitoring and Alerting

APRL GUID:  029208c8-5186-4a76-8ee8-6e3445fef4dd

Description:

Ensure sufficient memory resources to prevent host resource exhaustion in Azure VMware Solution. Although Azure VMware Solution uses vSphere DRS and vSphere HA for dynamic workload management, sustained memory utilization over 95% leads to swapping to disk, which affects workloads.

Potential Benefits:

Avoids host exhaustion and swapping
Learn More:
Configure and streamline alerts

ARG Query:

// Azure Resource Graph Query
// Provides a list of Azure VMware Solution resources that don't have a cluster host memory critical alert with a threshold of 95%.
resources
| where ['type'] == "microsoft.avs/privateclouds"
| extend scopeId = tolower(tostring(id))
| project ['scopeId'], name, id, tags
| join kind=leftouter (
resources
| where type == "microsoft.insights/metricalerts"
| extend alertProperties = todynamic(properties)
| mv-expand alertProperties.scopes
| mv-expand alertProperties.criteria.allOf
| extend scopeId = tolower(tostring(alertProperties_scopes))
| extend metric = alertProperties_criteria_allOf.metricName
| extend threshold = alertProperties_criteria_allOf.threshold
| project scopeId, tostring(metric), toint(['threshold'])
| where metric == "UsageAverage"
| where threshold == 95
) on scopeId
| where isnull(['threshold'])
| project recommendationId = "029208c8-5186-4a76-8ee8-6e3445fef4dd", name, id, tags, param1 = "hostMemoryCriticalAlert: isNull or threshold != 95"



Apply Resource delete lock on the resource group hosting the private cloud

Impact:  High Category:  Governance

APRL GUID:  a5ef7c05-c611-4842-9af5-11efdc99123a

Description:

Applying a resource delete lock to the Azure VMware Solution Private Cloud resource group prevents unauthorized or accidental deletion by anyone with contributor access, ensuring the protection and reliability of the Azure VMware Solution Private Cloud.

Potential Benefits:

Prevents accidental deletion
Learn More:
Lock your resources to protect your infrastructure

ARG Query:

// cannot-be-validated-with-arg


Use key autorotation for vSAN datastore customer-managed keys

Impact:  High Category:  Security

APRL GUID:  e0ac2f57-c8c0-4b8c-a7c8-19e5797828b5

Description:

When using customer-managed keys to encrypt vSAN datastores, manage them centrally in Azure Key Vault and access them through a managed identity linked to the private cloud. If these keys expire, the vSAN datastore and its associated workloads can become inaccessible. Enabling key auto-rotation avoids this outage.

Potential Benefits:

Avoid outages with key auto-rotation
Learn More:
Configure Customer Managed Keys

ARG Query:

// cannot-be-validated-with-arg


Use multiple DNS servers per private FQDN zone

Impact:  High Category:  High Availability

APRL GUID:  fcc2e257-23af-4c68-aac8-9cc03033c939

Description:

Azure VMware Solution private clouds support up to three DNS servers for a single FQDN, preventing a single DNS server from becoming a point of failure. It's crucial to use multiple DNS servers for on-premises FQDN resolution from each private cloud.

Potential Benefits:

Enhances reliability and avoids failure
Learn More:
Configure DNS forwarder

ARG Query:

// cannot-be-validated-with-arg
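
Since this recommendation cannot be validated with Azure Resource Graph, a configuration review could apply a check like the one sketched below. It assumes the FQDN zone to DNS server mapping has been exported into a plain dictionary. The function name, the finding messages, and the minimum of two servers are illustrative assumptions. Only the limit of three servers comes from the description above.

```python
# Upper limit taken from the recommendation above.
MAX_DNS_SERVERS_PER_ZONE = 3
# Assumed minimum for redundancy: more than one server per zone.
MIN_DNS_SERVERS_FOR_REDUNDANCY = 2


def dns_zone_findings(zones):
    """Return a finding per FQDN zone whose DNS server count is out of range.

    zones: mapping of FQDN zone name to a list of DNS server IP addresses.
    """
    findings = {}
    for fqdn, servers in zones.items():
        unique_servers = set(servers)  # duplicates add no redundancy
        if len(unique_servers) > MAX_DNS_SERVERS_PER_ZONE:
            findings[fqdn] = "more than three DNS servers configured"
        elif len(unique_servers) < MIN_DNS_SERVERS_FOR_REDUNDANCY:
            findings[fqdn] = "single DNS server is a point of failure"
    return findings
```

Duplicate entries are collapsed before counting, because listing the same server twice does not remove the single point of failure.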