SAP on Azure
The resiliency recommendations in this guidance cover Azure SAP solutions and their associated resources and settings.
Refer to:
- Azure Center for SAP Solutions
- OpenSource Quality Checks
- OpenSource Inventory Checks
Summary of Recommendations
Recommendation Details
SAP-1 - Ensure that each SAP production system is designed for high availability across availability zones
Category: Availability
Impact: High
Guidance
Azure availability zones are physically separate locations within an Azure region that are tolerant to local failures. Use availability zones to protect your applications and data against unlikely datacenter failures. Protect every single point of failure in each SAP production system with a high-availability configuration that spans multiple availability zones. If you cannot deploy across different zones in a region, refer to Microsoft's guidance on high-availability deployment options for SAP workloads.
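The following minimal Python sketch makes the single-point-of-failure check concrete. It assumes a hypothetical inventory of (SAP system ID, layer, VM name, zone) rows, which could for example be exported from a Resource Graph query. It reports every layer whose VMs do not span at least two availability zones. The row format and layer names are illustrative and are not part of any Azure API.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical inventory row: (SAP system ID, layer, VM name, availability zone).
# A zone of None means the VM is regional, i.e. not pinned to any zone.
Row = tuple[str, str, str, Optional[str]]


def layers_without_zone_redundancy(inventory: list[Row]) -> list[tuple[str, str]]:
    """Return sorted (system, layer) pairs whose VMs do not span two or more zones."""
    zones_by_layer: dict[tuple[str, str], set[str]] = defaultdict(set)
    for sid, layer, _vm_name, zone in inventory:
        zones = zones_by_layer[(sid, layer)]  # creates the entry even for regional VMs
        if zone is not None:
            zones.add(zone)
    return sorted(key for key, zones in zones_by_layer.items() if len(zones) < 2)


if __name__ == "__main__":
    inventory = [
        ("PRD", "database", "prd-db-0", "1"),
        ("PRD", "database", "prd-db-1", "2"),
        ("PRD", "ascs", "prd-cs-0", "1"),
        ("PRD", "ascs", "prd-cs-1", "1"),   # both ASCS/ERS nodes in zone 1
        ("PRD", "app", "prd-app-0", None),  # regional VMs only
        ("PRD", "app", "prd-app-1", None),
    ]
    print(layers_without_zone_redundancy(inventory))
    # [('PRD', 'app'), ('PRD', 'ascs')]
```

A layer with a single VM is also reported, because one VM can never span two zones.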
Resources
- SAP ACSS Quality Insights
- OpenSource Inventory Checks
- OpenSource Quality Checks
- Move Regional SAP HA to Zonal
- High Availability Deployment Options for SAP
Resource Graph Query
// under-development
SAP-2 - Run SAP application servers on two or more VMs using VMSS Flex
Category: Availability
Impact: High
Guidance
Use Virtual Machine Scale Sets (VMSS) with flexible orchestration to distribute virtual machines across the specified zones and, within each zone, across different fault domains on a best-effort basis. Configure VMSS Flex according to Microsoft's recommendations for SAP workloads, using the appropriate mode and settings. If your SAP application servers currently use neither VMSS Flex nor availability sets with fault domain and update domain distribution, consider moving to a VMSS Flex architecture to improve the resiliency of your SAP deployment. The migration blog post listed under Resources describes how to migrate existing SAP workloads deployed in an availability set or availability zone to a flexible scale set with the FD=1 deployment option.
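The following Python sketch checks one scale set against the settings described above. It makes three assumptions. First, the scale set is a dictionary in the shape Resource Graph returns: a top-level `zones` list, plus `properties.orchestrationMode` and `properties.platformFaultDomainCount`. Second, the number of application server VMs in the set is counted separately. Third, the target is the zonal FD=1 option mentioned in the guidance, spread over two or more zones.

```python
from typing import Any


def vmss_flex_findings(scale_set: dict[str, Any], vm_count: int) -> list[str]:
    """List deviations from a zonal VMSS Flex (FD=1) layout for SAP application servers.

    scale_set is assumed to be a Resource Graph-style record with a top-level
    "zones" list and a "properties" object; vm_count is the number of
    application server VMs attached to the scale set, counted separately.
    """
    properties = scale_set.get("properties") or {}
    findings = []
    if properties.get("orchestrationMode") != "Flexible":
        findings.append("orchestration mode is not Flexible")
    if properties.get("platformFaultDomainCount") != 1:
        findings.append("platformFaultDomainCount is not 1 (FD=1)")
    if len(scale_set.get("zones") or []) < 2:
        findings.append("scale set spans fewer than two availability zones")
    if vm_count < 2:
        findings.append("fewer than two application server VMs")
    return findings
```

An empty list means the scale set matches the assumed target layout.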
Resources
- OpenSource Inventory Checks
- Virtual machine Scale Set SAP Deployment Guide
- Considerations for Flexible VM Scale Sets for SAP
- Migrate existing SAP system VMs to VMSS Flex
Resource Graph Query
// Azure Resource Graph Query
// Find all VMs that are not associated with a VMSS Flex instance
resources
| where type =~ 'Microsoft.Compute/virtualMachines'
| where isnull(properties.virtualMachineScaleSet.id)
| project recommendationId="vm-1", name, id, tags
SAP-9 - If using single-instance VMs, all OS and data disks must be Premium SSD or Ultra Disk
Category: Availability
Impact: High
Guidance
For single-instance VMs, both OS and data disks must be either Premium SSD or Ultra Disk to achieve the single-instance SLA of 99.9% availability.
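As a worked example of what the 99.9% single-instance SLA allows, the sketch below converts an SLA percentage into minutes of permitted downtime per month. It assumes a 30-day month by default. The binding calculation is defined in the Azure SLA terms, so treat the result as an approximation. For 99.9% and 30 days, it returns 43.2 minutes.

```python
from decimal import Decimal


def allowed_downtime_minutes(sla_percent: str, days_in_month: int = 30) -> Decimal:
    """Approximate monthly downtime permitted by an availability percentage.

    The percentage is passed as a string so Decimal keeps it exact
    ("99.9" rather than the binary float 99.9).
    """
    minutes_in_month = Decimal(days_in_month * 24 * 60)
    unavailable_fraction = (Decimal(100) - Decimal(sla_percent)) / Decimal(100)
    return minutes_in_month * unavailable_fraction


print(allowed_downtime_minutes("99.9"))      # 43.200 (30-day month)
print(allowed_downtime_minutes("99.9", 31))  # 44.640
```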
Resources
- SAP ACSS Insights
- OpenSource Inventory Checks
- OpenSource Quality Checks
- VM SLA
- SAP Storage Planning Guide
Resource Graph Query
// Azure Resource Graph Query
// Find all VMs that have an attached disk that is not in the Premium or Ultra sku tier.
resources
| where type =~ 'Microsoft.Compute/virtualMachines'
| extend lname = tolower(name)
| join kind=leftouter (
    resources
    | where type =~ 'Microsoft.Compute/disks'
    | where not(sku.tier =~ 'Premium') and not(sku.tier =~ 'Ultra')
    | extend lname = tolower(tostring(split(managedBy, '/')[8]))
    | project lname, name
    | summarize disks = make_list(name) by lname
) on lname
| where isnotnull(disks)
| project recommendationId = "vm-24", name, id, tags, param1 = strcat("AffectedDisks: ", disks)
SAP-14 - Ensure that the data is replicated synchronously (SYNC mode) between the primary and secondary database hosting VM nodes
Category: Availability
Impact: High
Guidance
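Implement database high availability using the database's native replication technology, and replicate data synchronously (SYNC mode) from the primary database to the standby node. With synchronous replication, the primary acknowledges a commit only after the standby has confirmed the change, so a failover to the standby does not lose committed transactions (RPO = 0).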
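The following minimal Python sketch shows how replication modes could be validated across systems. The `ReplicationLink` records are hypothetical; populate them from your database's own replication status tooling. HA links are expected to use SYNC, as this recommendation states, and DR links ASYNC, as SAP-27 recommends. Only the exact SYNC mode counts as compliant here.

```python
from dataclasses import dataclass

# Expected mode per tier: SYNC for the HA standby (SAP-14), ASYNC for DR (SAP-27).
EXPECTED_MODE = {"HA": "SYNC", "DR": "ASYNC"}


@dataclass
class ReplicationLink:
    """Hypothetical record for one primary-to-secondary database replication link."""
    system: str  # SAP system ID
    tier: str    # "HA" (standby in another zone) or "DR" (other region)
    mode: str    # replication mode reported by the database, e.g. "sync"


def mode_mismatches(links: list[ReplicationLink]) -> list[str]:
    """Describe every link whose replication mode differs from the expected one."""
    problems = []
    for link in links:
        expected = EXPECTED_MODE.get(link.tier)
        if expected is None:
            problems.append(f"{link.system}: unknown tier {link.tier!r}")
        elif link.mode.upper() != expected:
            problems.append(
                f"{link.system} {link.tier}: mode {link.mode.upper()}, expected {expected}"
            )
    return problems
```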
Resources
Resource Graph Query
// under-development
SAP-15 - Ensure that SAP shared file systems are designed for high availability and when possible using availability zones
Category: Availability
Impact: High
Guidance
Make SAP shared file systems, such as /sapmnt, /usr/sap/trans, and /interfaces, highly available.
For Azure Files shares, use zone-redundant storage (ZRS). For Azure NetApp Files, use cross-zone replication for your volumes.
Review the results of the related checks for other Azure services (ST-1, ANF-1, ANF-6) to confirm that SAP shared file systems are protected against zonal failure.
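For the Azure Files part of this recommendation, the following Python sketch flags storage accounts whose SKU is not zone-redundant. It assumes account records shaped like Resource Graph rows (`name` and `sku.name`). The SKU set lists the storage account SKUs that store data across zones (ZRS and GZRS variants). Azure NetApp Files volumes are not covered by this sketch.

```python
# Storage account SKUs that keep copies of the data in multiple availability zones.
ZONE_REDUNDANT_SKUS = {"standard_zrs", "premium_zrs", "standard_gzrs", "standard_ragzrs"}


def non_zone_redundant_accounts(accounts: list[dict]) -> list[str]:
    """Return names of storage accounts whose SKU is not zone-redundant.

    Each account is assumed to look like a Resource Graph row:
    {"name": "...", "sku": {"name": "Premium_ZRS"}}.
    """
    return [
        account["name"]
        for account in accounts
        if (account.get("sku") or {}).get("name", "").lower() not in ZONE_REDUNDANT_SKUS
    ]
```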
Resources
Resource Graph Query
// under-development
SAP-16 - Test high availability solutions thoroughly to ensure failovers work as expected
Category: Availability
Impact: High
Guidance
Test all high availability solutions thoroughly, including kernel panics on Linux VMs and fail-back. Include zonal failure scenarios in your testing. Testing should confirm that each layer of your SAP solution (database, central services, application servers, and shared file systems) is configured correctly for zone redundancy, that the solution meets RPO = 0, and that the application fails over automatically within your RTO. Fail-back can be either automatic or manual.
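The following Python sketch shows one way to record and evaluate HA test runs. The `FailoverTest` record and the scenario labels (`"zone outage"`, `"fail-back"`) are hypothetical. The checks mirror the criteria above: no data loss (RPO = 0), recovery within the RTO target, automatic failover (fail-back may be manual), and a zone-outage test for each of the four layers.

```python
from dataclasses import dataclass

LAYERS = ("database", "central services", "application servers", "shared file systems")


@dataclass
class FailoverTest:
    """Hypothetical record of one HA test run."""
    layer: str                # one of LAYERS
    scenario: str             # e.g. "zone outage", "kernel panic", "fail-back"
    data_loss_seconds: float  # measured data loss; must be 0 for RPO = 0
    recovery_seconds: float   # time until the service was available again
    automatic: bool           # True if no manual step was needed


def evaluate(tests: list[FailoverTest], rto_seconds: float) -> list[str]:
    """Return all failed criteria; an empty list means every check passed."""
    failures = []
    for t in tests:
        label = f"{t.layer}/{t.scenario}"
        if t.data_loss_seconds > 0:
            failures.append(f"{label}: data was lost, RPO is not 0")
        if t.recovery_seconds > rto_seconds:
            failures.append(f"{label}: recovery exceeded the RTO")
        # Failover must be automatic; fail-back may be automatic or manual.
        if not t.automatic and t.scenario != "fail-back":
            failures.append(f"{label}: failover was not automatic")
    covered = {t.layer for t in tests if t.scenario == "zone outage"}
    failures.extend(f"{layer}: no zone outage test recorded"
                    for layer in LAYERS if layer not in covered)
    return failures
```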
Resources
Resource Graph Query
// under-development
SAP-18 - Remove unwanted location constraints from Linux Pacemaker clusters
Category: Availability
Impact: High
Guidance
When you run a migrate (move) command in a Linux Pacemaker cluster, the cluster creates a “prefer” location constraint that moves a resource to the specified node. The command is meant for temporary relocation, but the constraint itself is stored in the cluster configuration (CIB). It remains in effect, including across reboots, until it is removed or, if a lifetime was specified, until it expires.
During planned maintenance and failover testing, you can use the migrate command to relocate resources temporarily and minimize disruption.
Once the task that required the migration is complete, manually remove the constraint so that the cluster reverts to its original resource management policies. A leftover constraint continues to influence where the cluster places the resource and can cause unexpected behavior during later failovers or fail-backs. This approach allows controlled resource movement for maintenance while preserving the integrity of the cluster’s configuration.
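Move and ban commands create location constraints whose IDs start with `cli-prefer-` and `cli-ban-`. The following Python sketch lists such constraints from the output of `cibadmin --query --scope constraints`. The sample XML and resource names are illustrative only.

```python
import xml.etree.ElementTree as ET

# ID prefixes of location constraints created by move/migrate and ban commands.
CLI_PREFIXES = ("cli-prefer-", "cli-ban-")


def leftover_cli_constraints(constraints_xml: str) -> list[dict]:
    """Return move/ban location constraints still present in the CIB."""
    root = ET.fromstring(constraints_xml.strip())
    leftovers = []
    for location in root.iter("rsc_location"):
        constraint_id = location.get("id", "")
        if constraint_id.startswith(CLI_PREFIXES):
            leftovers.append({
                "id": constraint_id,
                "resource": location.get("rsc"),
                "node": location.get("node"),
                "score": location.get("score"),
            })
    return leftovers


# Illustrative input in the shape of `cibadmin --query --scope constraints` output.
SAMPLE = """
<constraints>
  <rsc_location id="cli-prefer-g-NW1_ASCS" rsc="g-NW1_ASCS" role="Started" node="nw1-cl-1" score="INFINITY"/>
  <rsc_location id="cli-ban-g-NW1_ERS-on-nw1-cl-0" rsc="g-NW1_ERS" role="Started" node="nw1-cl-0" score="-INFINITY"/>
  <rsc_location id="loc-example" rsc="rsc_example" node="nw1-cl-0" score="100"/>
</constraints>
"""

for constraint in leftover_cli_constraints(SAMPLE):
    print(constraint["id"], "->", constraint["node"], constraint["score"])
```

After confirming that the related maintenance task is finished, remove each reported constraint with `crm resource clear <resource>` on SUSE or `pcs resource clear <resource>` on Red Hat.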
Resources
Resource Graph Query
// under-development
SAP-26 - Secure compute resource capacity for critical VM roles in DR region
Category: Disaster Recovery
Impact: Medium
Guidance
To ensure the availability of compute resources for critical VM roles in a DR region, consider securing capacity either through a warm standby approach or by utilizing Azure’s On-demand Capacity Reservation.
Warm standby involves keeping VMs in the DR region running. On-demand Capacity Reservation, on the other hand, reserves compute capacity without having to run the VMs, allowing you to start them when needed. When DR VMs are not needed, the reserved capacity may safely be used to run other workloads without the risk of losing the capacity to other customers. This strategy guarantees resource availability for your critical workloads in the event of a disaster, balancing cost and readiness.
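To check whether reservations actually cover the critical roles, the following Python sketch compares a hypothetical list of required DR VMs with the reserved quantities. Both are keyed by VM size and zone. The input structures and VM sizes are assumptions for illustration. VMs kept running under warm standby can be left out of the required list.

```python
from collections import Counter
from typing import Iterable, Optional

# Hypothetical key: (VM size, zone); zone may be None for a non-zonal reservation.
Key = tuple[str, Optional[str]]


def capacity_shortfall(required_vms: Iterable[Key], reserved: dict[Key, int]) -> dict[Key, int]:
    """Return, per VM size and zone, how many critical DR VMs lack reserved capacity."""
    needed = Counter(required_vms)
    return {
        key: count - reserved.get(key, 0)
        for key, count in needed.items()
        if count > reserved.get(key, 0)
    }
```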
Resources
Resource Graph Query
// under-development
SAP-27 - Ensure that the production databases are replicated (ASYNC) to DR location using the database vendor’s replication technology
Category: Disaster Recovery
Impact: High
Guidance
The replication of production databases to a DR location using the database vendor’s asynchronous replication technology is a key strategy in ensuring data availability and business continuity.
Resources
Resource Graph Query
// under-development
SAP-28 - SAP components are backed up to DR location using an appropriate backup tool or ASR
Category: Disaster Recovery
Impact: High
Guidance
Ensure that SAP components such as (A)SCS, application servers, and Web Dispatchers are backed up or replicated to the DR location using an appropriate backup tool or Azure Site Recovery (ASR).
Resources
Resource Graph Query
// under-development
SAP-29 - SAP shared file systems are replicated or backed up to DR location
Category: Disaster Recovery
Impact: High
Guidance
Ensure that critical SAP shared file systems, such as /sapmnt, /usr/sap/trans, and /interfaces, are either replicated or backed up for disaster recovery.
Resources
Resource Graph Query
// under-development
SAP-32 - Automate DR infrastructure build or pre-deploy DR resources
Category: Disaster Recovery
Impact: Medium
Guidance
Automate DR infrastructure build (or have pre-deployed DR resources) and SAP service recovery as much as possible.
Resources
Resource Graph Query
// under-development
SAP-33 - Document and test the DR procedure to ensure it meets RPO and RTO targets
Category: Disaster Recovery
Impact: Medium
Guidance
Create detailed documentation of your DR procedures for each layer of the SAP architecture—database, central services, application servers, and shared file systems. This documentation should include configuration details, failover mechanisms, and step-by-step recovery procedures.
Test a wide range of failure scenarios, including regional outages. Testing should confirm that your DR strategy is robust, meets your RPO and RTO targets, and provides seamless failover across all layers of the SAP architecture.
This will ensure a comprehensive and resilient DR strategy capable of withstanding regional failures and ensuring business continuity.
Resources
Resource Graph Query
// under-development
SAP-34 - Ensure there is a robust monitoring and alerting solution in place for the entire DR solution
Category: Disaster Recovery
Impact: Medium
Guidance
For an SAP solution hosted on Azure, it’s imperative to implement a robust monitoring and alerting solution that comprehensively covers DR of each layer of the SAP architecture. Given the complexity of SAP systems, which span multiple layers using diverse technologies and Azure resources, each with potentially distinct DR replication mechanisms, an appropriate monitoring strategy is crucial. The different layers include database, central services, application, and shared file systems.
Resources
Resource Graph Query
// under-development
SAP-36 - Configure scheduled events notification
Category: Monitor
Impact: High
Guidance
Scheduled Events is a feature of the Azure Instance Metadata Service (IMDS). It provides proactive notifications about upcoming maintenance events (for example, a reboot) so that your application can prepare for them and limit disruption. Configure scheduled events for all your critical Azure VMs. The azure-events-az resource agent integrates scheduled events with Pacemaker clusters.
To ensure high availability and service continuity in your Azure VMs, you should configure the azure-events-az resource agent within your Pacemaker clusters. This agent monitors for scheduled Azure maintenance events and can proactively relocate resources for a graceful node shutdown. Configure the agent to monitor specific event types such as Reboot and Redeploy, and enable verbose logging for detailed diagnostics.
In addition, define a procedure for responding to scheduled events.
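You can also read scheduled events directly from the Instance Metadata Service, either for VMs outside a Pacemaker cluster or as the basis of the response procedure. The following Python sketch queries the scheduled events endpoint (api-version 2020-07-01, with the required `Metadata: true` header). It then keeps the Reboot and Redeploy events that list a given VM among the affected resources. What to do with a matching event is left to your procedure.

```python
import json
import urllib.request

# Scheduled Events endpoint of the Azure Instance Metadata Service (IMDS).
SCHEDULED_EVENTS_URL = (
    "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"
)
WATCHED_EVENT_TYPES = frozenset({"Reboot", "Redeploy"})


def fetch_scheduled_events(timeout: float = 5.0) -> dict:
    """Fetch the scheduled events document; works only from inside an Azure VM."""
    request = urllib.request.Request(SCHEDULED_EVENTS_URL, headers={"Metadata": "true"})
    # IMDS must be reached directly, so bypass any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=timeout) as response:
        return json.load(response)


def events_for_vm(document: dict, vm_name: str,
                  watched: frozenset = WATCHED_EVENT_TYPES) -> list[dict]:
    """Return watched events that list vm_name among the affected resources."""
    return [
        event
        for event in document.get("Events", [])
        if event.get("EventType") in watched and vm_name in event.get("Resources", [])
    ]
```

On a VM, `events_for_vm(fetch_scheduled_events(), "<vm-name>")` returns the events to act on. Each event carries fields such as `EventId`, `EventStatus`, and `NotBefore`.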
Resources
Resource Graph Query
// under-development
SAP-42 - ASCS-Pacemaker (Central Server Instance) Ensure Pacemaker cluster has been set up for SAP ASCS high availability
Category: Availability
Impact: High
Guidance
For the ASCS-Pacemaker (Central Server Instance), ensure that the Pacemaker cluster configuration parameters are correctly set up for SAP ASCS high availability.
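The following Python sketch reads cluster properties from the output of `cibadmin --query --scope crm_config` and compares them with a set of expected values. The expected values in the example, including `stonith-timeout`, are placeholders. Take the actual values from the Microsoft setup guide for your Linux distribution and fencing method.

```python
import xml.etree.ElementTree as ET


def cluster_properties(crm_config_xml: str) -> dict[str, str]:
    """Name/value pairs from all cluster_property_set elements in crm_config output."""
    root = ET.fromstring(crm_config_xml.strip())
    properties = {}
    for property_set in root.iter("cluster_property_set"):
        for nvpair in property_set.findall("nvpair"):
            properties[nvpair.get("name")] = nvpair.get("value")
    return properties


def property_mismatches(actual: dict[str, str], expected: dict[str, str]) -> dict[str, tuple]:
    """Map each differing property to (actual value or None, expected value).

    None means the property is not set explicitly and Pacemaker's default applies,
    so review such entries instead of treating them as errors automatically.
    """
    return {
        name: (actual.get(name), wanted)
        for name, wanted in expected.items()
        if actual.get(name) != wanted
    }


# Illustrative `cibadmin --query --scope crm_config` output.
SAMPLE = """
<crm_config>
  <cluster_property_set id="cib-bootstrap-options">
    <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="true"/>
    <nvpair id="cib-bootstrap-options-have-watchdog" name="have-watchdog" value="false"/>
  </cluster_property_set>
</crm_config>
"""
# Placeholder targets; take real values from the guide for your OS and fencing method.
EXPECTED = {"stonith-enabled": "true", "stonith-timeout": "900"}
print(property_mismatches(cluster_properties(SAMPLE), EXPECTED))
# {'stonith-timeout': (None, '900')}
```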
Resources
Resource Graph Query
// under-development
SAP-45 - ASCS-LB (Central Server Instance) Ensure the load balancer is configured correctly for SAP ASCS High availability
Category: Availability
Impact: High
Guidance
For the ASCS-LB (Central Server Instance), ensure that the load balancer is configured correctly for SAP ASCS high availability.
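The following Python sketch checks a load balancer definition. It assumes the JSON shape returned by Resource Graph or ARM (`sku.name` and `properties.loadBalancingRules[].properties`). It applies settings commonly used in Microsoft's SAP HA guides: Standard SKU, HA ports (protocol `All` with ports 0), floating IP enabled, an idle timeout of 30 minutes, and a health probe on each rule. Confirm these settings against the guide for your OS before relying on the check.

```python
def lb_findings(load_balancer: dict) -> list[str]:
    """Compare a load balancer definition with common SAP HA settings."""
    findings = []
    if (load_balancer.get("sku") or {}).get("name") != "Standard":
        findings.append("SKU is not Standard")
    rules = (load_balancer.get("properties") or {}).get("loadBalancingRules") or []
    if not rules:
        findings.append("no load-balancing rules defined")
    for rule in rules:
        name = rule.get("name", "<unnamed rule>")
        props = rule.get("properties") or {}
        # HA ports rule: protocol All with frontend and backend port 0.
        if not (props.get("protocol") == "All"
                and props.get("frontendPort") == 0
                and props.get("backendPort") == 0):
            findings.append(f"{name}: not an HA ports rule")
        if not props.get("enableFloatingIP", False):
            findings.append(f"{name}: floating IP is disabled")
        # The idle timeout defaults to 4 minutes when the property is absent.
        if props.get("idleTimeoutInMinutes", 4) < 30:
            findings.append(f"{name}: idle timeout is below 30 minutes")
        if "probe" not in props:
            findings.append(f"{name}: no health probe attached")
    return findings
```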
Resources
Resource Graph Query
// under-development
SAP-46 - DBHANA-Pacemaker (Database Instance) Ensure the Pacemaker cluster has been set up for SAP HANA DB high availability
Category: Availability
Impact: High
Guidance
For the DBHANA-Pacemaker (Database Instance), ensure that the Pacemaker cluster configuration parameters are correctly set up for SAP HANA DB high availability.
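The following Python sketch extracts the parameters of the SAP HANA resource agent from `cibadmin --query --scope resources` output and compares them with expected values. It looks for resources of type `SAPHana`, or `SAPHanaController` in some SAPHanaSR packages. The expected values shown are examples only. Confirm them against the Microsoft guide and resource-agent version you use.

```python
import xml.etree.ElementTree as ET

HANA_AGENT_TYPES = {"SAPHana", "SAPHanaController"}
# Example target values only; confirm them against the guide you follow.
EXAMPLE_EXPECTED = {
    "PREFER_SITE_TAKEOVER": "true",
    "DUPLICATE_PRIMARY_TIMEOUT": "7200",
    "AUTOMATED_REGISTER": "false",
}


def hana_resource_params(resources_xml: str) -> dict[str, dict[str, str]]:
    """Map each SAP HANA primitive ID to its instance attribute name/value pairs."""
    root = ET.fromstring(resources_xml.strip())
    result = {}
    for primitive in root.iter("primitive"):  # also finds primitives nested in clones
        if primitive.get("type") in HANA_AGENT_TYPES:
            result[primitive.get("id")] = {
                nvpair.get("name"): nvpair.get("value")
                for nvpair in primitive.findall("instance_attributes/nvpair")
            }
    return result


def param_mismatches(params: dict[str, str], expected: dict[str, str] = EXAMPLE_EXPECTED) -> dict:
    """Map each differing parameter to (actual value or None, expected value)."""
    return {k: (params.get(k), v) for k, v in expected.items() if params.get(k) != v}


# Illustrative `cibadmin --query --scope resources` output.
SAMPLE = """
<resources>
  <master id="msl_SAPHana_HN1_HDB03">
    <primitive id="rsc_SAPHana_HN1_HDB03" class="ocf" provider="suse" type="SAPHana">
      <instance_attributes id="rsc_SAPHana_HN1_HDB03-ia">
        <nvpair id="ia-sid" name="SID" value="HN1"/>
        <nvpair id="ia-nr" name="InstanceNumber" value="03"/>
        <nvpair id="ia-pst" name="PREFER_SITE_TAKEOVER" value="true"/>
        <nvpair id="ia-dpt" name="DUPLICATE_PRIMARY_TIMEOUT" value="7200"/>
        <nvpair id="ia-ar" name="AUTOMATED_REGISTER" value="true"/>
      </instance_attributes>
    </primitive>
  </master>
</resources>
"""
```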
Resources
Resource Graph Query
// under-development
SAP-49 - DBHANA-LB (Database Instance) Ensure the load balancer is configured correctly for SAP HANA DB High availability
Category: Availability
Impact: High
Guidance
For the DBHANA-LB (Database Instance), make sure the load balancer is configured correctly for SAP HANA DB high availability.
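The load balancer's health probe port must match the port that the `azure-lb` resource listens on in the cluster. Otherwise, the load balancer cannot tell which node is active. The following Python sketch compares the two sets of ports. It uses `cibadmin --query --scope resources` output for the cluster side and load balancer JSON (`properties.probes[].properties.port`) for the Azure side. The sample names and port 62503 are illustrative.

```python
import xml.etree.ElementTree as ET


def azure_lb_ports(resources_xml: str) -> set[int]:
    """Ports configured on azure-lb primitives in `cibadmin --query --scope resources` output."""
    root = ET.fromstring(resources_xml.strip())
    ports = set()
    for primitive in root.iter("primitive"):
        if primitive.get("type") == "azure-lb":
            for nvpair in primitive.findall("instance_attributes/nvpair"):
                if nvpair.get("name") == "port":
                    ports.add(int(nvpair.get("value")))
    return ports


def probe_ports(load_balancer: dict) -> set[int]:
    """Ports of all health probes defined on the load balancer."""
    probes = (load_balancer.get("properties") or {}).get("probes") or []
    return {probe["properties"]["port"] for probe in probes}


def port_mismatches(cluster_ports: set[int], lb_ports: set[int]) -> dict[str, list[int]]:
    """Ports present on only one side; both lists empty means the configuration matches."""
    return {
        "cluster_only": sorted(cluster_ports - lb_ports),
        "load_balancer_only": sorted(lb_ports - cluster_ports),
    }


CLUSTER_SAMPLE = """
<resources>
  <group id="g_ip_HN1_03">
    <primitive id="nc_HN1_03" class="ocf" provider="heartbeat" type="azure-lb">
      <instance_attributes id="nc_HN1_03-ia">
        <nvpair id="nc_HN1_03-ia-port" name="port" value="62503"/>
      </instance_attributes>
    </primitive>
  </group>
</resources>
"""
LB_SAMPLE = {"properties": {"probes": [{"name": "hana-hp", "properties": {"protocol": "Tcp", "port": 62503}}]}}
```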
Resources
Resource Graph Query
// under-development