SAP on Azure


The resiliency recommendations in this guidance cover SAP solutions on Azure and their associated resources and settings.

Refer to:

  • Azure Center for SAP Solutions
  • Open source Quality Checks
  • Open source Inventory Checks

Summary of Recommendations

Recommendations Details

SAP-1 - Ensure that each SAP production system is designed for high availability across availability zones

Category: Availability

Impact: High

Guidance

Azure Availability Zones are physically separate locations within each Azure region that are tolerant to local failures. Use availability zones to protect your applications and data against unlikely data center failures. Ensure each single point of failure of each SAP production system is protected with high availability using multiple availability zones. If you cannot deploy across different zones in a region, then refer to Microsoft guidance for High availability deployment options for SAP workload.
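The zone-coverage rule above can be sketched as a small check. This is a minimal Python sketch under stated assumptions, not an official tool: the inventory format (records with `sid`, `role`, and `zone` fields) and the function name `roles_missing_zone_redundancy` are hypothetical and would need to be mapped to however you track your SAP VMs.

```python
from collections import defaultdict


def roles_missing_zone_redundancy(vms, min_zones=2):
    """Return sorted (sid, role) pairs whose VMs span fewer than min_zones zones.

    `vms` is a hypothetical inventory: dicts with 'sid', 'role', and 'zone'
    keys, where 'zone' is None for a VM that is not pinned to a zone.
    """
    zones_by_role = defaultdict(set)
    for vm in vms:
        key = (vm["sid"], vm["role"])
        zones_by_role[key]  # register the role even if none of its VMs is zonal
        if vm.get("zone") is not None:
            zones_by_role[key].add(vm["zone"])
    return sorted(key for key, zones in zones_by_role.items()
                  if len(zones) < min_zones)


# Illustrative inventory; the system ID, roles, and zones are made up.
inventory = [
    {"sid": "PRD", "role": "database", "zone": "1"},
    {"sid": "PRD", "role": "database", "zone": "2"},
    {"sid": "PRD", "role": "ascs", "zone": "1"},
    {"sid": "PRD", "role": "ascs", "zone": "1"},
    {"sid": "PRD", "role": "app", "zone": None},
]

print(roles_missing_zone_redundancy(inventory))
```

In this made-up inventory the database role spans two zones, while the central services and application server roles are flagged because their VMs sit in one zone or in none.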

Resources

Resource Graph Query

// under-development



SAP-2 - Run SAP application servers on two or more VMs using VMSS Flex

Category: Availability

Impact: High

Guidance

Use Virtual Machine Scale Sets (VMSS) with flexible orchestration to distribute the virtual machines across the specified zones and, within each zone, across different fault domains on a best-effort basis. Configure VMSS Flex following the Microsoft recommendations for SAP workloads, using the right mode and the correct settings. If you aren’t currently running SAP application servers on VMSS Flex or in availability sets with fault domain and update domain distribution, consider moving to a VMSS Flex architecture to improve the resiliency posture of your SAP deployment. The blog post in the links below describes how to migrate existing SAP workloads deployed in an availability set or availability zone to a flexible scale set with the FD=1 deployment option.

Resources

Resource Graph Query

// Azure Resource Graph Query
// Find all VMs that are not associated with a VMSS Flex instance
resources
| where type =~ 'Microsoft.Compute/virtualMachines'
| where isnull(properties.virtualMachineScaleSet.id)
| project recommendationId="vm-1", name, id, tags



SAP-9 - If using single-instance VMs, all OS and data disks must be Premium SSD or Ultra Disk

Category: Availability

Impact: High

Guidance

For single-instance VMs, both OS and data disks must be either Premium SSD or Ultra Disk to achieve the single-instance SLA of 99.9% availability.

Resources

Resource Graph Query

// Azure Resource Graph Query
// Find all VMs that have an attached disk that is not in the Premium or Ultra sku tier.

resources
| where type =~ 'Microsoft.Compute/virtualMachines'
| extend lname = tolower(name)
| join kind=leftouter(resources
    | where type =~ 'Microsoft.Compute/disks'
    | where not(sku.tier =~ 'Premium') and not(sku.tier =~ 'Ultra')
    | extend lname = tolower(tostring(split(managedBy, '/')[8]))
    | project lname, name
    | summarize disks = make_list(name) by lname) on lname
| where isnotnull(disks)
| project recommendationId = "vm-24", name, id, tags, param1=strcat("AffectedDisks: ", disks)



SAP-14 - Ensure that the data is replicated synchronously (SYNC mode) between the primary and secondary database hosting VM nodes

Category: Availability

Impact: High

Guidance

Implement high availability for databases using database-native replication technologies, and replicate the data synchronously (SYNC mode) from the primary database to a standby node.

Resources

Resource Graph Query

// under-development



SAP-15 - Ensure that SAP shared file systems are designed for high availability and when possible using availability zones

Category: Availability

Impact: High

Guidance

SAP shared file systems such as /sapmnt, /usr/sap/trans, and interfaces should be made highly available.

For Azure file shares, we recommend zone-redundant storage (ZRS). For Azure NetApp Files, we recommend zonal replication for your volumes.

Review the results of the individual checks on other Azure services to ensure that SAP shared file systems are designed to protect against zonal failure: ST-1, ANF-1, ANF-6.

Resources

Resource Graph Query

// under-development



SAP-16 - Test high availability solutions thoroughly to ensure failovers work as expected

Category: Availability

Impact: High

Guidance

Test all high availability solutions thoroughly, including kernel panic in Linux VMs and failback. Include zonal failure scenarios in your testing. The testing should confirm that each layer of your SAP solution (database, central services, application servers, and shared file systems) is configured correctly for zone redundancy, that the solution meets RPO = 0, and that the application fails over automatically within your RTO. Failback can be either automatic or manual.

Resources

Resource Graph Query

// under-development



SAP-18 - Remove unwanted location constraints from Linux Pacemaker clusters

Category: Availability

Impact: High

Guidance

When executing a migrate command in a Linux Pacemaker cluster, the system generates a temporary “prefer” location constraint, aiming to move a resource to a specified node. This constraint prioritizes the target node for the resource temporarily without permanently altering the cluster’s configuration.

During planned maintenance and failover testing, you can use the migrate command to relocate resources temporarily with minimal disruption. This constraint is not permanent and does not survive reboots or cluster resets. It’s designed for short-term adjustments.

Once the planned task necessitating the resource migration is complete, manually remove the temporary constraint to revert to the cluster’s original resource management policies. This approach allows for controlled resource movement within the cluster, facilitating maintenance while preserving the integrity and efficiency of the cluster’s configuration.
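One way to spot constraints that were never cleaned up is to scan an export of the cluster configuration (CIB). Pacemaker's move and ban operations record location constraints whose ids begin with `cli-prefer-` or `cli-ban-`. The following is a minimal Python sketch, assuming the CIB XML has already been exported (for example with `cibadmin --query`); the function name, the sample XML, and the resource and node names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Id prefixes that Pacemaker uses for constraints created by move/ban commands.
CLI_PREFIXES = ("cli-prefer-", "cli-ban-")


def leftover_cli_constraints(cib_xml):
    """Return (constraint id, resource, node) for each move/ban constraint."""
    root = ET.fromstring(cib_xml)
    found = []
    for loc in root.iter("rsc_location"):
        constraint_id = loc.get("id", "")
        if constraint_id.startswith(CLI_PREFIXES):
            # 'node' may be None when the constraint is expressed as a rule.
            found.append((constraint_id, loc.get("rsc"), loc.get("node")))
    return found


# Hypothetical CIB excerpt; resource and node names are illustrative only.
SAMPLE_CIB = """
<cib>
  <configuration>
    <constraints>
      <rsc_location id="cli-prefer-g-NW1_ASCS" rsc="g-NW1_ASCS" role="Started" node="node2" score="INFINITY"/>
      <rsc_location id="loc-keep" rsc="rsc_other" node="node1" score="100"/>
    </constraints>
  </configuration>
</cib>
"""

for constraint_id, rsc, node in leftover_cli_constraints(SAMPLE_CIB):
    print(f"leftover constraint {constraint_id}: {rsc} -> {node}")
```

A constraint reported this way can then be removed with the cluster tooling, for example `crm resource clear <resource>` on SUSE or `pcs resource clear <resource>` on Red Hat.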

Resources

Resource Graph Query

// under-development



SAP-26 - Secure compute resource capacity for critical VM roles in DR region

Category: Disaster Recovery

Impact: Medium

Guidance

To ensure the availability of compute resources for critical VM roles in a DR region, consider securing capacity either through a warm standby approach or by utilizing Azure’s On-demand Capacity Reservation.

Warm standby involves keeping VMs in the DR region running. On-demand Capacity Reservation, on the other hand, reserves compute capacity without having to run the VMs, allowing you to start them when needed. When DR VMs are not needed, the reserved capacity may safely be used to run other workloads without the risk of losing the capacity to other customers. This strategy guarantees resource availability for your critical workloads in the event of a disaster, balancing cost and readiness.
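A simple way to reason about secured capacity is to compare what the critical VM roles need in the DR region with what is already secured there. The following is a minimal Python sketch, not a call into any Azure API: the function name `capacity_shortfall`, the input dictionaries, and the VM sizes and counts are assumptions chosen for illustration.

```python
from collections import Counter


def capacity_shortfall(required, secured):
    """Return {vm_size: missing_count} for sizes where secured < required."""
    need = Counter(required)
    have = Counter(secured)  # sizes that are absent count as 0
    return {size: need[size] - have[size]
            for size in sorted(need) if need[size] > have[size]}


# Illustrative numbers: VM sizes needed by critical roles in the DR region,
# and capacity already secured (warm standby VMs plus reserved instances).
required_dr_vms = {"Standard_M64s": 2, "Standard_E16ds_v5": 4}
secured_in_dr = {"Standard_M64s": 2, "Standard_E16ds_v5": 3}

print(capacity_shortfall(required_dr_vms, secured_in_dr))
```

With these made-up numbers, one `Standard_E16ds_v5` instance is still unsecured.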

Resources

Resource Graph Query

// under-development



SAP-27 - Ensure that the production databases are replicated (ASYNC) to DR location using the database vendor’s replication technology

Category: Disaster Recovery

Impact: High

Guidance

The replication of production databases to a DR location using the database vendor’s asynchronous replication technology is a key strategy in ensuring data availability and business continuity.

Resources

Resource Graph Query

// under-development



SAP-28 - SAP components are backed up to DR location using an appropriate backup tool or ASR

Category: Disaster Recovery

Impact: High

Guidance

SAP components such as (A)SCS, application servers, and Web Dispatchers should be backed up to the DR location using an appropriate backup tool or Azure Site Recovery (ASR).

Resources

Resource Graph Query

// under-development



SAP-29 - SAP shared file systems are replicated or backed up to DR location

Category: Disaster Recovery

Impact: High

Guidance

Ensure that critical SAP shared file systems, such as /sapmnt, /usr/sap/trans, and /interfaces, are either replicated or backed up for disaster recovery purposes.

Resources

Resource Graph Query

// under-development



SAP-32 - Automate DR infrastructure build or pre-deploy DR resources

Category: Disaster Recovery

Impact: Medium

Guidance

Automate DR infrastructure build (or have pre-deployed DR resources) and SAP service recovery as much as possible.

Resources

Resource Graph Query

// under-development



SAP-33 - Document and test the DR procedure to ensure it meets RPO and RTO targets

Category: Disaster Recovery

Impact: Medium

Guidance

Create detailed documentation of your DR procedures for each layer of the SAP architecture—database, central services, application servers, and shared file systems. This documentation should include configuration details, failover mechanisms, and step-by-step recovery procedures.

Test a wide range of failure scenarios, including regional outages. Testing should confirm that your DR strategy is robust, meets your RPO and RTO targets, and provides seamless failover across all layers of the SAP architecture.

This will ensure a comprehensive and resilient DR strategy capable of withstanding regional failures and ensuring business continuity.
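The comparison of test results against RPO and RTO targets can be recorded in a simple, repeatable form. The following is a minimal Python sketch under stated assumptions: the `DrTestResult` record, the function name, and all measured values and targets are hypothetical examples, not figures from this guidance.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class DrTestResult:
    layer: str
    data_loss: timedelta      # data loss window measured in the test (RPO)
    recovery_time: timedelta  # time until the layer was usable again (RTO)


def layers_missing_targets(results, rpo_target, rto_target):
    """Return the layers whose measured values exceed either target."""
    return [r.layer for r in results
            if r.data_loss > rpo_target or r.recovery_time > rto_target]


# Made-up results of one DR drill, one entry per SAP layer.
drill = [
    DrTestResult("database", timedelta(minutes=5), timedelta(hours=2)),
    DrTestResult("central services", timedelta(0), timedelta(hours=1)),
    DrTestResult("application servers", timedelta(0), timedelta(hours=5)),
    DrTestResult("shared file systems", timedelta(minutes=30), timedelta(hours=1)),
]

print(layers_missing_targets(drill,
                             rpo_target=timedelta(minutes=15),
                             rto_target=timedelta(hours=4)))
```

Here the application servers miss the assumed 4-hour RTO and the shared file systems miss the assumed 15-minute RPO, so both layers would need follow-up before the procedure is signed off.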

Resources

Resource Graph Query

// under-development



SAP-34 - Ensure there is a robust monitoring and alerting solution in place for the entire DR solution

Category: Disaster Recovery

Impact: Medium

Guidance

For an SAP solution hosted on Azure, it’s imperative to implement a robust monitoring and alerting solution that comprehensively covers DR of each layer of the SAP architecture. Given the complexity of SAP systems, which span multiple layers using diverse technologies and Azure resources, each with potentially distinct DR replication mechanisms, an appropriate monitoring strategy is crucial. The different layers include database, central services, application, and shared file systems.

Resources

Resource Graph Query

// under-development



SAP-36 - Configure scheduled events notification

Category: Monitor

Impact: High

Guidance

Scheduled Events is an Azure Metadata Service feature that provides proactive notifications about upcoming maintenance events (for example, a reboot) so that your application can prepare for them and limit disruption. Configure Scheduled Events for all your critical Azure VMs. The azure-events-az resource agent can also integrate Scheduled Events with Pacemaker clusters.

To ensure high availability and service continuity in your Azure VMs, you should configure the azure-events-az resource agent within your Pacemaker clusters. This agent monitors for scheduled Azure maintenance events and can proactively relocate resources for a graceful node shutdown. Configure the agent to monitor specific event types such as Reboot and Redeploy, and enable verbose logging for detailed diagnostics.

In addition, it is also important that you define a procedure on how to react to scheduled events.
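For VMs that are not managed by Pacemaker, a procedure for reacting to scheduled events can start from polling the Scheduled Events endpoint of the Azure Instance Metadata Service. The following is a minimal Python sketch, not a replacement for the azure-events-az agent: the function names and the sample document are illustrative, the API version shown is one of several available, and `fetch_scheduled_events` only works from inside an Azure VM.

```python
import json
import urllib.request

# Scheduled Events endpoint of the Instance Metadata Service.
# It is reachable only from inside an Azure VM.
ENDPOINT = ("http://169.254.169.254/metadata/scheduledevents"
            "?api-version=2020-07-01")


def fetch_scheduled_events(timeout=5):
    """Fetch the current Scheduled Events document (requires an Azure VM)."""
    request = urllib.request.Request(ENDPOINT, headers={"Metadata": "true"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def events_to_handle(document, event_types=("Reboot", "Redeploy")):
    """Return the events whose EventType is one we want to react to."""
    return [event for event in document.get("Events", [])
            if event.get("EventType") in event_types]


# Hand-written sample in the shape of a Scheduled Events response.
SAMPLE_DOCUMENT = {
    "DocumentIncarnation": 1,
    "Events": [
        {"EventId": "A1", "EventType": "Reboot", "ResourceType": "VirtualMachine",
         "Resources": ["sapvm1"], "EventStatus": "Scheduled",
         "NotBefore": "Mon, 19 Sep 2016 18:29:47 GMT"},
        {"EventId": "B2", "EventType": "Freeze", "ResourceType": "VirtualMachine",
         "Resources": ["sapvm2"], "EventStatus": "Scheduled",
         "NotBefore": "Mon, 19 Sep 2016 18:29:47 GMT"},
    ],
}

for event in events_to_handle(SAMPLE_DOCUMENT):
    print(event["EventId"], event["EventType"], event["Resources"])
```

On an Azure VM, passing the result of `fetch_scheduled_events()` to `events_to_handle` gives the list of upcoming events that your documented procedure should act on.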

Resources

Resource Graph Query

// under-development



SAP-42 - ASCS-Pacemaker (Central Server Instance) Ensure Pacemaker cluster has been set up for SAP ASCS high availability

Category: Availability

Impact: High

Guidance

For the ASCS-Pacemaker (Central Server Instance), ensure that the Pacemaker cluster configuration parameters are correctly set up for SAP ASCS high availability.

Resources

Resource Graph Query

// under-development



SAP-45 - ASCS-LB (Central Server Instance) Ensure the load balancer is configured correctly for SAP ASCS high availability

Category: Availability

Impact: High

Guidance

For the ASCS-LB (Central Server Instance), ensure that the load balancer is configured correctly for SAP ASCS high availability.

Resources

Resource Graph Query

// under-development



SAP-46 - DBHANA-Pacemaker (Database Instance) Ensure the Pacemaker cluster has been set up for SAP HANA DB high availability

Category: Availability

Impact: High

Guidance

For the DBHANA-Pacemaker (Database Instance), ensure that the Pacemaker cluster configuration parameters are correctly set up for SAP HANA DB high availability.

Resources

Resource Graph Query

// under-development



SAP-49 - DBHANA-LB (Database Instance) Ensure the load balancer is configured correctly for SAP HANA DB high availability

Category: Availability

Impact: High

Guidance

For the DBHANA-LB (Database Instance), make sure the load balancer is configured correctly for SAP HANA DB high availability.

Resources

Resource Graph Query

// under-development