Ensure that each SAP production system is designed for high availability across availability zones
Impact: High
Category: High Availability
APRL GUID: a9b649a5-2bfe-40ca-9b8f-34f9c71dfa12
Description:
Azure availability zones are physically separate locations within an Azure region, each tolerant to local failures. Use availability zones to protect your applications and data against data center failures, however unlikely. Ensure that every single point of failure in each SAP production system is protected with a high availability configuration that spans multiple availability zones. If you cannot deploy across different zones in a region, refer to the Microsoft guidance on high availability deployment options for SAP workloads.
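As a rough illustration of the check described above, the following sketch flags SAP roles whose VMs do not span at least two availability zones. The role names and the inventory are hypothetical; in practice the data would come from your own VM inventory.

```python
from collections import defaultdict


def find_single_zone_roles(vms):
    """Return the roles whose VMs cover fewer than two availability zones.

    `vms` is a list of (role, zone) pairs; zone is None for a non-zonal VM.
    """
    zones_by_role = defaultdict(set)
    for role, zone in vms:
        zones = zones_by_role[role]  # registers the role even if zone is None
        if zone is not None:
            zones.add(zone)
    return sorted(role for role, zones in zones_by_role.items() if len(zones) < 2)


# Hypothetical inventory for one SAP production system.
inventory = [
    ("central-services", "1"), ("central-services", "2"),
    ("database", "1"), ("database", "2"),
    ("app-server", "1"), ("app-server", "1"),
]
print(find_single_zone_roles(inventory))  # prints ['app-server']
```

Here the application servers are flagged because both VMs sit in zone 1, so a zonal failure would take down the whole layer.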
Run SAP application servers on two or more VMs using VMSS Flex
Impact: High
Category: High Availability
APRL GUID: 49bd34ab-d117-4b0e-99f8-34cc8a5394bc
Description:
Use Virtual Machine Scale Sets (VMSS) with flexible orchestration to distribute virtual machines across the specified zones and, within each zone, across different fault domains on a best-effort basis. Configure VMSS Flex according to the Microsoft recommendations for SAP workloads, using the right mode and the correct settings. If you currently use neither VMSS Flex nor availability sets with fault domain and update domain distribution for your SAP application servers, consider moving to a VMSS Flex architecture to improve the resiliency posture of your SAP deployment. The linked blog post details the process of migrating existing SAP workloads that are deployed in an availability set or availability zone to a flexible scale set with the FD=1 deployment option.
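The settings mentioned above can be checked with a small sketch like the one below. It assumes the scale set is available as a dictionary shaped like its ARM resource definition, with `zones` at the top level and `orchestrationMode` and `platformFaultDomainCount` under `properties`. The example resource is hypothetical, and the checks cover only the points named in the description, not the full Microsoft guidance.

```python
def check_vmss_flex(scale_set):
    """Return a list of findings; an empty list means no deviation was detected."""
    props = scale_set.get("properties") or {}
    zones = scale_set.get("zones") or []
    findings = []
    if props.get("orchestrationMode") != "Flexible":
        findings.append("orchestrationMode is not Flexible")
    if props.get("platformFaultDomainCount") != 1:
        findings.append("platformFaultDomainCount is not 1")
    if len(set(zones)) < 2:
        findings.append("scale set spans fewer than two zones")
    return findings


# Hypothetical scale set definition.
example = {
    "zones": ["1", "2", "3"],
    "properties": {"orchestrationMode": "Flexible", "platformFaultDomainCount": 1},
}
print(check_vmss_flex(example))  # prints []
```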
Ensure synchronous data replication (SYNC mode) between primary and secondary VM nodes
Impact: High
Category: High Availability
APRL GUID: 094400a5-f112-408d-a334-afd68873ff0f
Description:
Implement high availability for databases using database-native replication technologies, and replicate the data synchronously (SYNC mode) from the primary database to a standby node.
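A minimal sketch of how this could be audited, assuming you have already collected the replication mode of each production database as a plain string (for example "sync" or "async"). The system IDs are hypothetical, and how the mode is collected depends on the database.

```python
def non_sync_databases(replication_modes):
    """Return the databases whose replication mode is anything other than 'sync'."""
    return sorted(
        name
        for name, mode in replication_modes.items()
        if mode.strip().lower() != "sync"
    )


# Hypothetical system IDs and modes gathered from each database.
modes = {"PR1": "SYNC", "PR2": "async"}
print(non_sync_databases(modes))  # prints ['PR2']
```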
Design SAP shared file systems for high availability, utilizing availability zones when possible
Impact: High
Category: High Availability
APRL GUID: e09ca960-20b7-4831-b85b-83ec84c1390e
Description:
SAP shared file systems such as /sapmnt and /usr/sap/trans, as well as interface directories, should be made highly available.
For Azure file shares, we recommend zone-redundant storage (ZRS); for Azure NetApp Files, use zonal replication for your volumes.
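For Azure file shares, the redundancy comes from the storage account SKU, so a simple check is to look at the SKU name. The sketch below assumes the usual storage account SKU naming (for example `Premium_LRS`, `Premium_ZRS`, `Standard_GZRS`) and treats any name ending in `ZRS` as zone-redundant. The share names are hypothetical, and Azure NetApp Files volumes are not covered.

```python
def is_zone_redundant(sku_name):
    """True for SKU names such as Premium_ZRS, Standard_ZRS, or Standard_GZRS."""
    return sku_name.upper().endswith("ZRS")


def shares_without_zone_redundancy(shares):
    """Return the share names whose storage account SKU is not zone-redundant."""
    return sorted(name for name, sku in shares.items() if not is_zone_redundant(sku))


# Hypothetical mapping of SAP file share to storage account SKU.
shares = {"sapmnt": "Premium_ZRS", "trans": "Premium_LRS"}
print(shares_without_zone_redundancy(shares))  # prints ['trans']
```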
Test high availability solutions thoroughly to ensure failovers work as expected
Impact: High
Category: High Availability
APRL GUID: 5663a808-56be-49ea-8d5c-c5dfc6925f76
Description:
Test all high availability solutions thoroughly, including kernel panic on Linux VMs and failback. Include zonal failure scenarios in your testing. The tests should confirm that each layer of your SAP solution (database, central services, application servers, and shared file systems) is configured correctly for zone redundancy, that the solution meets RPO = 0, and that the application fails over automatically within your RTO.
Failback can be either automatic or manual.
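One way to record the outcome of such a test is sketched below: the measured RTO is the time between the injected failure and the moment the service is available again, and RPO = 0 holds when no committed data is missing after failover. The timestamps, the RTO target, and the `records_lost` count are hypothetical inputs that you would take from your own test log.

```python
from datetime import datetime, timedelta


def evaluate_failover_test(failure_at, restored_at, rto_target, records_lost):
    """Compare one failover test run against the RTO target and RPO = 0."""
    measured_rto = restored_at - failure_at
    return {
        "measured_rto": measured_rto,
        "rto_met": measured_rto <= rto_target,
        "rpo_zero_met": records_lost == 0,
    }


# Hypothetical test run: failure injected at 09:00:00, service back at 09:03:30.
result = evaluate_failover_test(
    failure_at=datetime(2024, 1, 10, 9, 0, 0),
    restored_at=datetime(2024, 1, 10, 9, 3, 30),
    rto_target=timedelta(minutes=5),
    records_lost=0,
)
print(result["measured_rto"], result["rto_met"], result["rpo_zero_met"])
# prints: 0:03:30 True True
```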
Remove unwanted location constraints from Linux Pacemaker clusters
Impact: High
Category: High Availability
APRL GUID: 1b8a3051-dfd4-4780-bfb7-446296774029
Description:
When you run a migrate command in a Linux Pacemaker cluster, the cluster creates a "prefer" location constraint that moves a resource to the specified node. The constraint prioritizes the target node for that resource; it is meant as a temporary adjustment, not as a permanent change to the cluster's resource placement.
During planned maintenance and failover testing, you can use the migrate command to relocate resources temporarily with minimal disruption. The constraint is designed for short-term adjustments, but depending on the cluster tooling and version it can remain in the cluster configuration until it is removed or until its lifetime, if one was specified, expires.
Once the planned task that required the migration is complete, manually remove the temporary constraint to return to the cluster's original resource management policies.
This approach allows for controlled resource movement within the cluster, facilitating maintenance while preserving the integrity and efficiency of the cluster's configuration.
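To spot leftover constraints after maintenance, a sketch like the following can scan the constraints section of the cluster configuration (for example, as returned by `cibadmin --query --scope constraints`). It assumes that constraints created by move, migrate, or ban commands carry IDs beginning with `cli-prefer-` or `cli-ban-`, which is the usual convention. The resource and node names in the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

# ID prefixes typically given to constraints created by move/migrate/ban commands.
CLI_PREFIXES = ("cli-prefer-", "cli-ban-")


def leftover_cli_constraints(constraints_xml):
    """Return (id, resource, node) for each location constraint with a CLI prefix."""
    root = ET.fromstring(constraints_xml)
    return [
        (c.get("id"), c.get("rsc"), c.get("node"))
        for c in root.iter("rsc_location")
        if (c.get("id") or "").startswith(CLI_PREFIXES)
    ]


# Hypothetical constraints section with one leftover migrate constraint.
sample = """
<constraints>
  <rsc_location id="cli-prefer-rsc_sap_ascs" rsc="rsc_sap_ascs" role="Started" node="node2" score="INFINITY"/>
  <rsc_location id="loc_example" rsc="rsc_sap_ers" node="node1" score="-100"/>
</constraints>
"""
print(leftover_cli_constraints(sample))
# prints [('cli-prefer-rsc_sap_ascs', 'rsc_sap_ascs', 'node2')]
```

Any constraint reported here is a candidate for removal with the clear command of your cluster shell (for example `crm resource clear <resource>` or `pcs resource clear <resource>`), after confirming that it is not intentional.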
Secure compute resource capacity for critical VM roles in DR region
Impact: Medium
Category: Disaster Recovery
APRL GUID: 820b4c0c-8a74-442a-8ba7-b0cb840cd983
Description:
To ensure the availability of compute resources for critical VM roles in a DR region, consider securing capacity either through a warm standby approach or by utilizing Azure's On-demand Capacity Reservation.
Warm standby involves keeping VMs in the DR region running. On-demand Capacity Reservation, on the other hand, reserves compute capacity without having to run the VMs, allowing you to start them when needed. When DR VMs are not needed, the reserved capacity may safely be used to run other workloads without the risk of losing the capacity to other customers. This strategy guarantees resource availability for your critical workloads in the event of a disaster, balancing cost and readiness.
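As a simple planning aid, the sketch below compares the VM sizes and counts that your critical roles need in the DR region with the capacity you have reserved there. Both dictionaries are hypothetical examples; the real figures would come from your DR design and your capacity reservations.

```python
def capacity_shortfall(required, reserved):
    """Return {vm_size: missing_count} for sizes with too little reserved capacity."""
    shortfall = {}
    for vm_size, needed in required.items():
        missing = needed - reserved.get(vm_size, 0)
        if missing > 0:
            shortfall[vm_size] = missing
    return shortfall


# Hypothetical DR requirements and reservations, keyed by VM size.
required = {"Standard_M64s": 2, "Standard_E16ds_v5": 4}
reserved = {"Standard_M64s": 2, "Standard_E16ds_v5": 3}
print(capacity_shortfall(required, reserved))  # prints {'Standard_E16ds_v5': 1}
```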
SAP shared file systems are replicated or backed up to the DR location
Impact: High
Category: Disaster Recovery
APRL GUID: ee4dc309-00a1-49fe-92fa-1724baf5f103
Description:
Implementing robust monitoring and alerting for DR in SAP on Azure ensures coverage across its complex, multi-layer architecture. This strategy is crucial for databases, services, applications, and shared systems.
Document and test the DR procedure to ensure it meets RPO and RTO targets
Impact: Medium
Category: Disaster Recovery
APRL GUID: c300e949-528d-4ac9-889b-cacf8b4a6e90
Description:
Create detailed documentation of your DR procedures for each layer of the SAP architecture: database, central services, application servers, and shared file systems. This documentation should include configuration details, failover mechanisms, and step-by-step recovery procedures.
Test a wide range of failure scenarios, including regional outages. Testing should confirm that your DR strategy meets your RPO and RTO targets and provides seamless failover across all layers of the SAP architecture, so that it can withstand regional failures and maintain business continuity.
Ensure there is a robust monitoring and alerting solution in place for the entire DR solution
Impact: Medium
Category: Disaster Recovery
APRL GUID: c27134b7-6917-4852-8276-3dbef5c71578
Description:
For an SAP solution hosted on Azure, implement a robust monitoring and alerting solution that covers the DR setup of each layer of the SAP architecture: database, central services, application, and shared file systems. SAP systems span multiple layers that use diverse technologies and Azure resources, each with potentially distinct DR replication mechanisms, so an appropriate monitoring strategy is crucial.
Configure scheduled events notification
Impact: High
Category: Monitoring and Alerting
APRL GUID: 6b589ce6-c847-4cee-af35-f6e8eb1cf983
Description:
Scheduled Events is a feature of the Azure Instance Metadata Service that provides proactive notifications about upcoming maintenance events (for example, a reboot) so that your application can prepare for them and limit disruption. Configure Scheduled Events for all your critical Azure VMs.
The azure-events-az resource agent integrates Scheduled Events with Pacemaker clusters.
To ensure high availability and service continuity for your Azure VMs, configure the azure-events-az resource agent in your Pacemaker clusters. The agent monitors for scheduled Azure maintenance events and can proactively relocate resources so that the node shuts down gracefully. Configure the agent to monitor specific event types such as Reboot and Redeploy, and enable verbose logging for detailed diagnostics.
In addition, define a procedure for how to react to scheduled events.
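For applications outside a Pacemaker cluster, Scheduled Events can be polled directly from within the VM. The sketch below assumes the documented Instance Metadata Service endpoint and the `2020-07-01` API version, and filters the returned events for the Reboot and Redeploy types mentioned above. The sample document is hypothetical and only mimics the shape of a response; `fetch_scheduled_events` works only when run inside an Azure VM, so it is not called here.

```python
import json
import urllib.request

# Instance Metadata Service endpoint for Scheduled Events (reachable only from inside a VM).
IMDS_URL = "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"


def fetch_scheduled_events(timeout=5):
    """Query the Scheduled Events endpoint and return the parsed JSON document."""
    request = urllib.request.Request(IMDS_URL, headers={"Metadata": "true"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def events_needing_action(document, watched=("Reboot", "Redeploy")):
    """Return the events whose EventType is one of the watched types."""
    return [e for e in document.get("Events", []) if e.get("EventType") in watched]


# Hypothetical response used for illustration; on a VM, call fetch_scheduled_events().
sample = {
    "DocumentIncarnation": 3,
    "Events": [
        {"EventId": "A1", "EventType": "Reboot", "Resources": ["sapdb01"], "EventStatus": "Scheduled"},
        {"EventId": "B2", "EventType": "Freeze", "Resources": ["sapapp01"], "EventStatus": "Scheduled"},
    ],
}
print([e["EventId"] for e in events_needing_action(sample)])  # prints ['A1']
```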
Configure a Pacemaker cluster for SAP ASCS high availability
Impact: High
Category: High Availability
APRL GUID: 9d8f6678-694c-4da4-8384-415201f65194
Description:
For the ASCS-Pacemaker (Central Services instance), ensure that the Pacemaker cluster configuration parameters are correctly set up for SAP ASCS high availability.
Ensure the Pacemaker cluster has been set up for SAP HANA DB high availability
Impact: High
Category: High Availability
APRL GUID: 6648fe61-880d-4a96-8d2d-190a23d5580b
Description:
For the DBHANA-Pacemaker (Database Instance), ensure that the Pacemaker cluster configuration parameters are correctly set up for SAP HANA database high availability.
Review SAP configuration for timeout values used with Azure NetApp Files
Impact: High
Category: High Availability
APRL GUID: 4884cada-b9c7-42d5-8153-3853e4a6f6c4
Description:
High availability of SAP systems that use Azure NetApp Files relies on proper timeout values to prevent disruption to your application. Review the documentation and ensure that your configuration meets the timeout values it specifies.
Potential Benefits:
Improve resiliency and performance of SAP on Azure
Provision recommended storage configuration on database VMs
Impact: High
Category: Scalability
APRL GUID: 697deb1d-d398-4989-9734-9e6c18f7e0ad
Description:
Review the database storage configuration to ensure that the right type and number of disks are used to provision the data and log volumes, so that the database VMs meet the IOPS and throughput requirements of the given database.
You should also use the Microsoft-recommended settings, such as disk caching, Write Accelerator, stripe size, and Linux I/O scheduler mode, for all database VMs.
The SAP on Azure QualityCheck tool can help you identify deviations from Microsoft recommendations quickly and at scale.
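When sizing a striped data or log volume, a quick estimate is to add up the per-disk limits and then cap the result at the limits of the VM size, since the VM has its own ceiling for disk IOPS and throughput. The sketch below does that arithmetic; all numbers are illustrative placeholders, not recommended values, and should be replaced with the published limits for your disk type and VM size.

```python
def effective_volume_limits(disk_count, disk_iops, disk_mbps, vm_iops_cap, vm_mbps_cap):
    """Return (IOPS, MB/s) of a striped volume, capped by the VM-level limits."""
    iops = min(disk_count * disk_iops, vm_iops_cap)
    mbps = min(disk_count * disk_mbps, vm_mbps_cap)
    return iops, mbps


def meets_requirement(limits, required_iops, required_mbps):
    """True if the volume limits cover both the IOPS and throughput requirements."""
    iops, mbps = limits
    return iops >= required_iops and mbps >= required_mbps


# Illustrative numbers only: four disks of 5000 IOPS and 200 MB/s each,
# on a VM assumed to allow 16000 IOPS and 1000 MB/s.
limits = effective_volume_limits(4, 5000, 200, vm_iops_cap=16000, vm_mbps_cap=1000)
print(limits)  # prints (16000, 800)
print(meets_requirement(limits, required_iops=15000, required_mbps=600))  # prints True
```

In this example the four disks could deliver 20000 IOPS in aggregate, but the assumed VM limit caps the volume at 16000, which is why adding disks alone does not always raise performance.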
Potential Benefits:
Improve reliability, performance and optimize costs