RE:01 Design your workload to align with business objectives
Impact: Medium
Category: Other Best Practices
APRL GUID: 8c0a0a4c-9e34-41af-9f6d-89d8dc00370e
Description:
Design your workload to align with business objectives and avoid unnecessary complexity or overhead. Use a practical and balanced approach to make design decisions that deliver the desired results. Limit the design to what the workload needs, which reduces inefficiencies and potential problems.
RE:03 Use failure mode analysis to identify and prioritize potential failures
Impact: Medium
Category: Other Best Practices
APRL GUID: f5fbe3d4-7196-46b8-9b09-0e29e7cf43ac
Description:
Use failure mode analysis (FMA) to identify and prioritize potential failures in your solution components. For each failure mode, assess its risk and effect, and determine how the workload responds and recovers.
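The prioritization step can be illustrated with a minimal Python sketch. It assumes a simple likelihood-times-impact score on hypothetical 1 to 5 scales; the `FailureMode` class, the scales, and the sample entries are illustrative assumptions, not part of any prescribed FMA method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One row of a failure mode analysis (hypothetical structure)."""
    component: str
    description: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (frequent)
    impact: int      # assumed scale: 1 (minor) to 5 (severe)

    @property
    def risk_score(self) -> int:
        # Assumed scoring rule: risk grows with both likelihood and impact.
        return self.likelihood * self.impact


def prioritize(modes):
    """Return failure modes ordered from highest to lowest risk score."""
    return sorted(modes, key=lambda m: m.risk_score, reverse=True)


# Illustrative entries only; real values come from your own analysis.
failure_modes = [
    FailureMode("api gateway", "certificate expiry", likelihood=1, impact=4),
    FailureMode("database", "primary replica unavailable", likelihood=2, impact=5),
    FailureMode("cache", "node eviction storm", likelihood=4, impact=2),
]

ranked = prioritize(failure_modes)
for mode in ranked:
    print(f"{mode.risk_score:>2}  {mode.component}: {mode.description}")
```

A ranked list like this is only a starting point: each entry still needs a documented response and recovery behavior.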
Define reliability and recovery targets for the components, the flows, and the overall solution. Use the defined targets to build the health model. The health model defines what healthy, degraded, and unhealthy states look like.
Potential Benefits:
Communicate reliability expectations with stakeholders
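A health model of this kind can be sketched as a small classifier. The example below assumes two hypothetical indicators (request success rate and 95th-percentile latency) and made-up thresholds; the names `classify` and `overall_state` and all default values are assumptions for illustration, and real targets should come from your own reliability requirements.

```python
from enum import IntEnum


class HealthState(IntEnum):
    """Ordered so that a larger value means a worse state."""
    HEALTHY = 0
    DEGRADED = 1
    UNHEALTHY = 2


def classify(success_rate, p95_latency_ms, *,
             target_success_rate=0.999,   # assumed reliability target
             min_success_rate=0.99,       # assumed floor for "unhealthy"
             target_latency_ms=500.0):    # assumed latency target
    """Map measured indicators to a health state using the defined targets."""
    if success_rate < min_success_rate:
        return HealthState.UNHEALTHY
    if success_rate < target_success_rate or p95_latency_ms > target_latency_ms:
        return HealthState.DEGRADED
    return HealthState.HEALTHY


def overall_state(component_states):
    """Roll component states up to the solution: the worst state wins."""
    return max(component_states, default=HealthState.HEALTHY)


states = {
    "web": classify(0.9995, 180.0),
    "orders-db": classify(0.9950, 220.0),
}
print({name: state.name for name, state in states.items()})
print("overall:", overall_state(states.values()).name)
```

Rolling up with "worst state wins" is one simple choice; a model for critical flows might weight components differently.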
Add redundancy at different levels, especially for critical flows. Apply redundancy to the compute, data, network, and other infrastructure tiers in accordance with the identified reliability targets.
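The effect of redundancy on a reliability target can be estimated with basic availability arithmetic. The sketch below assumes failures are independent, which is a simplification that real systems rarely satisfy fully, and the availability figures are made-up inputs rather than published service levels.

```python
import math


def serial_availability(availabilities):
    """Availability of a flow that needs every listed tier to be up."""
    return math.prod(availabilities)


def redundant_availability(single_availability, instances):
    """Availability of a tier that is up when at least one instance is up.

    Assumes instances fail independently of each other.
    """
    return 1 - (1 - single_availability) ** instances


# Illustrative numbers only.
single_compute = 0.99
redundant_compute = redundant_availability(single_compute, instances=2)

flow_without_redundancy = serial_availability([single_compute, 0.999])
flow_with_redundancy = serial_availability([redundant_compute, 0.999])

print(f"compute tier, 1 instance:  {single_compute:.4%}")
print(f"compute tier, 2 instances: {redundant_compute:.4%}")
print(f"flow without redundancy:   {flow_without_redundancy:.4%}")
print(f"flow with redundancy:      {flow_with_redundancy:.4%}")
```

Comparing the computed flow availability against the identified target shows which tier limits the result and where additional redundancy is worth its cost.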
RE:05 Design for high availability with availability zones
Impact: Medium
Category: High Availability
APRL GUID: 3d6adb0a-042f-47f7-a7ea-db2e360903d5
Description:
High availability is a foundational tenet of designing for reliability. A highly available architecture can help you avoid downtime as much as possible and recover efficiently if downtime does occur.
RE:07 Implement self-preservation and self-healing measures
Impact: Medium
Category: High Availability
APRL GUID: 7b5008cf-1853-44c4-827d-bca091678c3f
Description:
Strengthen the resiliency and recoverability of your workload by implementing self-preservation and self-healing measures. Self-healing capabilities help you avoid downtime by building in failure detection and automatic corrective actions to respond to different failure types.
Build capabilities into the solution by using infrastructure-based reliability patterns and software-based design patterns to handle component failures and transient errors.
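One common software-based pattern for transient errors is retry with exponential backoff and jitter. The sketch below is a minimal illustration, not a prescribed implementation: `TransientError`, `retry_with_backoff`, and the default delays are hypothetical names and values, and the `sleep` parameter is injectable only so the behavior can be exercised without waiting.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for errors that are worth retrying."""


def retry_with_backoff(operation, *, max_attempts=4, base_delay=0.1,
                       max_delay=2.0, sleep=time.sleep):
    """Call operation(), retrying on TransientError with jittered backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the failure to the caller
            # Exponential cap: base, 2x base, 4x base, ... up to max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter spreads retries out so clients do not retry in sync.
            sleep(random.uniform(0, cap))


# Usage: an operation that fails twice, then succeeds.
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("temporary failure")
    return "ok"


print(retry_with_backoff(flaky_call, sleep=lambda seconds: None))
```

Only errors known to be transient should be retried; retrying permanent failures adds load without improving the outcome.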
Test resiliency and availability scenarios by applying the principles of chaos engineering in your test and production environments. Perform active malfunction and simulated load testing to confirm that your graceful degradation implementation and scaling strategies are effective.
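The idea of active malfunction testing can be shown with a small in-process fault-injection sketch. It wraps a dependency call so that it fails at a configurable rate, then checks that the caller degrades gracefully instead of failing. All names here (`with_fault_injection`, `fetch_recommendations`, `get_recommendations`, the fallback list) are hypothetical; a real experiment would typically inject faults at the infrastructure level with dedicated tooling.

```python
import random


class InjectedFault(Exception):
    """Raised by the wrapper to simulate a dependency failure."""


def with_fault_injection(func, failure_rate, rng):
    """Wrap func so that a fraction of calls fail with InjectedFault."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFault("simulated dependency failure")
        return func(*args, **kwargs)
    return wrapper


def fetch_recommendations(user_id):
    """Stand-in for a call to a remote recommendation service."""
    return [f"item-for-{user_id}"]


FALLBACK = ["popular-item"]  # assumed static fallback content


def get_recommendations(fetch, user_id):
    """Graceful degradation: serve fallback content if the dependency fails."""
    try:
        return fetch(user_id)
    except Exception:
        return FALLBACK


# Experiment: inject failures into 30% of calls and verify every request
# still gets a usable response.
rng = random.Random(42)  # seeded so the run is repeatable
chaotic_fetch = with_fault_injection(fetch_recommendations, 0.3, rng)
responses = [get_recommendations(chaotic_fetch, uid) for uid in range(1000)]
degraded = sum(1 for r in responses if r == FALLBACK)
print(f"{degraded} of {len(responses)} requests served from fallback")
```

The assertion that matters in such an experiment is the steady-state hypothesis: here, that no request fails outright while the fault is active.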
RE:09 Implement a business continuity and disaster recovery plan
Impact: Medium
Category: Disaster Recovery
APRL GUID: 5f95df03-cae2-4761-90b7-7afd657ac124
Description:
Implement structured, tested, and documented business continuity and disaster recovery (BCDR) plans that align with the recovery targets. Plans must cover all components and the system as a whole.
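Checking a tested plan against its recovery targets can be sketched as a comparison of drill measurements with the targets. The example assumes the targets are expressed as a recovery time objective (RTO) and a recovery point objective (RPO); the class names, fields, and sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryTarget:
    rto: timedelta  # assumed: maximum acceptable time to restore service
    rpo: timedelta  # assumed: maximum acceptable window of lost data


@dataclass(frozen=True)
class DrillResult:
    component: str
    time_to_recover: timedelta   # measured during the recovery drill
    data_loss_window: timedelta  # measured during the recovery drill


def evaluate_drill(result, target):
    """Return a list of findings; an empty list means the drill met targets."""
    findings = []
    if result.time_to_recover > target.rto:
        findings.append(
            f"{result.component}: recovery took {result.time_to_recover}, "
            f"exceeding the RTO of {target.rto}"
        )
    if result.data_loss_window > target.rpo:
        findings.append(
            f"{result.component}: data loss window {result.data_loss_window} "
            f"exceeds the RPO of {target.rpo}"
        )
    return findings


# Illustrative values only.
target = RecoveryTarget(rto=timedelta(hours=1), rpo=timedelta(minutes=15))
drill = DrillResult("orders-db", timedelta(minutes=95), timedelta(minutes=5))

for finding in evaluate_drill(drill, target):
    print(finding)
```

Recording drill results this way gives each component, and the system as a whole, a documented pass or fail against the plan.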
RE:10 Design a reliable monitoring and alerting strategy
Impact: Medium
Category: Monitoring and Alerting
APRL GUID: 90adebf7-bc90-4939-9aa8-119c46bee0fc
Description:
Measure and publish the solution's health indicators. Continuously capture uptime and other reliability data for the workload as a whole, for individual components, and for key flows.
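Capturing uptime per flow and for the workload can be sketched from periodic probe results. The example assumes each probe is recorded as a simple success or failure; the flow names, probe counts, and the 99.9% target are made-up values, and a real strategy would read these from your monitoring platform.

```python
def availability_percent(probe_results):
    """Percentage of successful probes in an iterable of booleans."""
    results = list(probe_results)
    if not results:
        raise ValueError("no probe results to measure")
    return 100.0 * sum(results) / len(results)


def uptime_report(probes_by_flow, target_percent):
    """Summarize availability per flow and across the whole workload."""
    report = {}
    for flow, results in probes_by_flow.items():
        percent = availability_percent(results)
        report[flow] = (percent, percent >= target_percent)
    all_results = [r for results in probes_by_flow.values() for r in results]
    overall = availability_percent(all_results)
    report["workload"] = (overall, overall >= target_percent)
    return report


# Illustrative probe data: True means the probe succeeded.
probes = {
    "checkout": [True] * 997 + [False] * 3,
    "search": [True] * 1000,
}

report = uptime_report(probes, target_percent=99.9)
for name, (percent, meets_target) in report.items():
    status = "meets target" if meets_target else "below target"
    print(f"{name}: {percent:.2f}% ({status})")
```

Reporting the key flows separately matters because a healthy workload-level number can hide a flow that is below its target.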