6 - Respond
This guidance presents the Microsoft Azure Well-Architected Framework recommendations for Reliability stage “6 - Respond (Responding to Failures)”, together with the associated resources and their settings.
This final stage involves having plans and procedures in place to respond to incidents that affect reliability. These include automated failovers, backup restoration, and escalation protocols for manual intervention.
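To show how automated failover, backup restoration, and manual escalation can fit together, here is a minimal sketch under stated assumptions. It tries automated remediation steps in order and falls back to paging human escalation tiers until one acknowledges. `Remediation`, `escalate`, `respond`, and the tier names are hypothetical illustrations, not an Azure API; in practice the actions would call your own failover or restore tooling.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Remediation:
    """A hypothetical automated response step (e.g. a failover or a restore)."""
    name: str
    # Returns True if the workload is healthy after the step has run.
    action: Callable[[], bool]


def escalate(tiers: List[str], acknowledges: Callable[[str], bool]) -> Optional[str]:
    """Page each escalation tier in order; return the first one that acknowledges."""
    for tier in tiers:
        if acknowledges(tier):
            return tier
    return None


def respond(
    remediations: List[Remediation],
    tiers: List[str],
    acknowledges: Callable[[str], bool],
) -> str:
    """Try automated remediations first, then fall back to manual escalation."""
    for step in remediations:
        try:
            if step.action():
                return f"resolved automatically by {step.name}"
        except Exception:
            # A failing automated step must not halt the response; try the next one.
            continue
    owner = escalate(tiers, acknowledges)
    if owner is None:
        return "escalation exhausted without acknowledgement"
    return f"escalated to {owner}"


if __name__ == "__main__":
    steps = [
        Remediation("regional failover", lambda: False),
        Remediation("restore from backup", lambda: False),
    ]
    # In this example only the secondary on-call acknowledges the page.
    print(respond(steps, ["primary on-call", "secondary on-call"],
                  lambda tier: tier == "secondary on-call"))
```

Here both automated steps fail, so the sketch prints `escalated to secondary on-call`. Ordering automated steps before human tiers reflects the idea that routine failures should be handled without waiting for a person, while escalation remains the safety net.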
Summary of Recommendations
Recommendation | Category | Impact | State | Azure Resource Graph (ARG) Query Available |
---|---|---|---|---|
WARD-1 - Implement proactive Incident Response | Disaster Recovery | High | Verified | No |
Recommendation Details
WARD-1 - Implement proactive Incident Response
Category: Disaster Recovery
Impact: High
Recommendation/Guidance
Preventing all problems is a laudable but impossible goal. Things will go wrong, so we need a plan that limits the impact on our end users and returns operations to normal as quickly as possible.
The key is to respond with urgency rather than react. A reaction tends to be impulsive and rooted in the present moment, without consideration of long-term effects. A response is well thought out, organized, and information-based.
Your incident response approach determines how effective you are at:
- Understanding what’s going on (diagnosing the problem)
- Triaging the problem (determining its urgency) and prioritizing it
- Engaging the right resources to mitigate the issue(s)
- Communicating with stakeholders about the problem
After the problem has been remediated, you can learn from the incident through a post-incident review process. That is an important subject that warrants a separate discussion of its own.
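The triage, prioritization, engagement, and communication steps listed above can be sketched as a small lookup, assuming impact and urgency are each rated High, Medium, or Low. The priority formula reproduces a widely used impact/urgency matrix (the form popularized by ITIL incident management); it is a swapped-in technique, not something this guidance prescribes. `Level`, `RESPONSE_POLICY`, `triage`, and the engagement and update-cadence values are hypothetical and would come from your own runbooks.

```python
from enum import IntEnum
from typing import Dict, Optional, Tuple


class Level(IntEnum):
    """Assessed impact or urgency; a lower value means more severe."""
    HIGH = 1
    MEDIUM = 2
    LOW = 3


def priority(impact: Level, urgency: Level) -> int:
    """Combine impact and urgency into a priority from 1 (act now) to 5 (lowest).

    This reproduces the common 3x3 impact/urgency matrix: High/High -> 1,
    High/Low or Medium/Medium -> 3, Low/Low -> 5.
    """
    return int(impact) + int(urgency) - 1


# Hypothetical policy: whom to engage and how often (in minutes) to update
# stakeholders at each priority. None means no scheduled updates.
RESPONSE_POLICY: Dict[int, Tuple[str, Optional[int]]] = {
    1: ("page incident commander and on-call engineer", 30),
    2: ("page on-call engineer", 60),
    3: ("open ticket for owning team", 240),
    4: ("add to owning team's backlog", None),
    5: ("add to owning team's backlog", None),
}


def triage(impact: Level, urgency: Level) -> dict:
    """Return the priority, who to engage, and the stakeholder update cadence."""
    p = priority(impact, urgency)
    engage, update_minutes = RESPONSE_POLICY[p]
    return {"priority": p, "engage": engage, "update_every_minutes": update_minutes}


if __name__ == "__main__":
    print(triage(Level.HIGH, Level.MEDIUM))
    # {'priority': 2, 'engage': 'page on-call engineer', 'update_every_minutes': 60}
```

Encoding the policy as data rather than ad hoc judgment is one way to make the response organized and information-based: the same inputs always produce the same engagement and communication plan, and the table can be reviewed after each incident.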
Resources