Blameless Post-Incident Review Process
At the Ministry of Justice, we recognise that incidents happen. Rather than assigning blame, we focus on understanding the root causes and identifying opportunities to improve. This process provides a structured way to capture insights and drive change.
Step 1: Schedule a Review Meeting
Within 72 hours of the incident’s resolution, the incident lead (the individual who identified or resolved the incident) should schedule a blameless post-incident review meeting. Invite stakeholders and Operations Engineering team members. It is vital to include everyone who took part in the incident, from detection to resolution.
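As an illustration of the 72-hour window, the following minimal Python sketch computes the latest time by which the review should be scheduled. The function names and the example timestamp are hypothetical and not part of any existing tooling.

```python
from datetime import datetime, timedelta, timezone

# From the process above: schedule the review within 72 hours of resolution.
REVIEW_WINDOW = timedelta(hours=72)


def review_scheduling_deadline(resolved_at: datetime) -> datetime:
    """Return the latest time by which the review should be scheduled."""
    return resolved_at + REVIEW_WINDOW


def is_overdue(resolved_at: datetime, now: datetime) -> bool:
    """True if the 72-hour window has passed without a review being scheduled."""
    return now > review_scheduling_deadline(resolved_at)


# Hypothetical resolution time, used only for illustration.
resolved = datetime(2024, 3, 4, 14, 30, tzinfo=timezone.utc)
deadline = review_scheduling_deadline(resolved)
print(deadline.isoformat())  # 2024-03-07T14:30:00+00:00
```

Using timezone-aware timestamps avoids ambiguity when responders work across time zones or a clock change falls inside the window.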
Step 2: Document the Incident Summary
Before the meeting, the incident lead should create a draft document containing the following:
- Date and Time: When the incident occurred and its duration.
- Description: A concise explanation of the incident.
- Impact: The ramifications for users, systems, and the business.
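The three fields above could be captured in a small structure so that every draft follows the same shape. This is a sketch only: the class name, field names, and sample values are assumptions made for illustration, not an existing template.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IncidentSummary:
    """Hypothetical container for the draft summary fields listed above."""
    started_at: datetime
    resolved_at: datetime
    description: str
    impact: str

    @property
    def duration(self) -> timedelta:
        # Duration is derived, so it cannot drift out of sync with the timestamps.
        return self.resolved_at - self.started_at

    def to_text(self) -> str:
        """Render the summary using the same bullet labels as the process."""
        return "\n".join([
            f"- Date and Time: {self.started_at:%Y-%m-%d %H:%M} (duration: {self.duration})",
            f"- Description: {self.description}",
            f"- Impact: {self.impact}",
        ])


# Sample values are invented for illustration.
summary = IncidentSummary(
    started_at=datetime(2024, 3, 4, 12, 55),
    resolved_at=datetime(2024, 3, 4, 14, 30),
    description="Example service returned errors after a configuration change.",
    impact="Users could not sign in for the duration of the incident.",
)
print(summary.to_text())
```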
Step 3: Construct a Timeline of Events
Detail the incident’s timeline, from the time it was detected to its resolution. Highlight significant milestones, decisions, and actions taken.
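One possible way to assemble the timeline is to record each event with a timestamp and sort the events before the meeting, showing each as an offset from detection. The sketch below assumes hypothetical event labels ("detected", "decision", "action", "resolved") and invented sample events.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TimelineEvent:
    at: datetime
    kind: str  # hypothetical labels: "detected", "decision", "action", "resolved"
    note: str


def build_timeline(events):
    """Sort events chronologically and format each as minutes since the first event."""
    ordered = sorted(events, key=lambda e: e.at)
    if not ordered:
        return []
    start = ordered[0].at
    lines = []
    for e in ordered:
        minutes = int((e.at - start).total_seconds() // 60)
        lines.append(f"T+{minutes}m [{e.kind}] {e.note}")
    return lines


# Events may be collected out of order from chat logs and alerts.
events = [
    TimelineEvent(datetime(2024, 3, 4, 14, 30), "resolved", "Service confirmed healthy"),
    TimelineEvent(datetime(2024, 3, 4, 12, 55), "detected", "Alert fired"),
    TimelineEvent(datetime(2024, 3, 4, 13, 10), "decision", "Agreed to roll back"),
]
for line in build_timeline(events):
    print(line)
```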
Step 4: Discuss and Identify Root Causes
During the review meeting, collaboratively discuss and document:
- Technical causes: Failures in systems, configurations, etc.
- Process causes: Gaps in procedures or protocols.
- Human causes: Decisions or actions taken.
Remember, this is blameless. When discussing decisions or actions, focus on the system and process conditions that shaped them, not on the individuals involved.
Step 5: Record Findings and Learnings
Document:
- What went well.
- Areas for improvement.
Step 6: Formulate Recommendations
Propose actionable improvements for each identified area. Assign ownership and set a deadline for resolution.
Step 7: Share the Incident Summary
We encourage you to share any documentation through relevant channels. This might be a link in Slack to the post-incident review or a demonstration of the changes made to mitigate any future incidents.
Step 8: Document the Review
Retain a detailed record of the meeting and the insights garnered. Update the Incident Log. This aids future training and helps monitor the implementation of recommendations.
Step 9: Implement and Monitor Recommendations
Set up a feedback loop to track the progress of the suggested changes. This supports accountability and continuous improvement.
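A feedback loop can be as simple as a periodic check of which recommendations are still open past their deadline. The sketch below is one way this might look. The `Recommendation` class, its fields, and the sample items are assumptions for illustration and do not describe an existing tracker.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Recommendation:
    """Hypothetical record of one recommendation, with its owner and deadline."""
    title: str
    owner: str  # a team or role name
    due: date
    done: bool = False


def overdue(items, today):
    """Return open recommendations whose deadline has passed."""
    return [r for r in items if not r.done and r.due < today]


# Sample items are invented for illustration.
items = [
    Recommendation("Add alert for failed sign-ins", "team-a", date(2024, 3, 18)),
    Recommendation("Document rollback procedure", "team-b", date(2024, 3, 25), done=True),
    Recommendation("Review configuration change process", "team-a", date(2024, 4, 15)),
]
for r in overdue(items, today=date(2024, 4, 1)):
    print(f"OVERDUE: {r.title} (owner: {r.owner}, due {r.due.isoformat()})")
```

Reporting by owner and due date keeps the follow-up focused on the work outstanding, in keeping with the blameless approach.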
An Example
An example of a post-incident review document can be found here.
Feedback
Should you have any recommendations or feedback regarding this process, kindly reach out via the #ask-operations-engineering Slack channel or create an issue in our operations-engineering repository.
Note: For the process to be effective, fostering an open and blameless culture is paramount. All team members should feel safe to share their perspectives and insights.