
Blameless Post-Incident Review Process

At the Ministry of Justice, we recognise that incidents happen. Rather than assigning blame, we focus on understanding the root causes and identifying opportunities to improve. This process provides a structured way to capture insights and drive change.

Step 1: Schedule a Review Meeting

Within 72 hours of the incident’s resolution, the incident lead (the individual who identified or resolved the incident) should schedule a blameless post-incident review meeting. Invite stakeholders and Operations Engineering team members. It is vital to include everyone who took part in the incident, from detection to resolution.
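
If it helps to make the 72-hour window concrete, the deadline can be worked out from the resolution time. The following is a minimal sketch only; the function names and the example timestamp are hypothetical and not part of any Ministry of Justice tooling.

```python
from datetime import datetime, timedelta, timezone

# The 72-hour window comes from Step 1; everything else here is illustrative.
REVIEW_WINDOW = timedelta(hours=72)


def review_deadline(resolved_at: datetime) -> datetime:
    """Return the latest time by which the review meeting should be scheduled."""
    return resolved_at + REVIEW_WINDOW


def is_overdue(resolved_at: datetime, now: datetime) -> bool:
    """True if the 72-hour window has already passed."""
    return now > review_deadline(resolved_at)


# Hypothetical resolution time, for illustration only.
resolved = datetime(2024, 5, 21, 14, 30, tzinfo=timezone.utc)
deadline = review_deadline(resolved)
print(deadline.isoformat())  # 2024-05-24T14:30:00+00:00
```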

Step 2: Document the Incident Summary

Before the meeting, the incident lead should create a draft document containing the following:

  • Date and Time: When the incident occurred and its duration.
  • Description: A concise explanation of the incident.
  • Impact: The ramifications for users, systems, and the business.
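
For teams that prefer to generate the draft from structured data, the three fields above could be captured as follows. This is a sketch under assumptions: the class name, field names, and sample incident are hypothetical, and the process does not prescribe any particular format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IncidentSummary:
    """Draft summary holding the three items listed in Step 2."""

    occurred_at: datetime   # Date and Time: when the incident occurred
    duration: timedelta     # Date and Time: how long it lasted
    description: str        # Description: concise explanation
    impact: str             # Impact: effect on users, systems, and the business

    def render(self) -> str:
        """Render the summary as plain text for the draft document."""
        minutes = int(self.duration.total_seconds() // 60)
        return "\n".join([
            f"Date and Time: {self.occurred_at:%Y-%m-%d %H:%M} "
            f"(duration: {minutes} minutes)",
            f"Description: {self.description}",
            f"Impact: {self.impact}",
        ])


# Hypothetical incident, for illustration only.
summary = IncidentSummary(
    occurred_at=datetime(2024, 5, 21, 9, 15),
    duration=timedelta(minutes=45),
    description="Deployment pipeline failed for all repositories.",
    impact="Teams could not release changes for 45 minutes.",
)
print(summary.render())
```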

Step 3: Construct a Timeline of Events

Detail the incident’s timeline, from the time it was detected to its resolution. Highlight significant milestones, decisions, and actions taken.
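
One way to assemble the timeline is to collect events as people remember them and then sort them, showing the minutes elapsed since detection. The sketch below assumes a simple event record; the names and sample events are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TimelineEvent:
    at: datetime  # when the milestone, decision, or action happened
    note: str     # what happened


def build_timeline(events: list[TimelineEvent]) -> list[str]:
    """Sort events chronologically and annotate each with minutes since the first."""
    ordered = sorted(events, key=lambda e: e.at)
    if not ordered:
        return []
    start = ordered[0].at  # treated here as the detection time
    lines = []
    for event in ordered:
        minutes = int((event.at - start).total_seconds() // 60)
        lines.append(f"{event.at:%H:%M} (+{minutes} min) {event.note}")
    return lines


# Hypothetical events, deliberately supplied out of order.
events = [
    TimelineEvent(datetime(2024, 5, 21, 10, 0), "Service restored"),
    TimelineEvent(datetime(2024, 5, 21, 9, 15), "Alert fired"),
    TimelineEvent(datetime(2024, 5, 21, 9, 40), "Rollback started"),
]
timeline = build_timeline(events)
print("\n".join(timeline))
```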

Step 4: Discuss and Identify Root Causes

During the review meeting, collaboratively discuss and document:

  • Technical causes: Failures in systems, configurations, etc.
  • Process causes: Gaps in procedures or protocols.
  • Human causes: Decisions or actions taken.

Remember, this is blameless. Focus on system and process improvements, not individual actions.

Step 5: Record Findings and Learnings

Document:

  • What went well.
  • Areas for improvement.

Step 6: Formulate Recommendations

Propose actionable improvements for each identified area. Assign ownership and set a deadline for resolution.
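
A quick check before the meeting closes can confirm that every recommendation has both an owner and a deadline. This is an illustrative sketch; the record layout and the sample recommendations are assumptions rather than a required format.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Recommendation:
    area: str                        # the identified area for improvement
    action: str                      # the proposed, actionable improvement
    owner: Optional[str] = None      # who is responsible
    deadline: Optional[date] = None  # when it should be resolved


def missing_assignments(recs: list[Recommendation]) -> list[Recommendation]:
    """Return the recommendations that still lack an owner or a deadline."""
    return [r for r in recs if not r.owner or r.deadline is None]


# Hypothetical recommendations, for illustration only.
recs = [
    Recommendation("Alerting", "Add an alert for failed deployments",
                   owner="alice", deadline=date(2024, 6, 14)),
    Recommendation("Runbooks", "Document the rollback procedure"),
]
unassigned = missing_assignments(recs)
for rec in unassigned:
    print(f"Needs owner or deadline: {rec.action}")
```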

Step 7: Share the Incident Summary

We encourage you to share any documentation through relevant channels. This might be a link in Slack to the post-incident review or a demonstration of the changes made to mitigate any future incidents.

Step 8: Document the Review

Retain a detailed record of the meeting and the insights garnered. Update the Incident Log. This aids future training and helps monitor the implementation of recommendations.

Step 9: Implement and Monitor Recommendations

Set up a feedback loop to track the progress of the recommended changes, so that owners stay accountable and improvement continues after the review.
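
A feedback loop can be as simple as a periodic report of completed and overdue actions. The sketch below is one possible shape, assuming actions are tracked with an owner, a deadline, and a completion flag; all names and data are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TrackedAction:
    action: str
    owner: str
    deadline: date
    done: bool = False


def progress_report(actions: list[TrackedAction], today: date) -> dict:
    """Summarise how many actions are complete and which open ones are overdue."""
    open_items = [a for a in actions if not a.done]
    overdue = [a for a in open_items if a.deadline < today]
    return {
        "total": len(actions),
        "completed": len(actions) - len(open_items),
        "overdue": [f"{a.action} ({a.owner})" for a in overdue],
    }


# Hypothetical tracked actions, for illustration only.
actions = [
    TrackedAction("Add deployment alert", "alice", date(2024, 6, 14), done=True),
    TrackedAction("Document rollback", "bob", date(2024, 6, 1)),
    TrackedAction("Review on-call rota", "carol", date(2024, 7, 1)),
]
report = progress_report(actions, today=date(2024, 6, 10))
print(report)
```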

An Example

An example of a post-incident review document can be found here.

Feedback

Should you have any recommendations or feedback regarding this process, kindly reach out via the #ask-operations-engineering Slack channel or create an issue in our operations-engineering repository.


Note: For the process to be effective, fostering an open and blameless culture is paramount. All team members should feel safe to share their perspectives and insights.

This page was last reviewed on 21 May 2024. It needs to be reviewed again on 21 November 2024 by the page owner #operations-engineering-alerts.