Operations Engineering Communications Plan
This plan details who we will communicate with, and how and when we will communicate information to users and stakeholders.
The Plan
The table below sets out the typical types of communications the Cloud Platform will want to issue.
What information | Target Audience | When | Channel |
---|---|---|---|
New features or services | Service Teams | As features are shipped | operations-engineering-update |
New GitHub features or services | Service Teams | As features are shipped | github-community |
Things we have learned | Justice Digital & Technology | Regularly when we have key things to show or promote | Show the Thing |
Service impacting incidents | Service Teams | When an incident has been declared, key updates during resolution, and at incident closure | operations-engineering-update |
Sharing postmortems for service impacting incidents | Service Teams | When the postmortem has been documented | operations-engineering-update |
Service impacting upgrades/maintenance | Service Teams | As required | operations-engineering-update |
Sharing successes | Chief Technology Officer | Weekly | One to ones and team meetings |
Sharing successes | SMT | Fortnightly | Architecture & Platforms weeknotes |
Sharing successes | Justice Digital & Technology | Regularly when we have key things to show or promote | Post about them in #chat on Slack/Consider for a Show the Thing |
Tips on format of communications
The #operations-engineering-update channel is used for several different types of communication, so each message needs a clear description that lets users judge its importance. For incidents and upgrades, give the message a bold title that clearly describes what it is about.
Examples
Kubernetes 1.14 Upgrade
Incident - Sentry unavailable
Action Required - Decommissioning Service
Things to include in incident communications
- service impact/what users might be reporting
- action being taken (this might just be that we are investigating the issue)
- when users can expect a progress update (and make sure that the update actually happens when you say it will)
- details of any actions users need to take (if applicable)
- apologise for the inconvenience (it might not be an issue in our control but this can build trust and let users know that you are taking resolution seriously)
Example
High Priority Incident Declared - Cloud Platform
We are aware that some users are experiencing issues accessing services on the Cloud Platform this morning. We are unsure of the full impact of these issues and the extent to which this is impacting services. An incident team has been formed and is investigating. We will provide further updates in due course, but in any event the next update will be in 30 minutes. Thank you for your patience.
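The incident checklist above can be turned into a small message template so that no item is forgotten under pressure. The following is a minimal sketch only: the `IncidentUpdate` class and its field names are hypothetical, and the single-asterisk bold assumes the message is posted to Slack, which uses that syntax for bold text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IncidentUpdate:
    """Hypothetical container for the items an incident message should cover."""
    title: str                 # e.g. "Incident - Sentry unavailable"
    impact: str                # service impact / what users might be reporting
    action: str                # what the team is doing about it
    next_update_minutes: int   # when users can expect a progress update
    user_action: Optional[str] = None  # only if users need to do something

    def render(self) -> str:
        # Bold title first (Slack-style asterisks, an assumption).
        lines = [f"*{self.title}*", self.impact, self.action]
        if self.user_action:
            lines.append(f"Action required: {self.user_action}")
        lines.append(
            f"The next update will be in {self.next_update_minutes} minutes."
        )
        lines.append("We apologise for the inconvenience.")
        return "\n".join(lines)


message = IncidentUpdate(
    title="Incident - Sentry unavailable",
    impact="Some users are unable to reach Sentry.",
    action="An incident team has been formed and is investigating.",
    next_update_minutes=30,
).render()
print(message)
```

Making the next update time a required field reflects the point that users should always be told when to expect progress, and the person posting should then follow through on it.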
Things to include in upgrade communications
- What you are upgrading
- When the upgrade will take place (including times)
- Why you are upgrading (it might be useful to include a link to change notes or something similar that users can refer to if they want more information, rather than adding it all to the comms)
- Details of any service impact or anything users might need to take action on as a result of changes (including if we are pausing pipelines)
- Details of any risks posed to services as a result of not taking requested actions
- Include examples of code if it helps users understand the changes
- Provide a high level overview of the process we will be taking to implement changes
- Refer users back to the #ask-cloud-platform channel if they have questions or something isn’t working as expected
Example
Certificate Manager Upgrade
When: 23rd April, starting at 12:00 PM.
What: Cert-Manager upgrade to v0.14 from v0.8.
Description: We are upgrading cert-manager, which manages all of the SSL certificates for your websites and web applications, from version 0.8 to 0.14.
This is a significant change. Version 0.14 removes the “v1alpha1” designation for certificates, and changes the API group from certmanager.k8s.io to cert-manager.io.
What you need to do: Move all of your certificate definitions into your namespace folders in the environments repository, by the end of 22nd April 2020.
Certificates in the old format will not work after the upgrade.
The Cloud Platform team will make all the required changes to any certificate definitions in the environments repository.
If you have certificate definitions which are not in the environments repository, you will need to make the required changes to the certificate yaml files, and redeploy the certificates once the upgrade is complete.
Downtime: We are not expecting any downtime as a result of this change.
Process: (overview of steps that need to be taken by users and the Cloud Platform Team)
If you experience any issues during or after the changes, please contact the team in #ask-cloud-platform.
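Before an upgrade announcement is posted, the draft can be checked against the list of things to include. This is a rough sketch under stated assumptions: the section names in `REQUIRED_SECTIONS` and the `missing_sections` helper are hypothetical labels chosen for illustration, not an established team convention.

```python
from typing import Dict, List

# Hypothetical section labels, loosely mirroring the upgrade checklist.
REQUIRED_SECTIONS = (
    "What", "When", "Why", "Impact", "Risks", "Process", "Questions",
)


def missing_sections(draft: Dict[str, str]) -> List[str]:
    """Return the required sections that are absent or empty in the draft."""
    return [
        name for name in REQUIRED_SECTIONS
        if not draft.get(name, "").strip()
    ]


draft = {
    "What": "Cert-Manager upgrade to v0.14 from v0.8",
    "When": "23rd April, starting at 12:00 PM",
    "Impact": "We are not expecting any downtime as a result of this change.",
}
missing = missing_sections(draft)
print(missing)
```

Code examples are left out of the required list on purpose, since the checklist only asks for them when they help users understand the change.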
Sharing information with the wider Ministry of Justice and the Public
There may be occasions where we want to publish something to the wider MoJ or the public. In these instances we can publish on the MoJ Digital blog. Requests to publish on the blog should be made in #ask-comms in the first instance.