Timeline
- 2020/12/09 - 2:00PM UTC+1: Two servers were added following the internal runbook. Missing firewall rules triggered errors on these two servers only;
- 2020/12/09 - 6:23PM UTC+1: First notification of the issue (external testing);
- 2020/12/09 - 7:30PM UTC+1: iptables rules added. Issue resolved.
Severity
The severity of the incident has been classified as MINOR, which implies a low impact for our users.
Root cause analysis
- Two new web servers were added without being listed in the iptables (firewall) rules on the memcached servers.
- All queries to memcached failed on these two servers. All other servers were up and running, which mitigated much of the issue and avoided any major impact.
- Concerning OAuth2 access tokens: there was no impact, since a retry would fix this; no token was lost.
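
The failure described above (new web servers unable to reach memcached) is the kind of problem a connectivity check run from each new server could surface before it takes traffic. The following is a minimal sketch, not the check used in production: the function names are hypothetical, and it assumes memcached listens on its default port 11211.

```python
import socket

# Assumption: memcached listens on its default port; adjust if yours differs.
MEMCACHED_PORT = 11211


def can_connect(host: str, port: int = MEMCACHED_PORT, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (e.g. packets dropped by a
        # firewall rule) and name-resolution failures.
        return False


def unreachable_hosts(hosts, port: int = MEMCACHED_PORT, timeout: float = 2.0):
    """Return the hosts from `hosts` that could not be reached on `port`."""
    return [h for h in hosts if not can_connect(h, port, timeout)]
```

Run from a newly added web server with the list of memcached hosts, a non-empty result from `unreachable_hosts` would indicate that the server is not yet allowed through the firewall and should not be put into rotation.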
Remediation plans
To prevent similar incidents going forward, the following actions will be put in place:
| Preventative Action | Owner | Due date |
| --- | --- | --- |
| Improve Ansible procedure | Head of Cloud Engineering | Already done on 2020/12/10 |
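
The report does not detail the improved Ansible procedure. One common way to avoid this class of omission is to derive the memcached firewall rules from the same server list used for provisioning, so that a newly added web server cannot be left out. The sketch below illustrates that idea only; the function name and the example addresses are hypothetical, and port 11211 is assumed as the memcached default.

```python
import ipaddress

# Assumption: memcached listens on its default port; adjust if yours differs.
MEMCACHED_PORT = 11211


def memcached_allow_rules(web_server_ips, port: int = MEMCACHED_PORT):
    """Build one iptables ACCEPT rule per web server IPv4 address.

    Duplicates are removed and addresses are sorted numerically so the
    output is stable between runs. Raises ValueError on an invalid address.
    """
    unique_ips = sorted(set(web_server_ips), key=ipaddress.IPv4Address)
    return [
        f"iptables -A INPUT -p tcp -s {ip} --dport {port} -j ACCEPT"
        for ip in unique_ips
    ]


if __name__ == "__main__":
    # Hypothetical inventory: two existing web servers plus one new one.
    for rule in memcached_allow_rules(["10.0.0.12", "10.0.0.2", "10.0.0.30"]):
        print(rule)
```

Because the rules are generated from the inventory rather than maintained by hand, adding a server to the inventory is the only step needed for it to appear in the firewall configuration.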