Aug 10, 2020
Monday 10th August 2020 – 1:38pm – the SOBS servers are currently offline.
2pm: Our hosting company is investigating the outage. They currently suspect a networking problem, which is affecting many sites besides ours. They expect it to be resolved shortly.
2:48pm: The server appears to be back. However, when I checked the logs, it appears that the server was running the whole time and the network outage affected only some customers. I called the hosting company and they indicated they are not out of the woods yet. They are still resolving some issues and will give a full update later.
3:55pm: The hosting company just rang to confirm that all of these problems have now been resolved. A full report will follow shortly.
The hosting service has completed a post-incident report indicating that one of the core network switches failed. The details of the report are included below:
Date/Time of Incident(s): 10 August 2020 – 12:22PM EST
Date/Time of Restoration: 10 August 2020 – 2:39PM EST
Impact: Outage to Voice, Internet, WAN and IaaS services for all customers. Impact classified as severe.
Level 1, immediate response.
As per standard Level 1 protocols, four engineers were immediately dispatched to two data centres to physically inspect the infrastructure, arriving onsite at 12:45PM, whilst the Technical Lead for issue resolution worked remotely on diagnosis and resolution. The investigation into the cause of the issue was conducted in accordance with our internal troubleshooting procedures.
Engagement with the switching vendor commenced at 12:45PM after a hard reboot of the core switches failed to resolve the issue. At that point the fault was identified and isolated to software code on a core network device, which was impacting the entire core network stack. As a precaution, replacement network hardware was dispatched to the data centre and the affected device was replaced.