Well done Andrew, Jimi and Paul

Yesterday was rather busy - starting with a power outage at the data centre the night before.

Whilst the outage was under 4 minutes, and equipment should all start up and carry on after an outage, it seems four servers did not. This makes me wonder if it was a power spike, but I have not seen the report from Pulsant yet.

Most things did start working, as you would expect, and our routers recover within a couple of seconds of power being restored. One server that failed was a spam filter, which just meant we had less capacity for spam screening. Another was one of the outgoing mail server pool, and again we can work with one being offline. One of the disk servers (with all my stuff on it, as well as customer voicemail recordings) was also down. Finally, one server handled usage recording for calls and broadband, so people got some for free!

Jimi and Paul worked on these issues overnight, and Paul and Andrew were working on them during the day. The mail server queue was cleared. The disk server was fixed. I was able to ensure the billing system was working correctly once the usage recording database was back. It was a busy day all around.

The day did not end there, as something blipped last night in the other data centre, with two routers having trouble with a switch. Thanks to Paul and Jimi for their work on that and for fixing it within a few minutes. Again, most systems re-routed automatically, as they should.

Well done everyone.
