Had an interesting complaint today - well more of a query - apparently we provided a customer with more than 100% uptime in October!
Well, you can guess what it was, I hope. It was 28th October. It has 25 hours in the day. Customers were seeing 104% uptime for October 28th.
It struck me as odd - as I am not usually that foolish, and I coded this.
This is where it gets interesting though. I checked my code. We log uptime in seconds, and we work out percentages based on local time. That means we were working on a 90,000 seconds day for 28th October. So we allowed for the extra hour in the percentage calculation... I thought I was not that foolish. So how do we see 104% uptime?
Looking in to it more, we are seeing the uptime for the day recorded at 93,600. That is 26 hours in a 25 hour day...
I believe I have eventually tracked it down though.
Each hour we get RADIUS accounting. We record the change since the last hour. The "last hour" record, well, "last update", is stored in an sql database. It includes the tx and rx, byte and packet, counts and the time of the last update.
When we process the new update we can see the time and usage has moved on, and we update the stats. The uptime normally adding 3600.
The catch is that the time is recorded in local time.
At 00:00:00BST we record stats and save the time as 00:00:00.
At 01:00:00BST we see a last time as 00:00:00 and assume BST, so record 3600 seconds, and record time as 01:00:00
At 01:00:00UTC we see a last time as 01:00:00 and assume BST, so record 3600 seconds, and record time as 01:00:00
At 02:00:00UTC we see a last time as 01:00:00 and assume BST, so record 7200 seconds.
At 03:00:00UTC we see a last time as 02:00:00 and assume UTC, so record 3600 seconds.
If a customer logs out and back in between 1am and 1am then we amplify this, recording yet another hour. We had one customer with 107% uptime for that day!
The fix is simple - those fields are now assumed to be UTC. Lets hope sql has no issue with March where a time of 02:00:00 is not valid.
Just goes to show, you can give "more than 100%" :-)
P.S. Before you look, I am fixing the erroneous stats.