Thursday, 26 April 2012

Bug hunting


OK, enbugging code is the process of adding bugs to it.

Sadly it is almost impossible to do "coding" without also "enbugging".

I am chasing a couple of annoying sods right now. As ever, they are "impossible" until viewed in hindsight. If only I could have hindsight, in advance :-)

The VoIP code has an issue where, every few days, the support hunt group in the office breaks. We can see it happen and reset the call server in the office when it does, but I cannot see any way for this to occur when reading the code.

I am adding more and more debug logging to the code - I will find it.

But bugs you cannot reproduce on demand are the worst, and the ones that take days or weeks to "just happen" are even worse. The whole cycle of finding likely causes and adding more debug information takes weeks if you are not careful.

Well, all I can do is wait and see.

Oh, and if you wonder what is worse than that - it is compiler bugs. They are very very very rare, but I have hit a couple in my many years coding and they are the worst type of bugs you can imagine. Trust the compiler. Use the source, Luke...


  1. The worst of the worst for me are the bugs that are seemingly fixed by adding debug code, and then you take the debugging out again and it still works. WTF!

    1. I've had a compiler bug like that - the linker placed modules in memory in last-built order. The compiler generated crap code when initializing far function pointers (Infineon C165 16-bit MCU, with a 24-bit address space split into 256 segments): it always set the segment of a statically initialized far function pointer to 0. So whether it worked depended on whether the module *using* the function pointers was in segment 0 or not.

      For added fun, we had around 67KB of code when the bug started to manifest, so if we were really lucky, none of the code in segment 1 was a target for a function pointer, and it all Just Worked. Rebuild one module (causing the linker to rearrange the code), and something in the last 3KB became a function-pointer target, causing random behaviour.

  2. I call enbugging 'job security' :p

    My favourite kind of bug is where someone reports something trivial, and you suddenly realize the reason it's wrong is that the entire design of a subsystem is BS, and you have to spend a week fixing it... Bonus points if this is after confidently stating that it was a trivial bug and would be fixed in half an hour.

  3. The worst bugs I've seen are when you've had a product that's been out and working for months, then a customer reports a bug, and you test it and wonder how it ever worked at all... and then from that point on it doesn't...

    I know this makes no sense but I'm sure it's happened to me several times.

    1. Oh yes, the classic "no way this could ever have worked" bugs. They are fun too.

    2. There's even a term for it: a schrödinbug.

  4. I've encountered a bug in PHP which was resolved by adding a comment just before the offending line. Remove the comment (which does nothing) and it segfaults... That was "fun" to find and "fix".

  5. My favourite is what I call the "mixed logic" bug, which happens when you have more than one way to decide something. For example, a field (say Type) has values of "A" or "B" and different logic is needed for each. There are *lots* of ways to do this, for example:
    The obvious:
    IF Type = "A" do A-code
    ELSE do B-code
    - and most programmers will do this (40 years' experience has taught me this!)
    An alternative may be:
    IF Type = "B" do B-code
    ELSE do A-code
    (there may be no reason to choose one over the other)
    A purist might do:
    IF Type = "A" do A-code
    ELSE IF Type = "B" do B-code
    ELSE do Report-Type-Error
    but that's pretty unusual among programmers who don't have OCD :-)
    But the fun comes when some nerk (hereinafter called The User) says: "We need a Type C, but it's handled the same way as B".
    If different styles of conditional are used across the program, there's a chance that only the first one will be found, seen to be OK as it is (C already falls through to B-code), and left alone, resulting in:
    IF Type = "A" do A-code
    ELSE do B-code
    in one part of the program, and:
    IF Type = "B" do B-code
    ELSE do A-code
    remaining in another, causing a bug. I've seen this sort of thing a remarkable number of times, especially when the code isn't monolithic: searching the top module finds code that already works, so nobody looks at the submodules.

    If I had a pound for every time I've seen something like this happen, I wouldn't be broke now! :-)

  6. My favourite bug was in an embedded system in the days when we still used EPROMs. One of the EPROMs had been through too many program/erase cycles and would lose a few bits after about 10 hours.

    Result - a system that worked fine at the end of the day, crashed overnight, and would crash immediately when restarted the following morning.
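The far-pointer compiler bug from the first comment is easier to picture with the address arithmetic written out. Below is a simplified Python model, assuming 64KB segments (256 segments x 64KB = 16MB, i.e. a 24-bit space) and a far pointer stored as a (segment, 16-bit offset) pair. The helper names and function addresses are hypothetical, and the real C165 pointer encoding and call mechanics are not modelled. It shows why a pointer to a function in segment 0 survives the "segment always 0" bug, while one pointing into the spilled-over last 3KB jumps to the wrong place.

```python
SEGMENT_SIZE = 0x10000  # 64KB per segment: 256 segments x 64KB = 16MB (24-bit space)

def make_far_ptr(linear_addr):
    """Correct static initialisation: keep both segment and offset."""
    return (linear_addr >> 16, linear_addr & 0xFFFF)

def make_far_ptr_buggy(linear_addr):
    """Model of the compiler bug: the segment is always written as 0."""
    return (0, linear_addr & 0xFFFF)

def resolve(far_ptr):
    """Turn a (segment, offset) pair back into the address actually jumped to."""
    segment, offset = far_ptr
    return segment * SEGMENT_SIZE + offset

# With ~67KB of code, 64KB fits in segment 0 and the rest spills into segment 1.
spill_bytes = 67 * 1024 - SEGMENT_SIZE  # 3072 bytes, i.e. the "last 3KB"

func_in_seg0 = 0x04000  # hypothetical function address in segment 0
func_in_seg1 = 0x10200  # hypothetical function address in segment 1

for addr in (func_in_seg0, func_in_seg1):
    good = resolve(make_far_ptr(addr))
    bad = resolve(make_far_ptr_buggy(addr))
    print(f"target {addr:#07x}: correct {good:#07x}, buggy {bad:#07x}")
```

Because the linker's module order changed with every rebuild, which functions ended up in that last 3KB (and so whether the bug showed at all) changed too.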
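The "mixed logic" trap from comment 5 can be reproduced in a few lines. This is a minimal Python sketch; `a_code`, `b_code`, `top_module`, `sub_module` and `purist` are hypothetical stand-ins for the A-code/B-code branches and the three conditional styles in the comment. The first two styles agree for "A" and "B" but split as soon as a Type "C" appears, while the purist style refuses to guess.

```python
def a_code():
    return "A-code"

def b_code():
    return "B-code"

def top_module(record_type):
    # Style 1: test for "A", let everything else fall through to B-code.
    if record_type == "A":
        return a_code()
    return b_code()

def sub_module(record_type):
    # Style 2: test for "B", let everything else fall through to A-code.
    if record_type == "B":
        return b_code()
    return a_code()

def purist(record_type):
    # Style 3: test every known value and report anything unexpected.
    if record_type == "A":
        return a_code()
    if record_type == "B":
        return b_code()
    raise ValueError(f"unknown Type {record_type!r}")

# Both fall-through styles agree on the original types...
for t in ("A", "B"):
    assert top_module(t) == sub_module(t)

# ...but a new Type "C" (meant to be handled like "B") splits them.
print("top_module:", top_module("C"))  # B-code, correct by luck
print("sub_module:", sub_module("C"))  # A-code, the bug
try:
    purist("C")
except ValueError as err:
    print("purist:", err)
```

Only the purist version forces someone to revisit the code when Type C arrives; the other two keep running and silently disagree.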