Friday, 21 July 2017

warning: comparison between signed and unsigned integer expressions

This is one of the stupidities in the C language, and it bugs me because it would be so simple for C to just handle it correctly. I'd really like a gcc option to do this!

When you store whole numbers in binary you usually have a choice of signed or unsigned. The signed version allows negative numbers, but at the cost of roughly halving the range of positive values possible.

For example, a signed char allows values -128 to +127, but an unsigned char allows values 0 to 255.

If you compare a signed int with an unsigned int (using ==, !=, >, or <, for example), the signed value is converted to an unsigned value and then the comparison is made.

Example

#include <stdio.h>

int main(void)
{
   signed int a = -1;
   unsigned int b = 1;
   if (a > b)
      printf ("a>b\n");
   if (b > a)
      printf ("b>a\n");
   return 0;
}

This prints a>b even though a is -1 and b is 1!

This is because -1, converted to an unsigned value, is a big number, in fact the biggest an unsigned int can be.

What pisses me off is that, even when C was invented, making the comparison work would have cost just one extra check of one bit. Whatever the comparison, you only have to check whether the signed value is negative before comparing. If it is negative, it cannot be equal to the unsigned value and must be smaller than it, so the result of the comparison is already decided; if it is not negative, you carry on and do the comparison as normal.

To me this would have been a far more logical behaviour than changing the value of the signed variable by making it unsigned.

18 comments:

  1. I simply couldn't agree more. And I think it is not too late to fix it, either. In C we could use a pragma, which should be standardised. I would also love an option so that expressions don't simply do the wrong thing when intermediate results overflow (it would not be too hard), for example (a + b) / 2, a * b / c and (a * b + c) / d. In the D language (dlang.org), my new love, the designers might have more freedom of action to simply fix this. In D, keeping compatibility with C expressions was a goal, but maybe it was taken too literally.

  2. In fact, what about this case: s1 < s2 + u, where s1 and s2 are signed, u is unsigned, and I want a signed comparison because the s's are x-coordinates and u is a width. A real case that is a nightmare at code review time in graphics code.

  3. What can I say? It's one of the many idiosyncrasies of C. It’s been such a long time now since I used C that I’ve probably forgotten all the other traps but the pain lingers on.

    These days, for bare metal (what little I do), I use Rust (https://www.rust-lang.org/en-US/); it’s as efficient as C (mostly) and a much saner language and environment. Takes some getting used to though! So I’m not advocating it, just making you aware in case you weren’t already.

  4. When C was invented there were a lot more insane systems out there than that. Two's-complement arithmetic is *not* the only type permitted by the Standard, and some of the early systems were much odder. So when C was invented they'd never have considered doing it that way.

    These days, of course... well, you'd be turning one conditional into two in a very common case. This is not at all good for performance.

    (Also, the type promotion rules of C may be horrible and crazy but at least they're consistent: though the signed-and-unsigned interaction is somewhat arbitrary, it makes sense if you consider that the purpose of type promotion is to avoid losing information and that positive numbers are much more commonly used than negative ones. Adding holes in it for *single operators alone* -- because you might make this work for comparison but you'll never make it work for arithmetic -- seems horrible to me. Also, of course, totally inconsistent with the body of existing code...)

  5. You are assuming that signed integers are held in two's complement representation in the hardware. The C language does not mandate this. So your statement "all it has to do is test one bit to see if negative" is untrue; the way to do that test is implementation specific.

    What C does is take the signed number's binary representation, use that as if it were unsigned, and do the comparison. In two's complement -1 is a very big unsigned number, but in other signed number representations it could be almost anything. You can't assume -1 is 0xFFFFFFFF (or however many bits there are in the ints on your platform).

    The reason these things are implementation specific is that a language like C cannot dictate to the hardware how it should hold negative numbers.

    1. Well, C could have said that signed/unsigned comparisons work, and exactly how is implementation specific.

  6. Given that the way C does comparisons allows them to be (on typical architectures) one compare instruction and one branch, it's hard to see how any scheme that does another test first to decide which comparison to use could be much better than twice as large/slow.

    And C never chooses slower/safer/more convenient over fast/simple.

    1. OK, even if optional, and a warning, it is better than the broken behaviour that confuses the hell out of programmers. Also, things like ARMs can do a test and skip in like one cycle - heck, if C did this you would have instructions in processors to do this in one go now!

    2. As you say - if much of the world shared your view on this then there'd be various 'mixed sign' jump instructions in processors, and presumably some additional CPU state flags to support those operations.

      Even after that, arithmetic in fixed-length words would still be full of divergences from proper real-world maths though (and particularly so when you mix signed and unsigned types) and programmers would have to cope with them.

      As a contrast, C# does mixed signed/unsigned comparisons by promoting the unsigned value to a (longer) signed type to allow the comparison to be signed. (So you can't do mixed compares if the unsigned type is already the longest integer type).

  7. Well, gcc *will* tell you if you ask it nicely enough: -Wsign-compare (annoyingly, -Wall doesn't switch it on for C, although -Wextra does)

    1. Indeed, hence the title of this blog post :-)

  8. Actually it's even worse than in your description: if, in the example code you give that outputs "a>b", you change the two variables to be "signed char" and "unsigned char" (in place of the corresponding "int" types), then you get the opposite result, "b>a".

    This is in accordance with the C standard, but it's really not easy to get one's head round, because it depends on the relative sizes - and if you have typedefs involved, you may not even know whether a particular value is signed or unsigned, smaller than int or larger than int etc.

    Just waiting for someone to say "this is why <insert-name-of-favourite-language> handles signed/unsigned integers differently" :-)

    1. Indeed, I nearly put that in as well - very odd.
      Cliff tells me Ada is the answer :-)

    2. > Cliff tells me Ada is the answer :-)

      The cool kids are telling me Go is the answer. Wait, no, that was last week, now it's Rust :-)

    3. Nah, Go ruled itself out by requiring a GC.

      The cool kids said D, then C++11, and are now on Rust.

  9. The most recent language I've learned is 8051 assembler. It has loads of problems, but not this one.

  10. That is an excellent tip from Pete about -Wsign-compare. I'm hoping gcc isn't the only compiler that has it. GCC should be persuaded to make it the default and to fix the crazy crazy bug of it being omitted from -Wall. Is that possible? I'm just hoping the two excellent D compilers “GDC” (GCC) and the LLVM-based “LDC” can offer that check. Does anyone know if the Clang / LLVM C compilers offer it too?

  11. Someone has been looking at this in depth, seriously, with a genuine will to improve things - https://issues.dlang.org/show_bug.cgi?id=259
