RevK®'s ramblings: warning: comparison between signed and unsigned integer expressions

2017-07-21

This is one of the stupidities in the C language and it bugs me because it would be so simple for C to just code it correctly. I'd really like a gcc option to do this!

When you store whole numbers in binary you usually have a choice of signed or unsigned. The signed version allows negative numbers but at the cost of the range of positive values possible.

For example, an 8-bit signed char allows values -128 to +127, but an unsigned char allows values 0 to 255.

If you compare a signed and an unsigned value of the same type width (int or wider), using ==, !=, >, or < for example, the operation converts the signed value to an unsigned value and then compares.

Example

#include <stdio.h>

int main (void)
{
   signed int a = -1;
   unsigned int b = 1;

   if (a > b)
      printf ("a>b\n");
   if (b > a)
      printf ("b>a\n");
   return 0;
}

This prints a>b even though a is -1 and b is 1!

This is because -1, converted to an unsigned value, is a big number, in fact the biggest an unsigned int can be.
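A minimal sketch of that conversion, for illustration only. The helper names as_unsigned, minus_one_is_max and c_greater are made up here and are not part of the original example. C defines the signed-to-unsigned conversion by value (reduced modulo UINT_MAX + 1), so -1 ends up as UINT_MAX.

```c
#include <limits.h>

/* Hypothetical helper: the same conversion the compiler applies implicitly
   when a signed int meets an unsigned int in a comparison. */
unsigned int as_unsigned(int value)
{
    return (unsigned int)value;   /* value reduced modulo UINT_MAX + 1 */
}

/* Nonzero when -1, converted to unsigned, is the largest unsigned int. */
int minus_one_is_max(void)
{
    return as_unsigned(-1) == UINT_MAX;
}

/* Hypothetical helper: what "a > b" means under C's rules when a is a
   signed int and b is an unsigned int. */
int c_greater(int a, unsigned int b)
{
    return as_unsigned(a) > b;
}
```

Calling c_greater(-1, 1u) gives a nonzero result, which is why the example takes the first branch.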

What pisses me off is that, even when C was invented, the code to make the comparison work would have been one extra check of one bit. Basically, whatever the comparison, you just have to check whether the signed value is negative before making the comparison. If it is negative, it is not equal to the unsigned value and is smaller than it, so the result is already decided; otherwise you go on to do the comparison as normal.

To me this would have been a far more logical behaviour than changing the value of the signed variable by making it unsigned.
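The check described above can be sketched by hand today. This is only an illustration of the idea, not something the compiler does for you, and the helper names less_su, equal_su and greater_su are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical helpers: compare a signed int with an unsigned int by value. */

bool less_su(int s, unsigned int u)      /* s < u */
{
    if (s < 0)
        return true;                     /* a negative value is below every unsigned value */
    return (unsigned int)s < u;          /* s is non-negative here, so the cast keeps its value */
}

bool equal_su(int s, unsigned int u)     /* s == u */
{
    return s >= 0 && (unsigned int)s == u;
}

bool greater_su(int s, unsigned int u)   /* s > u */
{
    return s >= 0 && (unsigned int)s > u;
}
```

With these, less_su(-1, 1u) is true and greater_su(-1, 1u) is false, which is the answer the example was expecting.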

18 comments:

Cecil Ward, Friday, 21 July 2017 at 11:03:00 BST
I simply couldn't agree more. And I think it is not too late to fix it as well. In C we could use a pragma which should be standardised. I would also love it if there was an option so that expressions didn't simply do the wrong thing when an intermediate result overflows (it would not be too hard). Examples include (a + b) / 2, a * b / c and (a * b + c) / d. In the D language (dlang.org), my new love, the designers might have more freedom of action to simply fix this. In D, keeping compatibility with C expressions was a goal, but maybe it was taken too literally.
Cecil Ward, Friday, 21 July 2017 at 11:11:00 BST
In fact, what about this case: s1 < s2 + u, where s1 and s2 are signed, u is unsigned, and I want a signed comparison because the s's are x-coordinates and u is a width. A real case that is a nightmare at code review time in graphics code.
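One possible way to get the signed comparison in that case is to widen everything first. This is a sketch only: it assumes long long is wider than int (true on common platforms, not guaranteed by the standard), and the function names left_of_edge and left_of_edge_plain are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical helper: is x-coordinate s1 left of the edge at s2 + u?
   Assumes long long is wider than int, so the sum cannot wrap. */
bool left_of_edge(int s1, int s2, unsigned int u)
{
    return (long long)s1 < (long long)s2 + (long long)u;
}

/* For contrast, what the plain expression s1 < s2 + u means in C:
   both signed operands are converted to unsigned before use. */
bool left_of_edge_plain(int s1, int s2, unsigned int u)
{
    return (unsigned int)s1 < (unsigned int)s2 + u;
}
```

With s1 = -5, s2 = -2 and u = 10, the edge is at 8, so left_of_edge is true, while left_of_edge_plain is false because -5 has become a huge unsigned value.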
ContextSwitch, Friday, 21 July 2017 at 11:56:00 BST
What can I say? It's one of the many idiosyncrasies of C. It’s been such a long time now since I used C that I’ve probably forgotten all the other traps but the pain lingers on.

These days, to do bare metal (what little I do these days), I use Rust (https://www.rust-lang.org/en-US/); it’s as efficient as C (mostly) and a much saner language and environment. Takes some getting used to though! So I’m not advocating it, just making you aware in case you weren’t already.
Nick Alcock, Friday, 21 July 2017 at 12:03:00 BST
When C was invented there were a lot more insane systems out there than that. Two's-complement arithmetic is *not* the only type permitted by the Standard, and some of the early systems were much odder. So when C was invented they'd never have considered doing it that way.

These days, of course... well, you'd be turning one conditional into two in a very common case. This is not at all good for performance.

(Also, the type promotion rules of C may be horrible and crazy but at least they're consistent: though the signed-and-unsigned interaction is somewhat arbitrary, it makes sense if you consider that the purpose of type promotion is to avoid losing information and that positive numbers are much more commonly used than negative ones. Adding holes in it for *single operators alone* -- because you might make this work for comparison but you'll never make it work for arithmetic -- seems horrible to me. Also, of course, totally inconsistent with the body of existing code...)
Owen Smith, Friday, 21 July 2017 at 12:55:00 BST
You are assuming that signed integers are held in two's complement representation in the hardware. The C language does not mandate this. So your statement "all it has to do is test one bit to see if negative" is untrue; the way to do that test is implementation specific.

What C does is take the signed number binary representation, use that as if it were unsigned, and do the comparison. In twos complement -1 is a very big unsigned number, but in other signed number representations it could be almost anything. You can't assume -1 is 0xFFFFFFFF (or however many bits there are in the ints on your platform).

The reason these things are implementation specific is that a language like C cannot dictate to the hardware how it should hold negative numbers.
Will Dean, Friday, 21 July 2017 at 14:03:00 BST
Given that the way C does comparisons allows it to be (on typical architectures) one compare instruction and one branch, it's hard to see how any other kind of 'do another test first to decide which comparison to use' scheme could be much better than twice as large/slow.

And C never chooses slower/safer/more convenient over fast/simple.

Pete Favelle, Friday, 21 July 2017 at 15:45:00 BST
Well gcc *will* tell you if you ask it nicely enough: -Wsign-compare (annoyingly -Wall doesn't switch it on)
Anonymous, Friday, 21 July 2017 at 16:40:00 BST
Actually it's even worse than in your description: if, in the example code you give that outputs "a>b", you change the two variables to be "signed char" and "unsigned char" (in place of the corresponding "int" types), then you get the opposite result, "b>a".

This is in accordance with the C standard, but it's really not easy to get one's head round, because it depends on the relative sizes - and if you have typedefs involved, you may not even know whether a particular value is signed or unsigned, smaller than int or larger than int etc.
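A small sketch of that size dependence, assuming a typical platform where int can hold every unsigned char value. The function names char_greater and int_greater are hypothetical.

```c
/* Both operands are narrower than int, so each is promoted to int
   and the comparison is an ordinary signed one. */
int char_greater(signed char a, unsigned char b)
{
    return a > b;
}

/* With int and unsigned int, a is converted to unsigned first;
   the cast only spells out the implicit conversion. */
int int_greater(int a, unsigned int b)
{
    return (unsigned int)a > b;
}
```

So char_greater(-1, 1) is 0, while int_greater(-1, 1u) is 1, even though the source expression "a > b" looks identical in both cases.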

Just waiting for someone to say "this is why <insert-name-of-favourite-language> handles signed/unsigned integers differently" :-)
Owen Smith, Saturday, 22 July 2017 at 22:18:00 BST
The most recent language I've learned is 8051 assembler. It has loads of problems, but not this one.
Cecil Ward, Thursday, 27 July 2017 at 19:15:00 BST
That is an excellent tip from Pete - about -Wsign-compare. I'm hoping that isn't the only compiler. GCC should be persuaded to make it the default and to fix the crazy crazy bug with it being omitted from -Wall. Is that possible? I'm just hoping the two excellent D compilers “GDC” (GCC) and the LLVM-based “LDC” can offer that check. Anyone know if Clang / LLVM C compilers offer the good thing too?
Cecil Ward, Thursday, 27 July 2017 at 19:51:00 BST
Someone has been looking at this in depth, seriously, with a genuine will to improve things - https://issues.dlang.org/show_bug.cgi?id=259