RevK®'s ramblings: C

Showing posts with label C. Show all posts

2020-08-27

Pseudo C++ using cpp (the RevK macro)

I am not sure if this is evil, or genius, or both. Either way I take full credit.

Mainly for a library where we want to add extra optional options in the future, so want almost C++ style optional and tagged arguments to a function, but in normal C (because C++ is just evil all by itself).

Not quite as flexible as C++, as no defaults for missing arguments, but you can often live with that knowing they are zero or NULL.

Unless someone can cite a prior art - please call this the RevK macro.

P.S. I have updated my string decimal library to use this, and it is way neater!

Some explanation...

Normally a function call in C has a fixed set of arguments. Well, not quite, some can have variable arguments at the end, like printf(...) but that is handled in a special way and you have to know (usually from a format string) how many arguments and which type.

However, in some languages, like C++, you can have optional arguments which are pre-defined types and names, but you can stop early in the list. In C++ you can say what default these missing arguments have. You also have the option to leave some arguments our and "tag" some others.

So, the idea you can call myfunc("hello",flag2:1) is setting the first argument (s), and the third (flag2) and not specifying the middle one (flag1) which ends up zeroed. I can't set defaults but can expect unspecified to be zeroed.

Now, normally, if I had some function you call as func(a,b,c) and I want some extra option later on, I would have to either change to func(a,b,c,d) everywhere it is used in every program, even if people specify d as 0 or NULL, or make a new separate function that takes the extra argument as an alternative, e.g. func2(a,b,c,d).

With this trick I am able to add this extra argument which is optional, knowing that if the extra argument is not specified it has a known value of 0/NULL.

The way the trick works is by using the standard C pre-processor which does text substitution, and expanding the full list of arguments (...) in the macro (as __VA_ARGS__) within a structure initialisation. Unlike function arguments, C has a syntax to initialise a structure which allows you to omit arguments and tag arguments. I am using that syntax in the function, so myfunc("hello",flag2:1) becomes a structure initialiser {"hello",flag2:1} and this structure is passed to the function.

Of course I could just used C++, but that comes with a lot of other baggage, and not something I am that keen on. It has its merits and works well for some applications.

2020-03-08

Big number maths

One of my first C programmes ever, at uni, was a calculator. I had done a lot of code in BASIC, Z80, 6502 and other languages before, but C I learned at uni.

It was a strange exercise for me as I was teamed with someone else, and basically gave up on him. To me a calculator, i.e. something that could parse 1+2*3 and get 7, and understand brackets, was inherently simple. The code was a couple of pages at most and I had added all sorts of extras over and above the basic +, -, *, / binary operators. It handle brackets and operator precedence, and even had a table of operators to allow easy expansion. My "partner" made code that was dozens of pages and every test that involved extra brackets or some unexpected sequence broke it and he had two add extra code for edge cases. Oddly I once encountered a compiler that must have been written in such a way as adding extra brackets, e.g. 1+(2*3), actually broke it and made wrong code. Scary that stuff like exists.

My code has a simple loop to process: prefixes or "("s, operand, postfixes or ")"s, operator, in a loop, ending after last operand. It stacked operands, and operators (after processing the stack for any higher operators before adding), and that was it. It allowed unary pre and post operators, and binary operators. Simple and easy to understand, IMHO.

Of course, it used the normal C code to parse and output numbers, storing internally as floats (I think).

However, having played with mechanical calculators, one of the things I have meant to code one day and had not got around to for over 30 years, is a decimal calculator library... That is until this weekend.

Basically, one tool I have written, and my friends use a lot, includes an "eval" function to evaluate a simple sum for them. It is used in loads of places (as it embeds maths in a back end for a web page). It used integer (long long) types (64 bit) and shifted by number of decimal places it saw, giving it around 18 significant digits. Sadly it broke if you went over that (my bad) and they had some silly case of some numbers plus a fraction which had rather too many decimal places. So they asked me to fix.

A simple fix was limit the size, and then limit size and apply bankers rounding at the limit.

But I decided this may be the time to finally do that decimal maths code. I googled, and there are a few decimal libraries. There are also some arbitrary precision binary libraries. To my surprise the latest GCC has decimal types, where data is stored and processed in decimal. I can only assume we have decimal maths in some processors even, these days. This is good for accounting where the rounding errors you can get are bad (essentially binary does not have a way to represent 0.1 or 0.01 without recurring digits). Sadly GCC may have it, but clib does not yet (scanf and printf), but will some time, and I can see that being useful. Even so, _Decimal128 does have limits on number of digits. There are libraries in other languages, but I wanted something for C.

So I made a C string decimal library, and put it here on GitHub for all to use.

It is a library and command line, and includes functions to add, subtract, multiple, divide, and compare, as well as a simple evaluation function that parses basic sums, just like my uni project calculator.

The key thing is that it works on C text strings for numbers. A simple sum (I learned at school, and now out of date) was 366.2564*86164.091=31558149.7789324. It was numbers from an encyclopaedia in the library. I remember because I had to do the maths manually as no calculator I had would do it to full precision. Even google calculator truncates it a bit. That was my first test, and it worked. Yes, I also learned π to 50 places (and now only remember 25).

So, for add, subtract, and multiple, it is simple to ensure you have as many digits as needed. Sadly division can go on for ever, so I had to include a limit of decimal places, and a rounding rule, and remainder option.

I included rounding on limit on division, and also a separate simple rounding function - obviously actually applied at whatever digit you set. Default is banker's rounding :-

Round towards 0 (i.e. -3.9 rounds to -3)
Round away from 0 (i.e. -3.9 rounds to -4)
Round towards -ve (i.e. -3.9 rounds to -4, 4.2 rounds to 4)
Round towards +ve (i.e -3.9 rounds to -3, 4.2 rounds to 5)
Simple rounding (i.e 2.4 rounds to 2, 2.5 rounds to 3)
Banker's rounding (i.e. 2.4 rounds to 2, 2.5 rounds to 2, 2.6 rounds to 3, 3.5 rounds to 4)

I went further though, and made the calculator work on rational numbers, that means all of the maths is done with numerator and denominator, and only finally at the end does the division get done and rounding applied. This means that 1000/7*7 is 1000, unlike bc which makes it 994. minor optimisations for adding/subtracting with same denominator, and anything with denominator of 1 or a power of 10, making it simpler. For anything without division the maths does not even need a final divide.

I had a few silly bugs, one that fooled me is that zero is a special case, which I store as a magnitude of 0 and no digits, which is fine, but comparing numbers I started by comparing magnitude before comparing digits, so 0 looked like a bigger number than 0.1 as 0 was 0*10^0 and 0.1 was 1^10-1 and magnitude 0 is greater than magnitude -1. I had to add checks for comparison with 0.

However, overall, I am quite chuffed. Heck, I even asked it to work out 1e1000000000+1 and it just worked, that is a billion significant digits!

This is pure coding fun though. But may be useful - feel free to use it.

P.S. Someone asked about Banker's rounding, and I was sure I had blogged once, but cannot find it. Unlike that which I was taught at school, and every Casio (and other) calculator I had, rounding is not as simple as you think. I used to assume that a residual of exactly 0.5, or above, rounded up, and below 0.5 was down. That is what my calculator did. However, this creates a bias. Ideally you want to round an exact 0.5 residual down half the time and up half the time to remove bias. This could be random, but that is bad as you get results which are not reproducible. One simple way, bankers rounding, is to round 0.5 residual to nearest even number. Hence 0.5 rounds to 0, 1.5 rounds to 2, 2.5 rounds to 2, 3.5 rounds to 4, and so on.

2020-02-13

Standard C function to read lines from a file

[update: As I hoped, there is a simple answer, getline(), see comments, thank you Charles Lecklider]

The classic is fgets(), it is simple, and easy to use...

Of course, for some reason, fgets() gives you the line endings, so I usually end up with more like.

The problem, of course, is you have a line length. This is also an advantage in that you constrain the lines and don't have random memory allocation issues, but computers have so much memory and VM these days. How many times have I seen this code, and seen someone have to change 1000 to 10000 one day?

What I would like is a simple function that reads a line and mallocs space as needed. Indeed, it could return the allocated space or NULL for error (EOF, or malloc fail). You'd have to free it, but no big issue. Would also be nice if (a) it stripped the line ending as I literally NEVER want that in the line, and (b) seamlessly handled bloody DOS style carriage returns...

So whilst trying to explain some basic C to my mates, whilst at sea, in the middle of the Atlantic, I tried to explain this whilst making a simple CSV file parsing program for them. We did some googling, and found that I am not alone in trying to find such a function. It seems that fscanf() may be the answer. [update: clearly I did not google well enough!]

To be honest fscanf() is a function I just don't use enough. It is very powerful, but I always find myself parsing things more directly. However, I had not considered it as a means to just get a line.

The magic incantation is something like...

This reads any characters up to a newline, allocates space (that is what m is), and stores in line. Just what we need. A minor variation to handle carriage returns seems to work too...

Bingo, we have our magic line malloc file reader function. Perfect.

And get this, reading the man page it is clear that using the [ function does not consume the leading white space, which is perfect... So all good

Except that is not what happens. We did the CSV stuff, and then went on to TSV (tab separated) and magically leading TABs (i.e. empty first field) were stripped by fscanf()

Why?!?!?!?!?!

Please someone tell me I am being thick and that there is a standard function to do just this. Yes, I could write my own, but this is surely so basic it should be standard C library stuff.

[code mistakes in examples left in for the reader to find]

2018-10-09

SVG, font metrics, am I being thick?

I really hope someone tells me I have missed the bleeding obvious for a change.

I think SVG lacks two key features.

1. The ability to set a maximin size on a line of text, i.e. so that if it works out longer it will be squashed in some way. There is a way to set a calculated size and have either the spacing or the spacing and glyphs adjusted to match that, but I cannot find a way to say "squash, but only if it would be too big".

2. The ability to take a block of text, give it a width (and height, why not) and line spacing and have it fill the box, breaking on spaces between words.

If there is svg magic (and I don't mean javascript to run on a web site that will do the business later), just plain svg that will easily convert to png or pdf, and still work, that is what I want.

The alternative, for which I am still massively failing to find a solution, is a simple Debian installed C code library (not C++ please) that allows me to ask "how wide is this string in this font". I found qfontmetrics which looks spot on, but is C++. Obviously, including emojis!

With this I can make the tools I current have which produce svg output (to later be converted to PDF or PNG, etc) work out when to set a line width, or break blocks of text, and work with normal svg.

All of this is a doddle in postscript, but unless someone has a patch for gs to handle utf-8 strings natively and handle a fuller character set with the likes of emojis, I need to move away from postscript, hence using svg. Just hitting this key stumbling block for now.

Update: Thanks for all the comments. I am looking at freetype at the moment. It looks like it has exactly what I need, but slightly struggling with Emojis and the like at the moment :-)

Update: Yes, FreeType has simple C functions to load a font/face and find the size of glyphs based on unicode characters, and simple to tell it does not have the character and use a noto/emoji font instead, etc. This should be simple to then work out line wrapping and sizes for SVG with tspan for emojis and the like. Now I know it is all possible and how to do it, I have to think about the project for which I was asking this - a new system for designing printed plastic cards...

2018-03-15

memcpy

Having been caught out by this (and yes, I should know better) this is a friendly reminder for those coding in C.

The man page on memcpy is clear.

DESCRIPTION

       The memcpy() function copies n bytes from memory area src to memory area dest.  The memory areas must not overlap.  Use memmove(3) if the memory areas do overlap.

In days gone by the memcpy would be done by a simple loop copying bytes from src to dst until length runs out. e.g. while(len--)*dst++=*src++; or some such, but probably in assembler.

So a classic case of copying a block of data back a few bytes, e.g. memcpy(data,data+1,len) would be fine.

Unfortunately the warning of The memory areas must not overlap. is not to be ignored.

You will get away with ignoring it a lot, and that is the problem! Whether you get away with it depends on a lot of things. Version of C libraries and even version of the compiler, the specific alignment of the points you are moving data to and from, the length you are moving, and probably more factors I cannot think of.

So things may work 100% until next recompiled, or simply until run on a new machine. Worse, they may work most of the time, but not quite all.

The reason is that a memcpy can be carefully optimised. For example, on an ARM you can load a whole load of registers in one go and then store a whole load of registers in one go. It may be more optimal for it to start copying from the end and work backwards, for example. The specification of memcpy not permitting overlapping areas allows for all number of optimisations to be performed in the implementation.

On the other hand memmove has to allow for overlapping areas.

DESCRIPTION

       The  memmove() function copies n bytes from memory area src to memory area dest.  The memory areas may overlap: copying takes place as though the bytes in src are first copied into a temporary array that does not overlap src or dest, and the bytes are then copied from the temporary array to dest.

In practice it does not have to copy to somewhere temporarily, just make sure it moves data in the right order if there is an overlap. This means more checks and code that may not have quite the same optimisations available.

So, always be careful to use memmove if you cannot be sure the memory areas do not overlap.

P.S. Someone pointed out I am getting forgetful. See http://www.revk.uk/2011/02/memcpy-minor-duh-moment-on-my-part-and.html

2017-07-21

warning: comparison between signed and unsigned integer expressions

This is one of the stupidities in the C language and it bugs me because it would be so simple for C to just code it correctly. I'd really like a gcc option to do this!

When you store whole numbers in binary you usually have a choice of signed or unsigned. The signed version allows negative numbers but at the cost of the range of positive values possible.

For example a signed char allows values -128 to +127, but an unsigned char allows values 0 to 255.

If you compare them, using ==, !=, >, or < for example, the operation converts the signed value to an unsigned value and then compares.

Example

signed int a = -1;

unsigned int b = 1;  

if (a > b)

   printf ("a>b\n");

if (b > a)

   printf ("b>a\n");

This print a>b even though a is -1 and b is 1!

This is because -1, converted to an unsigned value, is a big number, in fact the biggest an unsigned int can be.

What pisses me off is that, even when C was invented, the code to make the comparison work would have been one check of one bit extra. Basically, whatever the comparison, you just have to check the signed value is negative or not before making the comparison. If it is negative that means it is not equal to the unsigned value, and is smaller than the unsigned value, so whatever comparison you were doing is decided by the signed value being negative before going on to do the comparison as normal.

To me this would have been a far more logical behaviour than changing the value of the signed variable by making it unsigned.

RevK^®'s ramblings