2018-03-15

memcpy

Having been caught out by this (and yes, I should know better), here is a friendly reminder for those coding in C.

The man page on memcpy is clear.

DESCRIPTION
       The memcpy() function copies n bytes from memory area src to memory area dest.  The memory areas must not overlap.  Use memmove(3) if the memory areas do overlap.

In days gone by, memcpy would be done by a simple loop copying bytes from src to dest until the length ran out, e.g. while (len--) *dest++ = *src++; or some such, but probably in assembler.

So a classic case of copying a block of data back a few bytes, e.g. memcpy(data, data+1, len), would be fine with such a loop, because each byte is read before it is overwritten.

Unfortunately, the warning that "The memory areas must not overlap" is not to be ignored.

You will get away with ignoring it a lot, and that is the problem! Whether you get away with it depends on a lot of things: the version of the C library and even the version of the compiler, the specific alignment of the pointers you are moving data to and from, the length you are moving, and probably more factors I cannot think of.

So things may work 100% until the code is next recompiled, or simply until it is run on a new machine. Worse, they may work most of the time, but not quite all.

The reason is that a memcpy can be carefully optimised. For example, on an ARM you can load a whole load of registers in one go and then store a whole load of registers in one go. It may be more optimal for it to start copying from the end and work backwards, for example. Because the specification of memcpy does not permit overlapping areas, the implementation is free to perform any number of such optimisations.

On the other hand memmove has to allow for overlapping areas.

DESCRIPTION
       The  memmove() function copies n bytes from memory area src to memory area dest.  The memory areas may overlap: copying takes place as though the bytes in src are first copied into a temporary array that does not overlap src or dest, and the bytes are then copied from the temporary array to dest.

In practice it does not have to copy to somewhere temporarily, just make sure it moves data in the right order if there is an overlap. This means more checks and code that may not have quite the same optimisations available.

So, always be careful to use memmove if you cannot be sure the memory areas do not overlap.

P.S. Someone pointed out I am getting forgetful. See http://www.revk.uk/2011/02/memcpy-minor-duh-moment-on-my-part-and.html

11 comments:

  1. Some CPUs have a hardware block copy instruction (e.g. XAP2), so the compiler compiles memcpy() inline to that instruction. memmove(), on the other hand, checks whether the overlap would make the block copy instruction go wrong. If it would not, memmove() uses the instruction; otherwise it does the copy carefully, but more slowly, in assembler.

    Some platforms have DMA hardware that can do memory to memory moves. Runtime libraries on those platforms can replace memcpy() with a version that for big copies uses DMA but for smaller copies where the DMA setup overhead would dominate calls the original memcpy().

  2. Reminds me of this https://bugzilla.redhat.com/show_bug.cgi?id=638477

    (glibc optimised memcpy() and broke Adobe's flash-plugin)

    1. Spot on, and that seems to have sparked some lively debate!!

    2. > glibc optimised memcpy() and broke Adobe's flash-plugin

      And this was seen as a bug?!

    3. There was (and may still be -- I should check, and fix it if it's still outstanding) a GCC bug whereby structure assignment for large structures could be offloaded to memcpy... even when the assignment was e.g. 'a = a' or the more likely case of '*a = *b' where a and b were pointers that may alias the same structure.

      Result: a *compiler-generated* overlapping memcpy(). Whoops.

  3. Long gone are the days when you could use overlapping areas to flood fill a string with a pattern. Now compilers and processors think they know better than you. E.g. with a 20 character string you could move "1234" to positions 1-4, then move 1-16 to 5-20, and you would get "12341234123412341234".

  4. On the original C compiler for the ARM (Norcroft C for the Acorn Archimedes) memcpy and memmove were actually the same function. memmove guarantees not to break the overlap case, but memcpy doesn't guarantee that it will.

    The implementation did indeed use lots of registers, and was (by the standards of the day) blindingly fast. It was part of what let the Archimedes desktop have solid window drags when practically everything else still only let you drag an outline of the window.

    1. I have an Archi in the loft and must wake it up...

    2. I have a couple, plus an R140 which was my first introduction to UNIX at home.

      I remember seeing some weirdly clueless reviews of Norcroft C for the Archimedes. Two sample complaints about it:

      * It didn't support small/medium/compact/large models. Well duh! It's a 32 bit flat address space.

      * It produced lots of diagnostics when fed invalid C source which the contemporary Microsoft C compiler would process without comment. Somehow accurate diagnostics were seen by some reviewers as a bad thing.

    3. Norcroft C for the ARM is still by far the best C compiler I have ever used, both in how good the code it generated was and the superb warnings and errors from the compiler about the source code. gcc is rubbish in comparison.
