C: Memcpy vs Shifting: What's more efficient?

Question

I have a byte array containing 16- and 32-bit data samples. To convert them to Int16 and Int32, I currently just do a memcpy of 2 (or 4) bytes.

Because memcpy probably isn't optimized for lengths of just two bytes, I was wondering if it would be more efficient to convert the bytes to an Int32 using integer arithmetic (or a union).

I would like to know how the efficiency of calling memcpy compares with bit shifting, because the code runs on an embedded platform.

Compare the assembly your compiler generates. Some are pretty smart about optimizing `memcpy`, especially when the copied size is known at compile time. — Mat, Feb 06 '12 at 13:35
@Mat The compiler is GCC and the CPU is a Cortex M3, but I doubt I will be able to understand the assembly code it generates. I just want to know the performance difference in the general case, but if the difference is so small that it depends on the CPU/compiler, I guess it can be neglected? — Maestro, Feb 06 '12 at 13:56
The assembly of Cortex M3 is not as bad as, say, x86. Look for expensive instructions like LOAD and STORE. — Lindydancer, Feb 08 '12 at 20:08

score 4 · Accepted Answer · answered Feb 06 '12 at 15:04

I would say that memcpy is not the way to do this. However, finding the best way depends heavily on how your data is stored in memory.

To start with, you don't want to take the address of your destination variable. If it is a local variable, you will force it to the stack rather than giving the compiler the option to place it in a processor register. This alone could be very expensive.

The most general solution is to read the data byte by byte and arithmetically combine the result. For example:

uint16_t res = (  (((uint16_t)char_array[high]) << 8)
                | char_array[low]);
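
As a minimal sketch of the expression above, the two index choices can be wrapped in small helpers. The names `read_u16_le`, `read_u16_be` and `read_s16_le` are hypothetical, and the sketch assumes the buffer is accessed as `unsigned char`: if `char_array` were a plain `char` array on a platform where `char` is signed, a low byte of 0x80 or above would be sign-extended before the OR and corrupt the high byte.

```c
#include <stdint.h>

/* Hypothetical helpers (names are not from the original answer).
   "le"/"be" describe the byte order inside the array, not the CPU's. */

/* Little-endian data: p[0] is the low byte, p[1] the high byte. */
static inline uint16_t read_u16_le(const unsigned char *p)
{
    return (uint16_t)(((uint16_t)p[1] << 8) | p[0]);
}

/* Big-endian data: p[0] is the high byte, p[1] the low byte. */
static inline uint16_t read_u16_be(const unsigned char *p)
{
    return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}

/* Signed sample: build the unsigned value first, then convert.
   The conversion of values above INT16_MAX is implementation-defined
   in C; GCC wraps modulo 2^16, which gives the expected result. */
static inline int16_t read_s16_le(const unsigned char *p)
{
    return (int16_t)read_u16_le(p);
}
```

Because the helpers take a pointer and return a value, the result can stay in a register; nothing forces the destination variable onto the stack.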

The expression in the 32-bit case is a bit more complex, as you have more alternatives. You might want to check the assembler output to see which is best.

Alt 1: Build pairs, and combine them:

uint16_t low16 = ... as example above ...;
uint16_t high16 = ... as example above ...;
uint32_t res = (  (((uint32_t)high16) << 16)
                | low16);
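
Filled in for one concrete layout, Alt 1 might look like the sketch below. It assumes little-endian data in the array (lowest byte first) and an `unsigned char` buffer; the function name `read_u32_le_pairs` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: Alt 1 for little-endian data in the array.
   Two 16-bit halves are built first and then combined. */
static inline uint32_t read_u32_le_pairs(const unsigned char *p)
{
    uint16_t low16  = (uint16_t)(((uint16_t)p[1] << 8) | p[0]);
    uint16_t high16 = (uint16_t)(((uint16_t)p[3] << 8) | p[2]);

    /* Widen before shifting so the high half is not lost. */
    return ((uint32_t)high16 << 16) | low16;
}
```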

Alt 2: Shift in 8 bits at a time:

uint32_t res = char_array[i0];
res = (res << 8) | char_array[i1];
res = (res << 8) | char_array[i2];
res = (res << 8) | char_array[i3];

All examples above are neutral to the endianness of the processor used, as the index values decide which part to read.
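
To illustrate how the index values alone select the byte order, here is a sketch of Alt 2 written twice, once per layout. The function names are hypothetical and the buffer is assumed to be `unsigned char`. Both functions return the same result on a little-endian and a big-endian CPU.

```c
#include <stdint.h>

/* Hypothetical helpers: Alt 2, shifting in 8 bits at a time. */

/* Array holds the value lowest byte first: start from p[3]. */
static inline uint32_t read_u32_le(const unsigned char *p)
{
    uint32_t res = p[3];
    res = (res << 8) | p[2];
    res = (res << 8) | p[1];
    res = (res << 8) | p[0];
    return res;
}

/* Array holds the value highest byte first: start from p[0]. */
static inline uint32_t read_u32_be(const unsigned char *p)
{
    uint32_t res = p[0];
    res = (res << 8) | p[1];
    res = (res << 8) | p[2];
    res = (res << 8) | p[3];
    return res;
}
```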

The next kind of solution is possible if 1) the endianness (byte order) of the device matches the order in which the bytes are stored in the array, and 2) the array is known to be placed at an aligned memory address. The latter requirement depends on the machine, but you are safe if the char array representing 16-bit data starts at an even address, and in the 32-bit case at an address divisible by four. In this case you could simply read the value directly, after some pointer tricks:

uint16_t res = *(uint16_t *)&char_array[xxx];

Where xxx is the array index corresponding to the first byte in memory. Note that this might not be the same as the index of the lowest-order byte.
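
The two preconditions for the direct read can be checked at run time. The following is a rough sketch with hypothetical helper names. It assumes that converting a pointer to `uintptr_t` exposes the address bits, which holds on flat-address targets such as the Cortex M3 but is not guaranteed by the C standard.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if p is a multiple of n (n = 2 or 4).
   Assumes uintptr_t reflects the numeric address (true on common
   flat-memory targets, not required by the C standard). */
static inline int is_aligned(const void *p, size_t n)
{
    return ((uintptr_t)p % n) == 0;
}

/* Hypothetical helper: nonzero if the CPU stores the low byte first.
   Inspecting an object through unsigned char is always permitted. */
static inline int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}
```

If either check fails, the byte-by-byte versions remain correct, which is one reason to prefer them.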

I would strongly suggest the first class of solutions, as it is endianness-neutral.

Anyway, both of them are way faster than your memcpy solution.

Do you know if using a union for conversion is faster than the shifting methods above? — Maestro, Feb 08 '12 at 19:28
It depends on the architecture, and on whether you need to copy the data into a union or can treat the char array as a union directly. In the ideal case, if your bytes are in the right order and aligned, only one machine instruction is needed. If that is not the case, you may have to read the two (or four) characters from the array, save them to the union (typically stored in memory), and then read the full object. Typically this is much more expensive than two (or four) reads followed by shifting, which can be performed in processor registers. Check the assembler output, or run it and measure. — Lindydancer, Feb 08 '12 at 20:02
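
For comparison, the "copy into a union" case described in the comment above could be sketched as follows. The type and function names are hypothetical. Unlike the shifting versions, the result is in the CPU's own byte order, so the same bytes produce different values on little-endian and big-endian machines.

```c
#include <stdint.h>

/* Hypothetical union: the same four bytes viewed two ways. */
union u32_bytes {
    unsigned char bytes[4];
    uint32_t      value;
};

/* Hypothetical helper: copy four bytes in, read the full object out.
   The value is interpreted in host byte order. */
static inline uint32_t read_u32_union(const unsigned char *p)
{
    union u32_bytes u;
    u.bytes[0] = p[0];
    u.bytes[1] = p[1];
    u.bytes[2] = p[2];
    u.bytes[3] = p[3];
    return u.value;
}
```

Whether the union actually goes through memory depends on the compiler and optimization level, so the generated assembly is the thing to compare.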

score 2 · Answer 2 · answered Feb 06 '12 at 13:54

memcpy is not valid for "shifting" (moving data by an offset shorter than its length within the same array); attempting to use it for such invokes very dangerous undefined behavior. See http://lwn.net/Articles/414467/

You must either use memmove or your own shifting loop. For sizes above about 64 bytes, I would expect memmove to be a lot faster. For extremely short shifts, your own loop may win. Note that memmove has more overhead than memcpy because it has to determine which direction of copying is safe. Your own loop already knows (presumably) which direction is safe, so it can avoid an extra runtime check.
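
For the overlapping case this answer describes (moving data within the same array), a minimal sketch using `memmove` might look like this. The helper name `shift_left` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: discard the first n bytes of buf and slide the
   remaining len - n bytes down to the start. Source and destination
   overlap, so memmove is required; memcpy would be undefined here. */
static inline void shift_left(unsigned char *buf, size_t len, size_t n)
{
    if (n >= len) {
        return; /* nothing left to move */
    }
    memmove(buf, buf + n, len - n);
}
```

The bytes past the moved region are left unchanged, so a caller would normally track the new logical length separately.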

R.. GitHub STOP HELPING ICE

I just copy 4 bytes from the array to a single variable, so I'm not moving data. All it does is: memcpy(long,char_array[offset],4) – Maestro Feb 06 '12 at 14:11
OK I misunderstood. Then unless you can be 100% sure your samples are aligned and you're not violating the aliasing rules, you must use `memcpy`. The compiler will compile it to a single load/store (no function call) if possible. – R.. GitHub STOP HELPING ICE Feb 06 '12 at 14:12
The OP meant "bit shifting", not moving memory. :) – Graham Borland Feb 06 '12 at 14:35
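
The `memcpy` approach recommended in the comments above can be sketched as a small helper. The name `load_s32_host` is hypothetical. The size is a compile-time constant, which is the case where a compiler such as GCC can usually replace the call with plain load instructions when optimizing; checking the generated assembly is the only way to confirm that for a given build.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit sample from any byte offset.
   Works for unaligned addresses and does not break aliasing rules.
   The bytes are interpreted in host byte order. */
static inline int32_t load_s32_host(const unsigned char *p)
{
    int32_t v;
    memcpy(&v, p, sizeof v); /* constant size: candidate for inlining */
    return v;
}
```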
