3

I'm working on a Cortex-M0 CPU, which doesn't have hardware division, so every time I divide something, the GCC library function is used. One of the divisions I do most often is dividing by 256, to convert shorts into bytes. Is there some way I can do this more efficiently (for example, by bit-shifting) than the default GCC library routine does?

Maestro
  • 3
    Will the operator '>>' do the trick for you? You know '>> 8'? – Refugnic Eternium Feb 03 '13 at 16:48
  • 6
    Don't you mean dividing by 256? Not 255? – Bill Lynch Feb 03 '13 at 16:48
  • 2
    You would be interested in chapter 10, Integer division by constants, of the book Hacker's delight. But the GCC implementers have probably read it. Did you look at the assembly before assuming there was a better way? (note: this comment assumes that you **do** mean 255) – Pascal Cuoq Feb 03 '13 at 16:48
  • casting a pointer should work as well. – Andreas Grapentin Feb 03 '13 at 16:49
  • 3
    If you divide by 255 to get the high-order 8 bits, I've got bad news for you. – John Dvorak Feb 03 '13 at 16:49
  • @PascalCuoq I haven't looked at the assembly, so I don't know how smart the software division code is, but since this is a special case (normally you would have more complicated divisions than just 256), I can imagine that a formula tailored to this situation would be faster than the generic approach in the GCC library. – Maestro Feb 03 '13 at 16:54
  • Do you want high-order bytes or low-order bytes of your short ? Or both in two distinct bytes ? – Julien Palard Feb 03 '13 at 16:55
  • @Joshua once again - do you _really_ want to divide by 255, not 256? Also, do you think you can outsmart _the compiler_? – John Dvorak Feb 03 '13 at 16:56
  • @JulienPalard I need -32768 mapped to 0, and +32767 mapped to 255. If you have a formula for that, I will accept it as an answer. – Maestro Feb 03 '13 at 16:57
  • @Joshua what about `x >> 8` or `x >>> 8`? – John Dvorak Feb 03 '13 at 16:57
  • 1
    @JanDvorak No it has to be 256, you're right. – Maestro Feb 03 '13 at 16:57
  • 1
    -32768 / 256 = -128, which is not 0. +32767 / 256 = +127, which is not 255. Take more time to specify your problem accurately. – Jonathan Leffler Feb 03 '13 at 17:06
  • 1
    Did you look at the resulting assembly? It is pretty likely that gcc will optimize a division by 256 into a bit shift anyway. Compilers are smart nowadays. – fuz Feb 03 '13 at 17:15

3 Answers

7

As per your comments, you want -32768 mapped to 0, and 32767 mapped to 255. Thus, the formula you want is:

short s = /* input value */;
unsigned char c = (unsigned char) ((s + 32768) / 256);

The other commenters have noted that you can do that divide-by-256 with a right-shift or various other tactics, which is true -- a reasonable version of this would be:

unsigned char c = (unsigned char) ((s + 32768) >> 8);

However, there is no need for such optimizations. GCC is very smart about converting divide-by-constant operations into special-case implementations, and in this case it compiles both of these into exactly the same code (tested with -O2 -mcpu=cortex-m0 -mthumb and GCC 4.7.2):

    mov     r3, #128
    lsl     r3, r3, #8
    add     r0, r0, r3
    lsr     r0, r0, #8
    uxtb    r0, r0

If you try to be too clever (as with the union or pointer-cast examples in other answers), you are likely to just confuse it and get something worse -- especially since those work by memory loads, and adding 32768 means you already have the value in a register.

Brooks Moses
  • 9,267
  • 2
  • 33
  • 57
  • 1
    You'd be surprised... I once saw a GCC port which dealt with explicit shift operations in the C code by emitting multiplication opcodes - someone probably assumed a hardware multiplier and hence equivalent cost, though the actual experimental hardware it was trying to run on did not have the multiply instruction implemented. Fortunately, being an FPGA, it was easy to add. Hopefully the case with division is better handled. – Chris Stratton Feb 04 '13 at 19:59
  • @ChrisStratton: Yup -- which is why it's useful to read the generated assembly every so often, just to make sure what you think is happening is actually happening! Although the case you describe is pretty clearly a bug in GCC. – Brooks Moses Feb 04 '13 at 23:20
2

Just cast a pointer.

unsigned char *bytes = (unsigned char*)&yourvalue;

Now `bytes[0]` will hold one byte of your value, and `bytes[1]` will hold the other. The order depends on the endianness of your system.

Andreas Grapentin
  • 1
    @JanDvorak yes, but the answer **states** that the result depends on endianness. so it is in no way incorrect. Also, the OP knows exactly which architecture he is developing for, so he knows the endianness beforehand. – Andreas Grapentin Feb 03 '13 at 17:01
  • 1
    There's little justification for writing platform dependent code when the platform independent code is cleaner, and likely to be faster - programs don't necessarily spend their entire lifetime on the target for which they were originally written. There's a fairly high chance that in the shift case the variable could be held in a register (rather than memory) by an optimizing compiler; in the pointer case there might be some optimizing compiler smart enough to figure out that this is the only reason you are using a pointer and implement with registers, but it seems substantially less likely. – Chris Stratton Feb 04 '13 at 20:03
1

You could use a union this way:

#include <stdio.h>

union Word {
    struct {
        unsigned char high;
        unsigned char low;
    } byte;
    unsigned short word;
};

int main(int argc, char **argv) {

    union Word word;
    word.word = 0x1122;

    printf("L = 0x%x, H = 0x%x", word.byte.low, word.byte.high);

    return 0;
}
  • 1
    Would there be a performance gain from using the union vs bit-shifting? It seems the union would be faster, since there are no extra operations necessary, but maybe it has other overhead? – Maestro Feb 03 '13 at 17:11
  • @Joshua Using `union` there is nothing more than `mov` instructions, on assembly level. –  Feb 03 '13 at 17:20
  • 2
    **This will give the wrong answer**, as it assumes a big-endian processor but the target in question is actually little-endian. You could fix it, but the risk of a mistake remains, especially if the code is ever ported to something else. – Chris Stratton Feb 04 '13 at 20:18