5

I have a highly portable library (it compiles and works well everywhere, even without a kernel) and I would like it to remain as portable as possible. So far I have avoided 64-bit data types, but I might need to use them now – to be precise, I need a 64-bit bitmask.

I have never really thought about it and I am not enough of a hardware expert (especially concerning embedded systems), but I am wondering now: what are the drawbacks of using uint64_t (or, equivalently, uint_least64_t)? I can think of two angles to my question:

  1. Actual portability: Are all microcontrollers – including 8-bit CPUs – able to deal with 64-bit integers?
  2. Performance: How slowly will an 8-bit CPU perform bitwise operations on a 64-bit integer compared to a 32-bit integer? The function I am designing will have only one 64-bit variable, but will perform a lot of bitwise operations on it (i.e. in a loop).
madmurphy
  • 2
    An 8-bit CPU has always been able to deal with multi-byte arithmetic and bit manipulation, it's just not very efficient. 64-bit will typically be twice the execution time as 32-bit, but as said below, a single bit in memory can be checked or set/cleared without expense. – Weather Vane Nov 05 '20 at 18:12
  • 2
    Even 8-bit CPUs can handle 64-bit values. Bitwise operations are usually well optimized. On an 8-bit CPU, checking for a single flag will usually be as fast as checking for a single flag in two 32-bit or four 16-bit values. – Michaël Roy Nov 05 '20 at 18:13
  • Thank you both for the fast answer. My dilemma is the following. Would it be better to split my bitmask into two 32bit integers (using more code), or should I just go elegantly with a single 64bit variable? – madmurphy Nov 05 '20 at 18:16
  • 8
    IMO use a standard 64-bit type and let the compiler figure it out. – Weather Vane Nov 05 '20 at 18:31
  • 2
    It is the same. Just define a set of macros / functions for all the operations you need (setbit, getbit, clearbit, maybe mask) and use these. This will help once you realise that you actually need more than 64 bits, or that your machine does not offer 64 bits. – wildplasser Nov 05 '20 at 18:31
  • 2
    @madmurphy Split the bitmask into 2 if you can form code better than the compiler. Else trust your compiler and code for clarity. Save your valuable time for making efficiencies with higher level code than this small stuff. – chux - Reinstate Monica Nov 05 '20 at 18:31
  • 2
    madmurphy, Consider posting your true working higher level code on Code Review and ask for performance improvement ideas. – chux - Reinstate Monica Nov 05 '20 at 18:38
  • @chux-ReinstateMonica Thank you, that is a very good suggestion. – madmurphy Nov 05 '20 at 18:56
  • @madmurphy No, don't split into two 32-bit integers. Your hand-written split arithmetic will not beat what the compiler can synthesize for 64-bits. See my answer below for actual results using Godbolt. BTW I stumbled upon your question because I had a similar dilemma about splitting a 64-bit bitfield in a protocol that could be used by microcontrollers talking to servers. Naturally, 64-bit is no problem on a server, but I wasn't sure about a microcontroller. – Emile Cormier Mar 21 '23 at 10:13

3 Answers

2

There are various minimum requirements on a conforming C compiler. The C language allows two forms of compilers: hosted and freestanding. Hosted is meant to run on top of an OS, and freestanding runs without an OS. Most embedded systems compilers are freestanding implementations.

Freestanding compilers have some leeway, they do not need to support all of the standard libraries, but they need to support a minimum subset of them. This includes stdint.h (see C17 4/6). Which in turn requires the compiler to implement the following (C17 7.20.1.2/3):

The following types are required:

int_least8_t int_least16_t int_least32_t int_least64_t
uint_least8_t uint_least16_t uint_least32_t uint_least64_t

So a microcontroller compiler does not need to support uint64_t, but it must (oddly enough) support uint_least64_t. In practice it means that the compiler might as well add uint64_t support too, since it's the same thing in this case.
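
As a minimal sketch of how a library could lean on the guaranteed type only, assuming C99 or later: the names `mask64`, `MASK64_ALL`, `MASK64_IS_EXACT` and `mask64_not` below are hypothetical and not taken from any existing library. The one standard-backed trick is that `UINT64_MAX` is defined only when the exact-width `uint64_t` exists.

```c
#include <stdint.h>

/* Hypothetical name: the bitmask type the library would use everywhere. */
typedef uint_least64_t mask64;

/* The 64 low bits set; any higher bits the type may have stay clear. */
#define MASK64_ALL UINT64_C(0xFFFFFFFFFFFFFFFF)

/* 1 when the exact-width uint64_t exists on this implementation. */
#ifdef UINT64_MAX
#define MASK64_IS_EXACT 1
#else
#define MASK64_IS_EXACT 0
#endif

/* Complement confined to 64 bits, so the result is the same even if
   uint_least64_t happens to be wider than 64 bits. */
static inline mask64 mask64_not(mask64 m)
{
    return ~m & MASK64_ALL;
}
```

Of the bitwise operators, only `~` can produce bits above position 63 on a wider type, which is why it is the one wrapped here; `|`, `&` and `^` on values that already fit in 64 bits need no extra care.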

As for what an 8 bit MCU supports... it supports 8 bit arithmetic through the instruction set, in some special cases also a few 16 bit operations using index registers. But in general, it must rely on software libraries whenever a larger type than 8 bits is used.

So if you attempt 32 bit arithmetic on an 8 bitter, it will inline some compiler software libraries with the code and the result will be hundreds of assembler instructions, making such code very inefficient and memory-consuming. 64 bit will be even worse.
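
To make the decomposition concrete, here is a rough C sketch of the work that has to happen one byte at a time on an 8-bit core. It is an illustration only, not the code any particular compiler or support library emits; the function names are made up and the byte arrays are assumed to be little-endian.

```c
#include <stdint.h>

/* Bitwise AND of two 64-bit values stored as 8 bytes each:
   eight independent byte operations, nothing carries between them. */
void and64_bytes(uint8_t out[8], const uint8_t a[8], const uint8_t b[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = a[i] & b[i];
}

/* Addition of the same values (least significant byte first):
   every step depends on the carry produced by the previous one. */
void add64_bytes(uint8_t out[8], const uint8_t a[8], const uint8_t b[8])
{
    unsigned carry = 0;
    for (int i = 0; i < 8; i++) {
        unsigned sum = (unsigned)a[i] + b[i] + carry;
        out[i] = (uint8_t)(sum & 0xFFu);
        carry = sum >> 8;
    }
}
```

Either way the cost grows with the number of bytes, so a 64 bit operation is roughly twice the work of a 32 bit one; the bitwise case is simply the cheaper of the two because it has no carry chain.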

Same thing with floating point numbers on MCUs that lack an FPU: these too will generate horribly inefficient code through software floating point libraries.


To illustrate, take a look at this non-optimized code for some very simple 64 bit addition on an 8-bitter AVR (gcc): https://godbolt.org/z/ezbKjY
It actually supported uint64_t but the compiler spewed out an enormous amount of overhead code, some 100 instructions. And in the middle of it, there is a call to an internal compiler function, __adddi3, hidden in the executable.

If we enable optimizations, we get

add64:
        push r10
        push r11
        push r12
        push r13
        push r14
        push r15
        push r16
        push r17
        call __adddi3
        pop r17
        pop r16
        pop r15
        pop r14
        pop r13
        pop r12
        pop r11
        pop r10
        ret

We'll have to dig through the library source or single-step the assembly live to see how much code there is inside __adddi3. I would guess it is still not a trivial function.

So as you hopefully can tell, doing 64 bit arithmetic on an 8-bit CPU is a very bad idea.

Lundin
  • 1
    Good answer. Note, though, that a type that satisfies the requirements for `uint_least64_t` does not necessarily satisfy the requirements for `uint64_t`. It is allowed for `uint_least64_t` to (i) be longer than 64 bits, and / or (ii) use sign / magnitude or one's complement representation, and if wider than 64 bits it may (iii) have padding bits. `uint64_t` must not have any of those characteristics. – John Bollinger Nov 06 '20 at 13:45
  • @JohnBollinger I don't quite see why a 8-bit compiler vendor would decide to make `uint_least64_t` _larger_ than 64 bits though :) It's a problematic enough type as it is. – Lundin Nov 06 '20 at 14:59
  • Thank you for the very good answer, Lundin. However under this perspective it seems that using two 32bit unsigned integers will not change much, it will still be very inefficient. I would need to use eight 8bit integers, and I think that is out of discussion (it will make my code inefficient everywhere else). I guess I will opt for two 32bit integers as a compromise, also because I might have found a way to use two 32bit integers as efficiently as one 64bit integer. – madmurphy Nov 06 '20 at 15:35
  • P.P.S. I don't know if this will change things radically or only slightly, but the only operators I use are `|`, `&`, `^`, `~` and `!` – so no arithmetic whatsoever. – madmurphy Nov 06 '20 at 15:58
  • Lots of instructions do not make for inefficiency. It is still the same O(). The whole idea of RISC vs CISC instructions sets is trading off speed for complexity. – chux - Reinstate Monica Nov 07 '20 at 21:14
  • @madmurphy The above assembly dump needs to be compared to hand-written code that uses smaller integers, but still 64-bits overall. Also, bitwise operators will not need the carry logic of addition/subtraction. Godbolt has been a godsend to bang up quick tests to estimate the potential gain of micro-optimizations. Be careful not to hard-code the operands! – Emile Cormier Mar 21 '23 at 07:40
  • *So as you hopefully can tell, doing 64 bit arithmetic on an 8-bit CPU is a very bad idea.* This is misleading advice. If the algorithm requires 64 bits of information, and the 8-bit CPU is not overloaded at the rate the 64-bit computation needs to be performed, then it's perfectly fine. If the application needs to perform the 64-bit computation at thousands of times per second, then that's a different matter. The OP did not provide enough information about the rate of computation. – Emile Cormier Mar 21 '23 at 10:00
  • @EmileCormier No it isn't misleading. There are very few, if any, reasons to pick 8 bit MCUs for new designs these days. Yes I have done 64 bit arithmetic on 8 bitters too in the past. It was a stupid idea. I've done DSP-like calculations for RF using 8 bitters in the past. It was a stupid idea. I've done PID regulators and fairly complex control systems using 8 bitters in the past. It was a stupid idea. There is a pattern here: picking an 8 bitter for complex, calculation-intense projects turns out to be a stupid idea. – Lundin Mar 21 '23 at 10:13
  • @Lundin If you had written "doing **intensive** 64 bit arithmetic" then I would agree with you. The way that sentence is currently worded can lead one to believe that if you need to perform the occasional 64-bit arithmetic, then an 8-bit MCU is inadequate. I admit I'm out of touch with the cost of today's 8-bit MCUs compared to 16 or 32-bit, so perhaps there's not even an economical reason to use 8-bit MCUs anymore. – Emile Cormier Mar 21 '23 at 10:28
  • @EmileCormier 64 bit access is so ridiculously expensive, so it is similar to the analogy "Either someone can fill a cargo ship with bananas and ship it from Brazil to Europe in one go, then I can buy the banana at the supermarket. Or I could buy flight tickets for my kid, have him fly from Europe to Brazil, head out in the forest somewhere to pick bananas, then head back. They'll need to stay at a hotel for some nights no doubt. Then smuggle the bananas through the customs... And finally we will have bananas that cost us 2000€ instead of 1€. It is perfectly possible!" – Lundin Mar 21 '23 at 11:03
  • Replace € with execution time and mA consumption and there you go. As for the price of 8 bitters, we pretty much ran out of the final arguments for using them somewhere around year 2010-2012. – Lundin Mar 21 '23 at 11:04
2

I've tested four variants of 64-bit bitwise AND using the Arduino Mega compiler on Godbolt.

struct pair
{
    uint32_t hi;
    uint32_t lo;
};

struct quad
{
    uint16_t w;
    uint16_t x;
    uint16_t y;
    uint16_t z;
};

struct octuplet
{
    uint8_t n1;
    uint8_t n2;
    uint8_t n3;
    uint8_t n4;
    uint8_t n5;
    uint8_t n6;
    uint8_t n7;
    uint8_t n8;
};

uint64_t bitwiseAnd64(uint64_t bits, uint64_t mask)
{
    return bits & mask;
}

pair bitwiseAndPairs(const pair& bits, const pair& mask)
{
    return pair{bits.hi & mask.hi, bits.lo & mask.lo};
}

quad bitwiseAndQuads(const quad& bits, const quad& mask)
{
    return quad{bits.w & mask.w, bits.x & mask.x,
                bits.y & mask.y, bits.z & mask.z};
}

octuplet bitwiseAndOctuplets(const octuplet& bits, const octuplet& mask)
{
    return octuplet{bits.n1 & mask.n1, bits.n2 & mask.n2,
                    bits.n3 & mask.n3, bits.n4 & mask.n4,
                    bits.n5 & mask.n5, bits.n6 & mask.n6,
                    bits.n7 & mask.n7, bits.n8 & mask.n8};
}

The results are:

  1. Bitwise AND on uint64_t operands:
    • 25 assembly instructions
  2. Piecewise bitwise AND on pairs of uint32_t operands
    • 69 assembly instructions
  3. Piecewise bitwise AND on quads of uint16_t operands.
    • 71 assembly instructions
  4. Piecewise bitwise AND on octuplets of uint8_t operands.
    • 60 assembly instructions

So I was not able to beat the compiler's synthesized 64-bit bitwise AND. Note that passing the structs by value adds significantly more instructions.

If what you mostly need to do is check whether a single bit is set or reset, then the above tests won't model your use case very well. Checking if a single bit is set requires a lot less work than computing the entire bitwise AND result!
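
A small C sketch of why that is, assuming the bit position is a compile-time constant: only one byte of the 64-bit value can influence the answer, so an optimizer is free to ignore the other seven. The function names are hypothetical, and the second function only spells out by hand a reduction the compiler may or may not perform.

```c
#include <stdint.h>
#include <stdbool.h>

/* The test as you would normally write it: bit 15 of a 64-bit value. */
bool test_bit15(uint_least64_t bits)
{
    return (bits & UINT64_C(0x8000)) != 0;
}

/* The same test written against the single byte that matters:
   bit 15 of the value is bit 7 of its second-lowest byte. */
bool test_bit15_bytewise(uint_least64_t bits)
{
    unsigned char byte1 = (unsigned char)(bits >> 8);
    return (byte1 & 0x80u) != 0;
}
```

Both functions return the same result for every input; the first one is the one to keep in real code.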

So I tried 5 ways of checking if one bit in a set of 64 bits is set on Godbolt.

struct pair
{
    uint32_t hi;
    uint32_t lo;
};

struct quad
{
    uint16_t w;
    uint16_t x;
    uint16_t y;
    uint16_t z;
};

struct octuplet
{
    uint8_t n1;
    uint8_t n2;
    uint8_t n3;
    uint8_t n4;
    uint8_t n5;
    uint8_t n6;
    uint8_t n7;
    uint8_t n8;
};

bool test64(uint64_t bits)
{
    return (bits & 0x0000000000008000) != 0;
}

bool testPair(const pair& bits)
{
    return (bits.lo & 0x00008000) != 0;
}

bool testQuad(const quad& bits)
{
    return (bits.z & 0x8000) != 0;
}

bool testOctuplet(const octuplet& bits)
{
    return (bits.n7 & 0x80) != 0;
}

typedef uint8_t Bytes[64];

bool testArray(const Bytes& bytes)
{
    return bytes[15] != 0;
}

Results:

  1. Testing if a bit is set in a uint64_t integer
    • 7 assembly instructions
  2. Testing if a bit is set in a pair of uint32_t operands
    • 15 assembly instructions
  3. Testing if a bit is set in a quad of uint16_t operands.
    • 6 assembly instructions
  4. Testing if a bit is set in an octuplet of uint8_t operands.
    • 6 assembly instructions
  5. Testing if an array byte at a given position is one:
    • 8 assembly instructions

So the moral of the story is: let the compiler worry about bitwise arithmetic for any word length supported by the compiler!
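
For a C library, following that advice could look like the sketch below: keep the mask in a single 64-bit variable and hide the bit fiddling behind a few small helpers, as also suggested in the comments on the question. The names (`bitmask64`, `bm_set`, and so on) are hypothetical, and the helpers assume `n` is in the range 0 to 63.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical name for the library's mask type. */
typedef uint_least64_t bitmask64;

/* Mask with only bit n set (n must be 0..63). */
static inline bitmask64 bm_bit(unsigned n)
{
    return UINT64_C(1) << n;
}

static inline bitmask64 bm_set(bitmask64 m, unsigned n)    { return m | bm_bit(n); }
static inline bitmask64 bm_clear(bitmask64 m, unsigned n)  { return m & ~bm_bit(n); }
static inline bitmask64 bm_toggle(bitmask64 m, unsigned n) { return m ^ bm_bit(n); }
static inline bool      bm_test(bitmask64 m, unsigned n)   { return (m & bm_bit(n)) != 0; }
```

If the representation ever has to change (two 32-bit halves, a byte array), only these helpers need rewriting, and the calling code stays as it is.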

Emile Cormier
  • Thank you for the insights, Emile. In the end I came to the conclusion that at least in my case (i.e. just bitwise operations but *lots of them*, in a loop) I better trust the compiler. *But that does not mean that I am actually happy* to use 64bit numbers for something that should be as (efficiently) portable as possible; it is just the least worst solution. – madmurphy Apr 16 '23 at 15:03
  • @madmurphy Have you considered a [struct with bitfields](https://en.cppreference.com/w/cpp/language/bit_field)? They are probably no more efficient than manual bitwise arithmetic, but they may be cleaner in code. Please be aware that they are not portable in the sense that different compilers/platforms may pad the bits differently. This may be an issue if the bytes are sent over the wire or stored to hardware registers. Sorry if you already know this, but I'm writing it also for beginners who stumble upon this question. – Emile Cormier Apr 16 '23 at 16:21
0

Well, if your primary concern is to maintain a fair level of compatibility, and that's the reason to avoid using 64-bit numbers, why don't you use an array of int values and use each integer to store, let's say, 30 bits?

I recommend you have a look at the standard library sources concerning the use of bit masks (larger than 32 bits), e.g. for representing the file descriptors handled by the select(2) system call, and at how the FD_SET family of macros is used.

What is true is that you are probably facing the decision of whether to cross the 32-bit limit in the data type used to represent your bitmaps now, or to solve the problem (temporarily) by using the 64-bit types that are still available. The same problem will come back at the next scale, when you outgrow 64-bit bitmasks, and then you'll finally have to cross the line.

You can do it now, as an exercise, and you'll learn that a data type is in the end a more or less large set of bits, and you can put them to any use you want. Do you plan to use 80-bit long double values to store bitmasks larger than 64 bits? I think you won't, so think about the array solution, which will probably solve your problem once and for all.

If your problem were mine, I'd write an array of 32-bit unsigned numbers, so that all bits behave the same under shifts, bit operations and the like.


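#include <stdint.h>  /* uint_least32_t, UINT32_C: unsigned int may be only 16 bits wide */
#define FDSET_TYPE(name, N)  uint_least32_t name[((N) + 31U) >> 5]
#define FDSET_ISSET(name, N) (((name)[(N) >> 5] & (UINT32_C(1) << ((N) & 0x1f))) != 0)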

...

    FDSET_TYPE(name, 126);

...

    if (FDSET_ISSET(name, 35)) { ...

In the example above, the FDSET_TYPE macro declares a variable holding the number of bits you pass as the second parameter, implemented as an array of unsigned 32-bit integers whose length is rounded up so that all bits are included. FDSET_ISSET(name, 35) calculates the cell and the offset where the requested bit resides: the cell index is the bit number divided by 32, and the offset is the remainder of that division. As we selected a power of two, I use a mask of 0x1f to keep the last 5 bits of the number and get the remainder mod 32.
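
The test macro above only reads a bit. A minimal sketch of the set and clear companions, written as functions, might look like this; the `bitset_*` and `BITSET_WORDS` names are hypothetical, and `uint_least32_t` is assumed as the cell type so that each cell holds at least 32 bits even where `unsigned int` is only 16 bits wide.

```c
#include <stdint.h>
#include <stdbool.h>

/* Number of 32-bit cells needed to hold nbits bits, rounded up. */
#define BITSET_WORDS(nbits) (((nbits) + 31u) >> 5)

/* Set bit n: cell index is n / 32, offset inside the cell is n % 32. */
static inline void bitset_set(uint_least32_t *w, unsigned n)
{
    w[n >> 5] |= UINT32_C(1) << (n & 31u);
}

/* Clear bit n, leaving every other bit in the cell untouched. */
static inline void bitset_clear(uint_least32_t *w, unsigned n)
{
    w[n >> 5] &= ~(UINT32_C(1) << (n & 31u));
}

/* Report whether bit n is set. */
static inline bool bitset_test(const uint_least32_t *w, unsigned n)
{
    return (w[n >> 5] & (UINT32_C(1) << (n & 31u))) != 0;
}
```

A 126-bit set would then be declared as `uint_least32_t bits[BITSET_WORDS(126)] = {0};` and used with `bitset_set(bits, 35)`, which lands in cell 1 at offset 3. The caller is responsible for keeping `n` below the declared number of bits.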

Luis Colorado