
I would like to write portable code for applications that will run on different MCUs (8-, 16-, or 32-bit cores):

  • MSP-430 (16-bit)
  • nRF52 (32-bit)
  • PIC (16-bit)
  • C51 (8-bit)

Let's consider this snippet:

events = 0;
for (size_t i = 0; i < sizeof(array) / sizeof(array[0]); i++) {
    if (array[i] > threshold) 
        events++;
}

My question concerns the type of the loop counter variable, here size_t.

size_t is guaranteed to be large enough to hold the size of any object on my system. So using size_t might hurt the performance of my code on some architectures, because this type can be wider than the length of my array requires.

With this assumption, I would be better off using uint_fast16_t, because I know my array has fewer than 65,536 elements.
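For reference, here is a minimal sketch of the two variants under discussion (the array length, element type, and function names are placeholders, not from the original code):

```c
#include <stddef.h>
#include <stdint.h>

#define ARRAY_LEN 100u  /* hypothetical compile-time size, < 65536 */

/* Portable default: a size_t counter. */
size_t count_events_size_t(const int *array, int threshold) {
    size_t events = 0;
    for (size_t i = 0; i < ARRAY_LEN; i++) {
        if (array[i] > threshold)
            events++;
    }
    return events;
}

/* Alternative: uint_fast16_t, at least 16 bits wide, which the
 * implementation is free to widen to the native register size. */
size_t count_events_fast16(const int *array, int threshold) {
    size_t events = 0;
    for (uint_fast16_t i = 0; i < ARRAY_LEN; i++) {
        if (array[i] > threshold)
            events++;
    }
    return events;
}
```

Both variants compute the same result; the only difference is the declared counter type that the compiler starts from.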

Does it make sense to care about this, or is my compiler smart enough to optimize it anyway?

I think uint_fast16_t is rarely used and feels like boilerplate compared with size_t.

To be more specific about my question:

Do I improve the portability of my code by systematically using the narrowest suitable type for my loop counter (uint_fast8_t, uint_fast16_t, ...), or should I prefer size_t because in most cases it will make no difference in terms of performance?

EDIT

Following your comments and remarks, it is clear that most of the time the compiler will keep the loop counter in a register, so choosing between size_t and uint_fast8_t does not matter much.

https://godbolt.org/g/pbPCrf

main: # @main
  mov rax, -80
  mov ecx, dword ptr [rip + threshold]
.LBB0_1: # =>This Inner Loop Header: Depth=1
  [....]
.LBB0_5: # in Loop: Header=BB0_1 Depth=1
  add rax, 8     # <----------- Kept in a register
  jne .LBB0_1
  jmp .LBB0_6
.LBB0_2: # in Loop: Header=BB0_1 Depth=1
  [....]
.LBB0_6:
  xor eax, eax
  ret

This question could become a real issue if the trip count no longer fits in a CPU register, e.g. a loop of 512 iterations on an 8-bit microcontroller.
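To illustrate that 8-bit pitfall with a hedged sketch (function name and data are made up): with a uint8_t counter the condition `i < 512` can never become false, because the counter wraps from 255 back to 0, so the narrowest safe counter for a 512-iteration loop is a 16-bit type.

```c
#include <stdint.h>

#define N 512u

/* Summing 512 bytes: a uint8_t counter would wrap 255 -> 0, so a loop
 * written as `for (uint8_t i = 0; i < N; i++)` would never terminate.
 * The counter must therefore be at least 16 bits wide here. */
uint32_t sum512(const uint8_t data[N]) {
    uint32_t total = 0;
    for (uint_fast16_t i = 0; i < N; i++)
        total += data[i];
    return total;
}
```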

nowox
  • For this sort of optimization (and question), the only answer is to look at the generated code both ways. (Even if some other compiler does the expected/desired optimization, there's no guarantee that *your* compiler would do the same.) – P.P Feb 22 '18 at 09:32
  • Do you know any compiler that does this kind of optimization? – nowox Feb 22 '18 at 09:37
  • There is this language called C and a data type called int that solves this problem. – old_timer Feb 22 '18 at 18:09
  • @old_timer you don't know anything about C, especially the ISO C99 standard, which recommends never using int – nowox Feb 22 '18 at 19:10
  • Note this is not a stackoverflow question, as it is both too broad and primarily opinion based there is no factual answer that applies to the broad nature of this question. – old_timer Feb 22 '18 at 23:27
  • I learned C from the first edition K&R books, I think I know C. You have no clue what I know or my experience, you should be careful making such statements. int is very much a part of the standards from that point to when the standards started coming out to the present. I think you need to understand how compilers are built, targeted, and where these additional libraries and headers come from and how they are applied to the target (and how often mis-applied, and how few know how to tell the difference). – old_timer Feb 22 '18 at 23:36
  • @old_timer It is a _fact_ that the vast majority of professionally written embedded systems use `stdint.h`. Anyone who has the slightest experience from the industry can tell you as much. This is because _portability_ and _deterministic behavior_ are desired. `int` provides neither. To argue against that and propagate for `int` is just silly, `stdint.h` exists for a reason. There's nothing wrong with the question, as it is also evident to anyone with the slightest experience from small microcontroller systems that `size_t` is problematic, although not nearly as bad practice as using `int`. – Lundin Feb 23 '18 at 07:32
  • @Lundin, you also need to be careful about making such statements. You too are missing the point, there is a reason stdint.h is a header file and not part of the core language itself. I at least know from your answers you have a similar quantity of experience as I do. Using the library side of the language in baremetal systems is problematic in general, and few have the experience to know (from the sheer volume of examples on the net and questions on a site like this) if they are using a proper stdint for their target or the host's, or by the fact that they take a canned library package. – old_timer Feb 23 '18 at 12:34
  • @Lundin you also have enough experience at this site to know not to answer a question with so many obvious close options. For pure portability anything larger than 8 bits will do here, for performance anything with a size in bits is bad here. It will perform badly on as many systems as it performs well on, as you should well know and with your experience should be pointing out rather than leading down a path of short term gain long term loss. P.P. has the only real answer here, not you not me. – old_timer Feb 23 '18 at 12:39
  • @old_timer `stdint.h` was added to fix obvious, well-known problems in the language. The only reason why it wasn't added as keywords is because the committee is deadly afraid to fix any of the countless severe problems in the C language. There is absolutely no problem using this library in a bare metal application, as it is required to be supported by any conforming implementation, C11 4 §6. Chapters 4 and 5 being the very core of the C language. C99 was poorly supported some 15 years ago, but not today. – Lundin Feb 23 '18 at 12:45
  • The edit has further led down the wrong path. The question states MCUs but then uses 64-bit x86 as a test case, a bad choice for more than one reason. The negative effects are not shown in the x86 instructions directly, the baggage is carried elsewhere, giving the illusion of efficiency. This leads to incorrect assumptions rather than continuing to examine the results for the actual targets. – old_timer Feb 23 '18 at 12:45
  • @Lundin not talking about a spec issue talking about an implementation issue. Specs are great but how they get implemented and then used are as important as the spec, not understanding that, not experiencing that and using that experience leads to bad assumptions and bad usage with a false notion of the spec as your defense. The OP has clearly done this exact thing in the edit, completely defending my argument/education here. – old_timer Feb 23 '18 at 12:47

3 Answers


For portable code, use size_t.

For fast code... well, it depends on your compiler and processor. If you use a 16 bit type, it might run fastest on your 16 bit processor but actually be slower than size_t on a 64 bit processor. You shouldn't assume anything until you measure the performance.

I'd use size_t and only consider further optimisation if there was a demonstrable performance issue.

JeremyP
  • That's the reason why I haven't used `uint16_t`, but `uint_fast16_t` to guarantee that my implementation will be more or equally efficient to the solution that uses `size_t` – nowox Feb 22 '18 at 09:39
  • *"If you use a 16 bit type, it might run fastest on your 16 bit processor but actually be slower than size_t on a 64 bit processor."* Unless the compiler is crap (which is of course a realistic possibility), I would expect `uint_fast16_t` to be defined as some larger/faster type than 16-bit. – user694733 Feb 22 '18 at 09:40
  • @user694733 And how do you know the compiler isn't crap unless you test it? Anyway, we've probably already spent longer discussing the issue than the OP will save by the optimisation over the whole life of the program. – JeremyP Feb 22 '18 at 09:47
  • `uint_fast16_t` will never be slower than `size_t`on a 64 bit processor, or the compiler is not conforming. – Lundin Feb 22 '18 at 10:06
  • @Lundin Note 262 on 7.20.1.3 says "the designated type is not guaranteed to be fastest for all purposes". So, it can be slower and still conforming. – JeremyP Feb 22 '18 at 10:13
  • @Lundin also, for the given fragment of code, a 16 bit type potentially overflows for large element sizes unless the array is guaranteed to be small. If the array is guaranteed to be small, the performance savings will be marginal. – JeremyP Feb 22 '18 at 10:16
  • @JeremyP That note simply states that if there are several criteria for "fast", then the compiler may pick either of them. I'm not really certain what this would be in practice, speed has to do with how small registers the CPU can handle as well as with how it handles alignment. But I don't see how these parameters would conflict. And foot notes are not normative anyhow. – Lundin Feb 22 '18 at 10:19
  • @JeremyP Doing 65535 comparisons against a 16 bit versus 32 bit, on a 16 bit MCU, is a huge performance gain. It is _not_ marginal. – Lundin Feb 22 '18 at 10:20
  • @Lundin It says the type is not guaranteed to be the fastest for all purposes. Iterating an array is one purpose, so if it happens that using it makes the code slower than not, the compiler is still conforming contrary to your comment. – JeremyP Feb 22 '18 at 10:22
  • @Lundin You can only do 65535 comparisons with a 16 bit index if the size of the array element is 1. – JeremyP Feb 22 '18 at 10:23
  • @JeremyP In the normative text it says "...an integer type that is usually fastest to operate with among all integer types that have at least the specified width." So if iterating through arrays is "unusual" then you are correct :) – Lundin Feb 22 '18 at 10:23
  • @Lundin so you withdraw your comment about such a compiler not being conforming then. It might be a crap compiler, but it is still conforming. – JeremyP Feb 22 '18 at 10:25
  • Not really, it would be hard for a compiler vendor to argue and claim that array iteration is something "unusual". – Lundin Feb 22 '18 at 10:29

For MCUs, use the smallest type that you know will fit the array size. If you know that the array may hold more than 256 elements, but never more than 65536, then uint_fast16_t is indeed the most appropriate type.

Your main issue here will be 16-bit MCUs with extended flash (>64KB), giving a 24- or 32-bit address width. This is the norm for many 16-bitters. On such systems, size_t will be 32 bits, and will therefore be a slow type to use.

If portability to 8- or 16-bitters is no concern, then I would just use size_t.

In my experience, many embedded compilers are not smart enough to optimize code to use a smaller type than the stated one, even though they can deduce the array size at compile time.
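One way to apply this without hand-picking a type at every loop is to select the counter type once from the compile-time bound. A sketch under assumed names (ARRAY_LEN, counter_t, and count_over are all made up for illustration):

```c
#include <stdint.h>

#define ARRAY_LEN 300u  /* hypothetical compile-time bound */

/* Select the narrowest uint_fastN_t that covers ARRAY_LEN, instead of
 * defaulting to size_t (which may be 32 bits on a 16-bit MCU). */
#if ARRAY_LEN <= 255u
typedef uint_fast8_t counter_t;
#elif ARRAY_LEN <= 65535u
typedef uint_fast16_t counter_t;
#else
typedef uint_fast32_t counter_t;
#endif

unsigned count_over(const int *array, int threshold) {
    unsigned events = 0;
    for (counter_t i = 0; i < ARRAY_LEN; i++) {
        if (array[i] > threshold)
            events++;
    }
    return events;
}
```

Changing ARRAY_LEN automatically moves the counter to the next wider "fast" type, so the loop never pays for a type wider than the bound requires.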

Lundin
  • I have edited my question because I realized that most of the time, the compiler will *register* the loop counter and the type chosen has no real importance except on special cases. – nowox Feb 23 '18 at 09:21
  • @nowox Of course it will be a register. So if you have told the compiler to use a larger type than the register size, you'll get a very slow loop. – Lundin Feb 23 '18 at 09:36
  • And this is precisely the idea of my question. Does it make sense to care about the size of `size_t` if in 99% of the time `sizeof(size_t) <= sizeof(register int)` :) – nowox Feb 23 '18 at 09:40
  • @nowox As I already wrote in my answer, most 16 bitters on the market use >64kb flash meaning they have extended address buses. `size_t` is likely to follow. So you have a perfectly realistic scenario where 16 bits is the fastest type but `size_t` is 32 bits. I think this can happen on TI, NXP, Renesas etc etc. – Lundin Feb 23 '18 at 09:51

As with any optimization, write simple, portable code for the common case first (using size_t). Then look at the assembly on your platform with other types. If one of those types runs faster or generates significantly smaller code, you can typedef a special index type for those kinds of accesses. For example, you could borrow the concept of near, far and huge pointers (and corresponding indices), except using fixed-width types for clarity.

/* The compiler for my16bitmcu, cannot detect ranges to use appropriate types */
#if defined __MY16BITMCU__ /* replace with architecture's predefined macro */
  typedef uint16_t size8t, size16t; /* use uint8_t for size8t on 8bit */
  typedef uint32_t size32t;
  typedef uint64_t size64t;
  typedef int16_t ptrdiff8t, ptrdiff16t; /* use int8_t for ptrdiff8t on 8bit */
  typedef int32_t ptrdiff32t;
  typedef int64_t ptrdiff64t;
#else
  typedef size_t size8t, size16t, size32t, size64t;
  typedef ptrdiff_t ptrdiff8t, ptrdiff16t, ptrdiff32t, ptrdiff64t;
#endif

/** example usage: sum the total of an array
 ** using size8t for count reduces complexity on some 8/16 bit systems
 ** on other systems, size8t is the same as size_t
 **/
int sum_numbers(int *numbers, size8t count){
  int total = 0;
  while(count--) total += numbers[count];
  return total;
}
technosaurus
  • Why can't you use `uint_fastn_t` or `uint_leastn_t`? Inventing your own home-brewed types is bad practice ever since C99. – Lundin Feb 23 '18 at 07:25
  • @Lundin mostly because what is defined as "fast" or "least" may not be optimal for this specific situation. For instance if the platform has an x86-like `LEA` instruction that only uses an `n`-bit register for an offset, but for most other operations the 32 bit registers are "fast"-er. Ideally the compiler would handle this itself, but for new or less active architectures these kinds of workarounds are ... a growing pain. – technosaurus Feb 23 '18 at 08:16
  • What you describe is a compiler failing to generate optimal code. That's not really the programmer's business, especially not from a portability perspective, which is what this question was about. – Lundin Feb 23 '18 at 09:33
  • Oh, how I wish it weren't the programmer's business to work around missed compiler optimizations. On 8/16-bit systems where the compiler internally handles larger aggregate types, using the "portable" type can increase the size of the code by orders of magnitude. Sometimes you _have_ to invent types that the standard overlooks - using uint_fast/leastN_t does not convey the purpose of the type and could still be sub-optimal on other systems; whereas size8t conveys the purpose and is if-def'ed to size_t on other systems for portability. – technosaurus Feb 23 '18 at 22:07