4

Does using lots of global variables in C code decrease or increase performance, when compiled for an ARM7 embedded platform?

The code base consists of multiple C source files which refer to each other's global variables using the extern keyword. Different functions in different source files refer to different global variables. Some of the variables are arrays.

The compiler I'm using is IAR's EW ARM kickstart edition (32kb).

b20000
  • 995
  • 1
  • 12
  • 30
  • 4
    The general tradeoff is usually this: Downside - increase the size of the executable image, thus increase the time it takes to load it into memory. Upside - get rid of initialization-by-code-operations (every time you enter the function in which the variable is declared), thus improve runtime performance. But that's a very general "tendency". You'll have to be more specific as to the variables which you are thinking of moving from global to local (or vice-versa). – barak manos May 23 '14 at 17:16
  • There is no definitive answer here, there are pros and cons for both from a performance perspective. So it comes down to personal taste or something application specific. – old_timer May 23 '14 at 19:24
  • @artless noise: I meant, if you have (for example) some `int arr[] = {...}`, then: 1. If it's a global array, then the values are simply part of the executable image (at a constant address). 2. If it's a local array in some function, then the values are copied into it every time the function is called (a small sketch of this is shown after these comments). – barak manos May 23 '14 at 20:49
  • @barakmanos I see. I think the OP meant either using `arr[num];` directly, versus `foo(int arr[]) { ... arr[num]; }` passed as a pointer/reference. Ie, in both cases, they are not local, only the way they are accessed changed. The initialization is the same if you have written C startup code. `memset()` for BSS and `memcpy()` for init data. – artless noise May 23 '14 at 20:55
  • 1
    barak already answered that question, quite elegantly I might add, as far as the OP's high-level open-ended question goes (which should have already been closed as being primarily opinion based). If the poster wants to ask more detailed questions about specific use cases, with code examples of the two possibilities, that is an SO question. – old_timer May 24 '14 at 00:09
  • @dwelch it's pretty clear from the answers that this is NOT an opinion-based issue. It would be if I didn't specify the platform, but I did, and hence it becomes a question to which it's possible to give a specific answer... – b20000 May 24 '14 at 01:49
  • There are definite pros and cons from a performance perspective. Globals are both good and bad, locals are both good and bad, passing by reference vs global is both good and bad. You have not specified enough information to do better than that. Spend some more time examining compiler output and measuring the resulting performance to fully appreciate the tradeoffs. – old_timer May 24 '14 at 02:08
  • Performance is hardly your biggest problem; maintainability and correctness (in particular thread safety) are a more serious concern given your description of the code. Required reading IMO: [A Pox on Globals](http://www.embedded.com/electronics-blogs/break-points/4025723/A-pox-on-globals). Face it - it's just *bad code*. The overwhelming maintainability issues outweigh any minuscule and hardly measurable performance benefit. That's a *micro-optimisation* and a second guessing of the compiler. You'll no doubt get a far greater performance gain simply by applying compiler optimizations. – Clifford May 24 '14 at 07:52
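A minimal sketch of the initialization difference described in the comments above (the array and function names are made up for illustration):

/* Global: the initial values live in the executable image and are copied
   into RAM once by the C startup code. */
int table_global[4] = { 1, 2, 3, 4 };

int sum_global(void) {
    return table_global[0] + table_global[3];
}

/* Local: the initializer runs on every call, typically as stores (or a
   memcpy) into the stack frame. */
int sum_local(void) {
    int table_local[4] = { 1, 2, 3, 4 };
    return table_local[0] + table_local[3];
}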

4 Answers

0

Well, using global variables does not impact CPU performance directly. Stack allocation is typically a single add or subtract at function entry/exit respectively.

However, the stack is very limited in size. Using dynamic allocation on the heap is typically the solution. In embedded systems, this may be a problem because of how long it may take to allocate or free dynamic memory.

If allocating and freeing from the heap is a problem for your system, global variables may alleviate the problem of allocation/free execution time.

I wouldn't recommend this as your first solution — especially if this application involves threading. It may be difficult to track down which threads/functions are modifying global variables, leading to future headaches. static variables are technically placed in the same location as global variables ("global and static data"), so you may want to consider this option first.
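As a rough sketch of that option (the names here are invented, not from the question's code base), a file-scope static gives the same storage class as a global while keeping the symbol private to one translation unit:

/* Visible to every function in this file, but other files cannot
   reach it with extern, which limits who can modify it. */
static int rx_count;

void on_receive(void) {
    rx_count++;
}

int get_rx_count(void) {
    return rx_count;
}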

aglasser
  • 3,059
  • 3
  • 13
  • 10
0

You are probably worrying about something that's not really a problem for you; however...

From a theoretical or nitpicking point of view, accessing global variables requires some kind of indirection (such as going through the GOT for position-independent code), so they are slower to access.

When you are accessing variables in local scope, you are implicitly using local references like your stack pointer or values lying in registers, so accessing them is faster.

For example:

extern int x;

int foo(int a, int b, int c, int d, int e) {
  return x + b + e;
}

compiles to

foo(int, int, int, int, int):
    movw    r3, #:lower16:x
    movt    r3, #:upper16:x
    ldr     r3, [r3, #0]
    adds    r0, r1, r3
    ldr     r3, [sp, #0]
    adds    r0, r0, r3
    bx      lr

You can see that accessing b (r1) or e (ldr r3, [sp, #0]) requires fewer instructions compared to accessing x (movw r3, #:lower16:x; movt r3, #:upper16:x; ldr r3, [r3, #0]).
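The output above appears to be for a newer ARM core; an ARM7TDMI (ARMv4T) has no movw/movt, so the compiler would instead load the address of x from a nearby literal pool, and the global still costs an extra load compared with a register or stack argument. A hand-written sketch of the same function for that core might look like this:

foo:
    ldr     r3, .Laddr_x      @ fetch the address of x from the literal pool
    ldr     r3, [r3, #0]      @ load the value of x
    adds    r0, r1, r3        @ r0 = b + x
    ldr     r3, [sp, #0]      @ load e from the caller's stack slot
    adds    r0, r0, r3        @ r0 = b + x + e
    bx      lr
.Laddr_x:
    .word   x                 @ address of x, resolved by the linker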

artless noise
  • 21,212
  • 6
  • 68
  • 105
auselen
  • 27,577
  • 7
  • 73
  • 114
0

This will always decrease performance and increase program size compared with static variables. Your question doesn't say specifically what you are comparing against. I can see several alternatives:

  1. Versus static variables.
  2. Versus parameters passed by value.
  3. Versus values accessed through a passed array or structure pointer.

The ARM blog gives specifics on how to load a constant into an ARM register. This step must always be done to get the address of a global variable. The compiler will not know a priori how far away a global is. If you use gcc with -flto, or something like -fwhole-program, then better optimizations can be performed. Basically, these can transform the global into a static.

Here the compiler may keep a register holding a base address for the globals and then load the different variables at an offset from it, such as ldr rN, [rX, #offset]. That is, if you are lucky.

RISC CPUs like the ARM are designed around a load/store unit which handles all memory accesses. Typically, the load/store instructions support the [register + offset] addressing form, and the register set is largely symmetric, meaning any register can be used as the base for this offset access. If you pass a struct or array pointer as a parameter, then it becomes the same thing, i.e. ldr rN, [rX, #offset].

Now, the advantage of the parameter is that your routines can eventually support multiple arrays or structures by being passed different pointers. It also gives you the advantage of grouping common data together, which gives cache benefits.
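As a hedged illustration of that base-plus-offset pattern (the struct and names are invented for the example, not taken from the question):

struct channel {
    int gain;
    int offset;
    int samples[16];
};

/* One base register (the pointer) serves every field: the compiler can
   emit ldr rN, [rX, #offset] for each access, and the same routine
   works for any number of channel instances. */
int scale(const struct channel *ch, int raw) {
    return raw * ch->gain + ch->offset;
}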

I would argue that globals are detrimental on the ARM. You should only use global pointers where your code needs a singleton, or where you have some sort of synchronization memory. I.e., globals should only be used for global functionality and not for data.

Passing all of the values via the stack is obviously inefficient and misses the value of a memory reference or pointer.

artless noise
  • 21,212
  • 6
  • 68
  • 105
  • I've extended the example you pointed to at http://goo.gl/ByaUcI and it seems that the generated assembly code for a function with 7 parameters is still shorter than if you put those 7 parameters in a struct and pass a pointer to the struct ... of course this doesn't take into account the time it takes to copy the data onto the stack? So I guess while there are a few fewer instructions there will be more instructions copying data on the stack. – b20000 May 23 '14 at 21:50
  • The *cache locality* issue is a good general point, but the vast majority of ARM7 devices have no cache. – Clifford May 24 '14 at 20:07
  • @artlessnoise I would imagine he meant what he said [(ARM7)](http://en.wikipedia.org/wiki/ARM7) - it would be unusual refer to the ARM Arch version in this context. Your history is a bit off - NXP announced the LPC2300/2400 series in 2006, and still produce them in quantity (for existing designs). Similarly Atmel still produce AT91SAM7. You would not want to use one in a new design perhaps but many students have ARM7 dev boards available to them. While there are ARM7 parts with cache, the most successful ARM7TDMI core does not. I did say vast majority, not all, and was referring to volume. – Clifford May 25 '14 at 21:48
  • @Clifford That is a fair point, but the *cache* comment is not central to what I was saying. `ldr rX, [rY]` needs to have `rY` loaded with the address of a global. A global address is not known at compile time (only link), so the compiler must always use *load constants* to get that address to `rY`. Similarily, multiple globals are not related and so the compiler always need to load the address. – artless noise May 26 '14 at 18:40
  • @b20000 Yes, the call site will usually be more complex for your `brol()` example. Ie, the `brol()` is shorter, but people calling it will probably have more code. – artless noise Feb 04 '15 at 20:30
  • Here is [an online version](https://gcc.godbolt.org/z/oAIlEQ) where you can see less code when passing by structures. Passing by-value (many parameters) is obviously bad; maybe this is what *barak* is referring to. – artless noise Sep 11 '19 at 23:31
0

Any performance benefit or otherwise will depend entirely on the access pattern and usage, so it is not possible to say in any individual case without seeing the code. The code may be efficient or inefficient regardless of the use of globals.

If by making the data global you avoid calls to accessor functions, for example, and such accesses are frequent, then avoiding the function call overhead may have a measurable performance advantage. But simply being global in and of itself will not have any advantage - it's about the method of access and the number of instructions that generates (or wait states if the memory access is slower than the processor - off-chip memory for example - but that applies to any data, global or otherwise).

The use of globals in the manner you describe is usually indicative of poor design and/or developer inexperience, and there are likely to be areas of the code that have a far greater impact on performance than mere locality of data access.

In the end the use of global data to gain some perceived performance advantage is ill-conceived. Performance in most cases should be about achieving required real-time deadlines or data-throughput, not about being as fast as possible; if your processor ends up idling 90% of the time, all you have achieved is more time to do nothing.

I suspect your code-base uses global data more out of poor design or workmanship than out of any deliberate performance concern. Encapsulated static data with explicitly in-lined or compiler-optimised access functions is likely to have similar performance while being more maintainable and easier to debug - advantages that probably far outweigh the performance issues. Ask yourself whether it will be better to save a millisecond of CPU time or a month of development time, or worse a product recall and loss of customers because your product fails in the field.
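A minimal sketch of that kind of encapsulation (module and function names are invented for illustration); with optimisation or LTO enabled the accessors typically inline down to the same single load or store a direct global access would produce:

/* motor.h - the rest of the code base sees only the functions */
int motor_get_speed(void);
void motor_set_speed(int rpm);

/* motor.c - the data itself is private to this file */
static int motor_speed;

int motor_get_speed(void)     { return motor_speed; }
void motor_set_speed(int rpm) { motor_speed = rpm; }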

Clifford
  • 88,407
  • 13
  • 85
  • 165