Consider the following code:
#include <stdint.h>

typedef struct {
    uint32_t a : 1;
    uint32_t b : 1;
    uint32_t c : 30;
} Foo1;

void Fun1(void) {
    volatile Foo1 foo = (Foo1) {.a = 1, .b = 1, .c = 1};
}
This general pattern comes up quite a bit when using bit fields to punch registers in embedded applications. Using a recent ARM gcc compiler (e.g. gcc 8.2 or gcc 7.3) with -O3 and -std=c11, I get the following assembly:
sub sp, sp, #8
movs r3, #7
str r3, [sp, #4]
add sp, sp, #8
bx lr
This is pretty much exactly what you want and expect; the Foo1 compound literal is not volatile, so the initialization of each bit can be combined into the literal 0x7 before finally being stored to the volatile variable (register) foo.
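As a sanity check on the 0x7 value, here is a minimal sketch that builds the same initializer in a non-volatile Foo1 and copies out its object representation. It assumes gcc's usual layout on a little-endian target (a in bit 0, b in bit 1, c in bits 2 to 31); bit-field layout is implementation-defined, so other compilers or targets may differ. The helper names foo1_to_word and packed_example are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t a : 1;
    uint32_t b : 1;
    uint32_t c : 30;
} Foo1;

/* Hypothetical helper: copy the bytes of a Foo1 into a plain 32-bit word.
   memcpy sidesteps any aliasing concerns. */
uint32_t foo1_to_word(Foo1 f) {
    uint32_t word;
    memcpy(&word, &f, sizeof word);
    return word;
}

/* Hypothetical helper: same initializer as Fun1, but in a non-volatile
   object, so the compiler is free to fold it into a single constant. */
uint32_t packed_example(void) {
    Foo1 f = {.a = 1, .b = 1, .c = 1};
    return foo1_to_word(f);
}
```

Under the layout assumed above, packed_example() returns 0x7, which matches the movs r3, #7 in the assembly.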
However, it's convenient to be able to manipulate the raw contents of the entire register as well, which leads to wrapping the bit fields in an anonymous struct inside a union:
typedef union {
    struct {
        uint32_t a : 1;
        uint32_t b : 1;
        uint32_t c : 30;
    };
    uint32_t raw;
} Foo2;

void Fun2(void) {
    volatile Foo2 foo = (Foo2) {.a = 1, .b = 1, .c = 1};
}
Unfortunately, the resulting assembly is not so optimized:
sub sp, sp, #8
ldr r3, [sp, #4]
orr r3, r3, #1
str r3, [sp, #4]
ldr r3, [sp, #4]
orr r3, r3, #2
str r3, [sp, #4]
ldr r3, [sp, #4]
and r3, r3, #7
orr r3, r3, #4
str r3, [sp, #4]
add sp, sp, #8
bx lr
For a densely packed register, a read-modify-write of each bit can get... expensive.
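One way to sidestep the per-field read-modify-write is to compose the fields in a non-volatile temporary and then do a single store through raw. The following is only a sketch of that idea: foo2_write_all and foo2_demo are hypothetical names, the 0x7 result assumes gcc's little-endian bit-field layout, and I have not checked what assembly the ARM toolchains above generate for it.

```c
#include <stdint.h>

typedef union {
    struct {
        uint32_t a : 1;
        uint32_t b : 1;
        uint32_t c : 30;
    };
    uint32_t raw;
} Foo2;

/* Hypothetical helper: build every field in a non-volatile temporary,
   then write the whole register with one 32-bit volatile store. */
void foo2_write_all(volatile Foo2 *reg, uint32_t a, uint32_t b, uint32_t c) {
    Foo2 tmp = {.a = a, .b = b, .c = c}; /* not volatile: free to be folded */
    reg->raw = tmp.raw;                  /* the only volatile access */
}

/* Hypothetical demo: a local volatile object stands in for the register. */
uint32_t foo2_demo(void) {
    volatile Foo2 foo = {.raw = 0};
    foo2_write_all(&foo, 1, 1, 1);
    return foo.raw;
}
```

At the source level this performs exactly one volatile write per call, at the cost of always rewriting every field of the register.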
What's special about the union / anonymous struct that prevents gcc from optimizing the initialization like the pure struct?