I am trying to emit an ldm (resp. stm) instruction with inline assembly, but I have trouble expressing the operands (in particular, their order).
A naive attempt such as
void *ptr;
unsigned int a;
unsigned int b;
__asm__("ldm %0!,{%1,%2}" : "+&r"(ptr), "=r"(a), "=r"(b));
does not work because the compiler might put a into r1 and b into r0:
ldm ip!, {r1, r0}
ldm expects its registers in ascending order (they are encoded as a bitfield), so I need a way to say that the register used for a has a lower number than the one used for b.
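To illustrate why the order cannot be expressed at all: as far as I understand the A32 encoding, the register list of ldm/stm is a 16-bit mask with one bit per register, so {r1, r0} and {r0, r1} describe the same instruction, and the lowest-numbered register always goes with the lowest address. A minimal sketch of that encoding (the helper name reglist_mask is made up for illustration and is not part of any toolchain):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: build the 16-bit register-list field of an
 * ldm/stm encoding. Bit n set means "rn is in the list". */
static uint16_t reglist_mask(const unsigned *regs, size_t count)
{
    uint16_t mask = 0;
    for (size_t i = 0; i < count; i++)
        mask |= (uint16_t)(1u << (regs[i] & 15u)); /* one bit per register */
    return mask;
}
```

Both {1, 0} and {0, 1} produce the mask 0x0003, which is why the order written in the assembly source carries no information.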
A simple workaround is to assign fixed registers:
register unsigned int a asm("r0");
register unsigned int b asm("r1");
__asm__("ldm %0!,{%1,%2}" : "+&r"(ptr), "=r"(a), "=r"(b));
But this removes a lot of flexibility and might make the generated code suboptimal.
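For completeness, here is the fixed-register variant wrapped into a function, as a rough sketch only. The function name load_pair, the "memory" clobber (added because the asm reads memory the compiler does not know about) and the plain C fallback for non-ARM builds are my own assumptions, not something gcc requires in exactly this form:

```c
#include <stdint.h>

/* Hypothetical helper: load two consecutive words and return the
 * advanced pointer. */
static inline const uint32_t *load_pair(const uint32_t *ptr,
                                        uint32_t *out_a, uint32_t *out_b)
{
#if defined(__GNUC__) && defined(__arm__) && !defined(__thumb__)
    /* Pin the operands so the register list is ascending (r0 < r1). */
    register uint32_t a __asm__("r0");
    register uint32_t b __asm__("r1");
    __asm__("ldm %0!,{%1,%2}"
            : "+&r"(ptr), "=r"(a), "=r"(b)
            :
            : "memory"); /* assumption: tell gcc that memory is read */
    *out_a = a;
    *out_b = b;
#else
    /* Fallback with the same effect when not compiling for ARM state. */
    *out_a = ptr[0];
    *out_b = ptr[1];
    ptr += 2;
#endif
    return ptr;
}
```

This still hard-codes r0 and r1, so it has exactly the flexibility problem described above.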
Does gcc (4.8) support special constraints for ldm/stm? Or are there better ways to solve this (e.g. some __builtin function)?
EDIT:
Because there are recommendations to use "higher level" constructs: the problem I want to solve is packing 20 bits out of each 32-bit word (e.g. the input is 8 words, the output is 5 words). The pseudocode is:
asm("ldm %[in]!,{ %[a],%[b],%[c],%[d] }" ...)
asm("ldm %[in]!,{ %[e],%[f],%[g],%[h] }" ...) /* splitting the ldm generates better code;
otherwise gcc runs out of registers */
/* do some arithmetic on a - h */
asm volatile("stm %[out]!,{ %[a],%[b],%[c],%[d],%[e] }" ...)
Speed matters here, and ldm is 50% faster than ldr. The arithmetic is tricky, and because gcc generates much better code than I do ;) I would like to solve it with inline assembly that only gives some hints about optimized memory access.
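To make the packing itself concrete, here is a plain C sketch without any ldm/stm hints. It assumes that the low 20 bits of each input word are the ones to keep and that they are packed LSB-first into one contiguous bit stream; the real bit layout may differ, and the name pack20 is only illustrative:

```c
#include <stddef.h>
#include <stdint.h>

/* Pack the low 20 bits of in[0..7] into out[0..4] (8 * 20 = 5 * 32 bits).
 * Assumption: LSB-first, contiguous layout. */
static void pack20(const uint32_t in[8], uint32_t out[5])
{
    uint64_t acc = 0;   /* bit accumulator */
    unsigned bits = 0;  /* number of valid bits currently in acc */
    size_t o = 0;       /* next output index */

    for (size_t i = 0; i < 8; i++) {
        acc |= (uint64_t)(in[i] & 0xFFFFFu) << bits;
        bits += 20;
        if (bits >= 32) {           /* a full output word is ready */
            out[o++] = (uint32_t)acc;
            acc >>= 32;
            bits -= 32;
        }
    }
}
```

The loads of in[] and the stores to out[] are the places where I would like gcc to use ldm and stm.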