I am new to ARM assembly, and I want to implement one of my C functions in inline assembly. My function is a multi-precision multiplication that multiplies a 32-bit unsigned integer by a 256-bit unsigned integer and stores the result in a 288-bit unsigned integer. I defined my data types as:
typedef struct UN_256fe{
    uint32_t uint32[8];
}UN_256fe;
typedef struct UN_288bite{
    uint32_t uint32[9];
}UN_288bite;
and here is my function:
void multiply32x256(uint32_t A, UN_256fe* B, UN_288bite* res){
    uint32_t temp;
    asm ( "umull %0, %1, %9, %10;\n\t"
          "umull %18, %2, %9, %11;\n\t"
          "adds %1, %18, %1; \n\t"
          "umull %18, %3, %9, %12;\n\t"
          "adcs %2, %18, %2; \n\t"
          "umull %18, %4, %9, %13;\n\t"
          "adcs %3, %18, %3; \n\t"
          "umull %18, %5, %9, %14;\n\t"
          "adcs %4, %18, %4; \n\t"
          "umull %18, %6, %9, %15;\n\t"
          "adcs %5, %18, %5; \n\t"
          "umull %18, %7, %9, %16;\n\t"
          "adcs %6, %18, %6; \n\t"
          "umull %18, %8, %9, %17;\n\t"
          "adcs %7, %18, %7; \n\t"
          "adc %8, %8, 0 ; \n\t"
          : "=r"(res->uint32[8]), "=r"(res->uint32[7]), "=r"(res->uint32[6]), "=r"(res->uint32[5]), "=r"(res->uint32[4]),
            "=r"(res->uint32[3]), "=r"(res->uint32[2]), "=r"(res->uint32[1]), "=r"(res->uint32[0])
          : "r"(A), "r"(B->uint32[7]), "r"(B->uint32[6]), "r"(B->uint32[5]),
            "r"(B->uint32[4]), "r"(B->uint32[3]), "r"(B->uint32[2]), "r"(B->uint32[1]), "r"(B->uint32[0]), "r"(temp));
}
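For comparison, here is a minimal portable C sketch of the result I expect from the multiplication. It is not part of my real code: the name multiply32x256_ref is hypothetical, and it assumes that uint32[0] holds the most significant word (so B->uint32[7] and res->uint32[8] are the least significant ones), which is how the operands are ordered in the asm statement above.

```c
#include <stdint.h>

typedef struct UN_256fe   { uint32_t uint32[8]; } UN_256fe;
typedef struct UN_288bite { uint32_t uint32[9]; } UN_288bite;

/* Hypothetical reference version (not the inline-assembly function).
 * Assumption: uint32[0] is the MOST significant word of each number. */
static void multiply32x256_ref(uint32_t A, const UN_256fe *B, UN_288bite *res)
{
    uint32_t carry = 0;

    /* Walk from the least significant word (index 7) up to index 0. */
    for (int i = 7; i >= 0; i--) {
        /* 32x32 -> 64-bit product plus the incoming carry; this cannot
         * overflow 64 bits: (2^32-1)^2 + (2^32-1) < 2^64. */
        uint64_t p = (uint64_t)A * B->uint32[i] + carry;

        res->uint32[i + 1] = (uint32_t)p;          /* low 32 bits  */
        carry              = (uint32_t)(p >> 32);  /* high 32 bits */
    }

    /* The final carry is the most significant word of the 288-bit result. */
    res->uint32[0] = carry;
}
```

With A = 0x1 and B->uint32[7] = 0xffffff1 (all other words zero), this sketch leaves 0xffffff1 in res->uint32[8] and 0 in res->uint32[7].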
The inline assembly looks fine to me. But when I debug the code, I get values I do not expect. For example, right after the first instruction, "umull %0, %1, %9, %10;\n\t", has executed,
I have:
(gdb) p/x A //-->%9
$8 = 0x1
(gdb) p/x B->uint32[7] //-->%10
$9 = 0xffffff1
(gdb) p/x res->uint32[8] //-->%0
$10 = 0x1
(gdb) p/x res->uint32[7] //-->%1
$11 = 0x0
It seems that I have made a mistake somewhere in my assembly instructions. Can anyone explain what is wrong?