I'm working Power9 and utilizing the hardware random number generator instruction called DARN. I have the following inline assembly:
uint64_t val;
__asm__ __volatile__ (
"xor 3,3,3 \n" // r3 = 0
"addi 4,3,-1 \n" // r4 = -1, failure
"1: \n"
".byte 0xe6, 0x05, 0x61, 0x7c \n" // r3 = darn 3, 1
"cmpd 3,4 \n" // r3 == -1?
"beq 1b \n" // retry on failure
"mr %0,3 \n" // val = r3
: "=g" (val) : : "r3", "r4", "cc"
);
I had to add a mr %0,3
with "=g" (val)
because I could not get GCC to produce expected code with "=r3" (val)
. Also see Error: matching constraint not valid in output operand.
A disassembly shows:
(gdb) b darn.cpp : 36
(gdb) r v
...
Breakpoint 1, DARN::GenerateBlock (this=<optimized out>,
output=0x7fffffffd990 "\b", size=0x100) at darn.cpp:77
77 DARN64(output+i*8);
Missing separate debuginfos, use: debuginfo-install glibc-2.17-222.el7.ppc64le libgcc-4.8.5-28.el7_5.1.ppc64le libstdc++-4.8.5-28.el7_5.1.ppc64le
(gdb) disass
Dump of assembler code for function DARN::GenerateBlock(unsigned char*, unsigned long):
...
0x00000000102442b0 <+48>: addi r10,r8,-8
0x00000000102442b4 <+52>: rldicl r10,r10,61,3
0x00000000102442b8 <+56>: addi r10,r10,1
0x00000000102442bc <+60>: mtctr r10
=> 0x00000000102442c0 <+64>: xor r3,r3,r3
0x00000000102442c4 <+68>: addi r4,r3,-1
0x00000000102442c8 <+72>: darn r3,1
0x00000000102442cc <+76>: cmpd r3,r4
0x00000000102442d0 <+80>: beq 0x102442c8 <DARN::GenerateBlock(unsigned char*, unsigned long)+72>
0x00000000102442d4 <+84>: mr r10,r3
0x00000000102442d8 <+88>: stdu r10,8(r9)
Notice GCC faithfully reproduces the:
0x00000000102442d4 <+84>: mr r10,r3
0x00000000102442d8 <+88>: stdu r10,8(r9)
How do I get GCC to fold the two instructions into:
0x00000000102442d8 <+84>: stdu r3,8(r9)