I'm wondering whether there is any compiler (gcc
, xlc
, etc.) on Power8 that supports OpenMP SIMD constructs on Power8? I tried with XL (13.1) but I couldn't compile successfully. Probably it doesn't support simd construct yet.
I could compile with gcc 4.9.1 (with these flags -fopenmp -fopenmp-simd
and -O1
). I put differences between 2 asm files.
Can I say that gcc 4.9 is able to generate altivec code? In order to optimize more, what am I supposed to do? (I tried with -O3
, restrict treatment)
My code is very simple:
int *x, *y, *z;
x = (int*) malloc(n * sizeof(int));
y = (int*) malloc(n * sizeof(int));
z = (int*) malloc(n * sizeof(int));
#pragma omp simd
for(i = 0; i < N; ++i)
z[i] = a * x[i] + y[i];
And generated assembly is here
.L7:
lwz 9,124(31)
extsw 9,9
std 9,104(31)
lfd 0,104(31)
stfd 0,104(31)
ld 8,104(31)
sldi 9,8,2
ld 10,152(31)
add 9,10,9
lwz 10,124(31)
extsw 10,10
std 10,104(31)
lfd 0,104(31)
stfd 0,104(31)
ld 7,104(31)
sldi 10,7,2
ld 8,136(31)
add 10,8,10
lwz 10,0(10)
extsw 10,10
lwz 8,132(31)
mullw 10,8,10
extsw 8,10
lwz 10,124(31)
extsw 10,10
std 10,104(31)
lfd 0,104(31)
stfd 0,104(31)
ld 7,104(31)
sldi 10,7,2
ld 7,144(31)
add 10,7,10
lwz 10,0(10)
extsw 10,10
add 10,8,10
extsw 10,10
stw 10,0(9)
lwz 9,124(31)
addi 9,9,1
stw 9,124(31)
GCC with -O1 -fopenmp-simd
.L7:
lwz 9,108(31)
mtvsrwa 0,9
mfvsrd 8,0
sldi 9,8,2
ld 10,136(31)
add 9,10,9
lwz 10,108(31)
mtvsrwa 0,10
mfvsrd 7,0
sldi 10,7,2
ld 8,120(31)
add 10,8,10
lwz 10,0(10)
extsw 10,10
lwz 8,116(31)
mullw 10,8,10
extsw 8,10
lwz 10,108(31)
mtvsrwa 0,10
mfvsrd 7,0
sldi 10,7,2
ld 7,128(31)
add 10,7,10
lwz 10,0(10)
extsw 10,10
add 10,8,10
extsw 10,10
stw 10,0(9)
lwz 9,108(31)
addi 9,9,1
stw 9,108(31)
In order to clarify and understand details, I have one more application which is n^2 nbody application. This time my question is related with these compilers (gcc 4.9 and XL 13.1 ) and architectures (Intel and Power).
I put all the codes into gist https://gist.github.com/grypp/8b9f0f0f98af78f4223e#file-input-c ( full version of input code input.c )
- Power8 & XLC - It says "was not SIMD vectorized because it contains function calls. (there is sqrtf)". It's reasonable. But in the asm code I can see xsnmsubmdp is it normal? (the assembly: https://gist.github.com/grypp/8b9f0f0f98af78f4223e#file-power8-xlc-noinnersimd-asm)
- Power8 & gcc I tried to compile it in 2 ways (with omp simd construct and without). It changed my asm code, is it normal? (According to OpenMP, the code should not contain function call) (Assembilies: https://gist.github.com/grypp/8b9f0f0f98af78f4223e#file-power8-gcc-noinnersimd-asm & https://gist.github.com/grypp/8b9f0f0f98af78f4223e#file-power8-gcc-innersimd-asm)
- i74820K & gcc I did a same test with omp simd and without it. The output codes are different as well. Does FMA effect this code block ? (Assembilies: https://gist.github.com/grypp/8b9f0f0f98af78f4223e#file-i74820k-gcc-noinnersimd-asm & https://gist.github.com/grypp/8b9f0f0f98af78f4223e#file-i74820k-gcc-innersimd-asm)
Thanks in advance