This is my first question ;-)
I am trying to use AVX intrinsics in a CUDA application (ccminer), but nvcc reports an error:
/usr/local/cuda/bin/nvcc -Xcompiler "-Wall -mavx" -O3 -I . -Xptxas "-abi=no -v" -gencode=arch=compute_50,code=\"sm_50,compute_50\" --maxrregcount=80 --ptxas-options=-v -I./compat/jansson -o x11/x11.o -c x11/x11.cu
/usr/lib/gcc/x86_64-linux-gnu/4.8/include/avxintrin.h(118): error: identifier "__builtin_ia32_addpd256" is undefined
[...]
This is just the first error. Many other builtin functions are reported as 'undefined' as well :-(
Everything works for plain C/C++ sources (files with a .c or .cpp extension), but .cu files fail with this error :-( What am I doing wrong? I can compile ccminer, but I can only add AVX intrinsics to .c files, not to .cu files. I use the Intel intrinsics (from immintrin.h), not the gcc builtins directly.
Any help is greatly appreciated. Thanks in advance.
Environment: Linux Mint (based on Ubuntu 13), 64-bit, gcc 4.8.1, CUDA 6.5.
I do not expect AVX to work on the GPU. The .cu file contains a small portion of CPU-side code that I want to vectorize.
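To illustrate the kind of host-only code I mean, here is a minimal sketch. It is not the actual ccminer code: the function name add_arrays and its parameters are made up for this example. It uses _mm256_add_pd, which, as far as I can tell, is the intrinsic defined around the failing line in avxintrin.h. The AVX path is guarded by __AVX__ so the file also builds without -mavx.

```cpp
// Hypothetical host-only helper (illustrative name, not from ccminer).
#include <cstddef>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Element-wise sum of two arrays: out[i] = x[i] + y[i] for i in [0, n).
void add_arrays(const double *x, const double *y, double *out, std::size_t n)
{
    std::size_t i = 0;
#if defined(__AVX__)
    // AVX path: handle four doubles per iteration in 256-bit registers.
    // Unaligned loads/stores are used so callers need no special alignment.
    for (; i + 4 <= n; i += 4) {
        __m256d vx = _mm256_loadu_pd(x + i);
        __m256d vy = _mm256_loadu_pd(y + i);
        _mm256_storeu_pd(out + i, _mm256_add_pd(vx, vy));
    }
#endif
    // Scalar loop: finishes the remaining elements, or does all of the
    // work when the file is compiled without AVX support.
    for (; i < n; ++i)
        out[i] = x[i] + y[i];
}
```

In a .cpp file this builds for me with g++ -mavx; it is only when code like this lives in a .cu file that I get the errors above.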
Here is an example that reproduces the error. I took the simplest example from: http://computer-graphics.se/hello-world-for-cuda.html
I added this line at the beginning:
#include <immintrin.h>
and tried to compile it with the command:
nvcc cudahello.cu -Xcompiler -mavx
which gave this error:
/usr/lib/gcc/x86_64-linux-gnu/4.8/include/avxintrin.h(118): error: identifier "__builtin_ia32_addpd256" is undefined
The same code without #include <immintrin.h>
compiles without problems.
Here is the whole code:
#include <stdio.h>
#include <stdlib.h> // for EXIT_SUCCESS
#if defined(__AVX__)
#include <immintrin.h>
#endif
const int N = 16;
const int blocksize = 16;
__global__
void hello(char *a, int *b)
{
    a[threadIdx.x] += b[threadIdx.x];
}

int main()
{
    char a[N] = "Hello \0\0\0\0\0\0";
    int b[N] = {15, 10, 6, 0, -11, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
    char *ad;
    int *bd;
    const int csize = N*sizeof(char);
    const int isize = N*sizeof(int);

    printf("%s", a);

    cudaMalloc( (void**)&ad, csize );
    cudaMalloc( (void**)&bd, isize );
    cudaMemcpy( ad, a, csize, cudaMemcpyHostToDevice );
    cudaMemcpy( bd, b, isize, cudaMemcpyHostToDevice );

    dim3 dimBlock( blocksize, 1 );
    dim3 dimGrid( 1, 1 );
    hello<<<dimGrid, dimBlock>>>(ad, bd);

    cudaMemcpy( a, ad, csize, cudaMemcpyDeviceToHost );
    cudaFree( ad );
    cudaFree( bd );

    printf("%s\n", a);
    return EXIT_SUCCESS;
}
Compile with
nvcc cudahello.cu -Xcompiler -mavx
to get the error, or with
nvcc cudahello.cu
to compile cleanly.