
This is my first question ;-)

I am trying to use AVX in a CUDA application (ccminer), but nvcc reports an error:

/usr/local/cuda/bin/nvcc -Xcompiler "-Wall -mavx" -O3 -I . -Xptxas "-abi=no -v" -gencode=arch=compute_50,code=\"sm_50,compute_50\" --maxrregcount=80 --ptxas-options=-v -I./compat/jansson -o x11/x11.o -c x11/x11.cu
/usr/lib/gcc/x86_64-linux-gnu/4.8/include/avxintrin.h(118): error: identifier "__builtin_ia32_addpd256" is undefined

[...]

This is just the first error; many more builtin functions are reported as undefined :-(

Everything is fine for C/C++ sources with .c or .cpp extensions, but .cu files give this error :-( What am I doing wrong? I can compile ccminer, but I cannot add AVX intrinsics to .cu files, only to .c files. I use the Intel intrinsics, not the gcc builtins directly.

Any help greatly appreciated. Thanks in advance.

Linux Mint (Ubuntu 13) 64-bit, gcc 4.8.1, CUDA 6.5.

I do not expect AVX to work on the GPU. The .cu file contains a small portion of CPU-based code that I want to vectorize.

Here is an example that reproduces the error. I took the simplest example from http://computer-graphics.se/hello-world-for-cuda.html

I added one line at the beginning:

#include <immintrin.h>

and tried to compile with the command:

nvcc cudahello.cu -Xcompiler -mavx

and got this error:

/usr/lib/gcc/x86_64-linux-gnu/4.8/include/avxintrin.h(118): error: identifier "__builtin_ia32_addpd256" is undefined

The same code without #include <immintrin.h> compiles without problems.

Here is the whole code:

#include <stdio.h>
#include <stdlib.h>   // for EXIT_SUCCESS
#if defined(__AVX__)
#include <immintrin.h>
#endif

const int N = 16; 
const int blocksize = 16; 

__global__ 
void hello(char *a, int *b) 
{
    a[threadIdx.x] += b[threadIdx.x];
}

int main()
{
    char a[N] = "Hello \0\0\0\0\0\0";
    int b[N] = {15, 10, 6, 0, -11, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};

    char *ad;
    int *bd;
    const int csize = N*sizeof(char);
    const int isize = N*sizeof(int);

    printf("%s", a);

    cudaMalloc( (void**)&ad, csize ); 
    cudaMalloc( (void**)&bd, isize ); 
    cudaMemcpy( ad, a, csize, cudaMemcpyHostToDevice ); 
    cudaMemcpy( bd, b, isize, cudaMemcpyHostToDevice ); 

    dim3 dimBlock( blocksize, 1 );
    dim3 dimGrid( 1, 1 );
    hello<<<dimGrid, dimBlock>>>(ad, bd);
    cudaMemcpy( a, ad, csize, cudaMemcpyDeviceToHost ); 
    cudaFree( ad );
    cudaFree( bd );

    printf("%s\n", a);
    return EXIT_SUCCESS;
}

Compile with

nvcc cudahello.cu -Xcompiler -mavx

to get the error or with

nvcc cudahello.cu

to compile clean.

Marcin Badtke
  • How do you expect Intel AVX instructions to work on an nVidia GPU? – Paul R Oct 10 '14 at 14:20
  • I see you have now edited the question to clarify a little - you need to either move the CPU-specific code out of the .cu file into a .c/.cpp file, or conditionally compile it for CPU-only. – Paul R Oct 10 '14 at 14:51
  • @PaulR that is the usual advice. Do you want to provide an answer (I would upvote)? – Robert Crovella Oct 10 '14 at 15:13
  • Thanks. I am just a beginner in the CUDA/vectorisation world. So the general rule is to put only the GPU part of the code in .cu files and the CPU-based part in .c/.cpp files. Correct? – Marcin Badtke Oct 10 '14 at 15:22
  • @baton: yes, that's more or less correct. – Paul R Oct 10 '14 at 15:26
  • @RobertCrovella: my CUDA experience is a bit out of date, so if you'd like to supply an answer then please go ahead (and I'll up-vote of course!) - I'm sure it will be more comprehensive than me just converting my comment to an answer. – Paul R Oct 10 '14 at 15:29
  • I voted to close instead because the OP did not provide an [MCVE](http://stackoverflow.com/help/mcve), which SO expects for this type of question ("why is my code not working"). If the OP wants to provide a new question with an MCVE that I can compile and verify, then I'll provide an answer along those lines. It's possible that there is some other issue with the OP's code. Without seeing an example, I don't want to sweep it under the rug. – Robert Crovella Oct 11 '14 at 18:35
  • Added code to reproduce the error. – Marcin Badtke Oct 12 '14 at 06:48

2 Answers


I think I have an answer. Functions like:

__builtin_ia32_addpd256

are built into gcc, and nvcc does not know about them. Since they are referenced by the declarations in immintrin.h, nvcc reports errors when compiling a .cu file that includes immintrin.h. So we cannot mix CUDA features with gcc builtin functions in one file.
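
The workaround suggested in the comments above is to keep the intrinsics in a translation unit that nvcc never parses. Below is a minimal sketch of that split; the file names avx_sum.h / avx_sum.cpp / kernels.cu and the function avx_sum_doubles are made up for illustration, not taken from ccminer.

// avx_sum.h - plain declaration, safe to include from both .cpp and .cu files
void avx_sum_doubles(const double *a, const double *b, double *out, int n);

// avx_sum.cpp - compiled by g++ with -mavx; nvcc never sees this file
#include <immintrin.h>
#include "avx_sum.h"

void avx_sum_doubles(const double *a, const double *b, double *out, int n)
{
    int i = 0;
    for (; i + 4 <= n; i += 4) {                 // 4 doubles per 256-bit register
        __m256d va = _mm256_loadu_pd(a + i);
        __m256d vb = _mm256_loadu_pd(b + i);
        _mm256_storeu_pd(out + i, _mm256_add_pd(va, vb));
    }
    for (; i < n; ++i)                           // scalar tail
        out[i] = a[i] + b[i];
}

// kernels.cu - includes only the plain header, no intrinsics here
#include "avx_sum.h"
// ... __global__ kernels, cudaMemcpy calls, kernel launches ...
// avx_sum_doubles(host_a, host_b, host_out, n);  // CPU-side vectorized work

Build each part with its own compiler and link the objects together:

g++  -O3 -mavx -c avx_sum.cpp -o avx_sum.o
nvcc -O3 -c kernels.cu -o kernels.o
nvcc kernels.o avx_sum.o -o app

This way only g++ ever includes immintrin.h, so the gcc builtins behind the intrinsics are always seen by the compiler that defines them.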

Marcin Badtke
  • Yep. I just came to a similar conclusion. In my case `/usr/local/cuda-6.5/bin/nvcc -fmad=true -arch=sm_20 -ftz=true -prec-div=false -prec-sqrt=false -c -ccbin g++ -Xcompiler -mavx,-m64,-O3,-pipe,-D_CUDA,,,,"-I/usr/include/mpich2 -I/usr/include/mpich2" -std=c++11 --compiler-options -fno-strict-aliasing -O3 -use_fast_math -m64 -I/usr/local/cuda-6.5/include -I.. cuda/cuda.cu -o cuda.o` gives that same error (note the -mavx). The error is not shown in CUDA 7.0. – user9869932 Nov 03 '15 at 02:20

This issue was actually fixed with CUDA 8: with the nvcc version shipping with CUDA 8 I can compile code that contains AVX intrinsics (which I couldn't with older versions).
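
As a quick sanity check (assuming a CUDA 8 or newer toolkit; the file name avxcheck.cu is chosen just for this example), a small host-side AVX snippet can now sit directly in a .cu file:

#include <cstdio>
#include <immintrin.h>

int main()
{
    __m256d x = _mm256_set1_pd(1.5);   // broadcast 1.5 into all four lanes
    __m256d y = _mm256_set1_pd(2.5);
    __m256d s = _mm256_add_pd(x, y);   // the operation that used to trigger the error
    double out[4];
    _mm256_storeu_pd(out, s);
    printf("%f %f %f %f\n", out[0], out[1], out[2], out[3]);
    return 0;
}

Compile with

nvcc -Xcompiler -mavx avxcheck.cu -o avxcheck

which should print 4.000000 four times.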

szellmann