I have a CU file with a single kernel defined in it. The kernel calls a function, which in turn calls one of two others. All the functions combined come to only about 600 lines; however, some of those lines contain long mathematical expressions of about 1200 characters each. There are maybe 3 or 4 of these long expressions, 3 or 4 about half that size, and then a few comparatively short expressions. I am compiling this into a CUBIN file to be loaded at runtime in another program. The resulting CUBIN file is only around 800 kB.
Compiling the same code for the host (as plain C) with gcc completes in less than a second. NVCC ends up taking 20-30 minutes or more!
My command line looks something like this:
nvcc -cubin -arch=sm_20 -m64 -ccbin g++ foo.cu -o foo.cubin -Xptxas -O0 -Xcompiler -Wno-unused-variable
What could be causing this? Is there any way to make the compilation faster?