
I'm trying to configure CUDA 6.0 with Visual Studio 2010. I created a project using the CUDA 6.0 runtime. When compiling, do I need to change or add any options? Since my CUDA code runs slower than the serial version, is there any chance that the code is not being executed in parallel (say, the GPU acting as a slow CPU...), regardless of other possibilities, e.g. double precision, overhead, etc.?

Many thanks, XF

The Hiary
    The speed of a parallel code (and, in particular, how much faster it is compared to its sequential version) depends on both your programming skills and the parallelizability of the algorithm. So it can happen that a parallel code runs slower than its sequential counterpart. Most probably there is nothing wrong with your use of CUDA 6.0 in terms of compiler options. Finally, there is no possibility that the GPU becomes a slow CPU. – Vitality Jul 19 '14 at 18:53

1 Answer


It sounds like you took some serial code and compiled it, expecting it to run in parallel.

Assuming you actually do have parallel code, you should make sure you:

  1. Use the architecture your card supports. Under **Properties -> CUDA C/C++ -> Device -> Code Generation**, make sure you have the correct value. For my card that is `compute_35,sm_35`. If your card supports Maxwell you can use `compute_50,sm_50`.
  2. You can change the optimization level under **Properties -> CUDA C/C++ -> Optimization**.
  3. Make sure you are not compiling with device debugging on (the `-G` flag), which disables most optimizations.
  4. If all of these fail, run the Nsight analysis tools (or the Visual Profiler) on your application to see where you might have issues. Check that you don't have bank conflicts if you are using shared memory, reduce divergence, etc. The Visual Profiler is pretty good about telling you what is wrong.
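As a quick sanity check for step 1, you can query the card's compute capability at runtime so you know which `compute_XX,sm_XX` value to set. This is a minimal sketch using the standard `cudaGetDeviceProperties` call; device index 0 is assumed:

```cuda
// Compile with: nvcc devicequery.cu -o devicequery
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    // Query the properties of the first CUDA device.
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // e.g. major=3, minor=5 means you should build with compute_35,sm_35.
    printf("%s: compute capability %d.%d -> use compute_%d%d,sm_%d%d\n",
           prop.name, prop.major, prop.minor,
           prop.major, prop.minor, prop.major, prop.minor);
    return 0;
}
```

If the code generation value in your project is higher than what this reports, the kernels won't launch at all; if it is lower, you may be leaving performance on the table.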

You should also check out the GTC talk on optimizations [link to pdf] (by my old professor). It covers some basic optimizations you can perform to get your code up to speed.

The talks from the last few years of GTC can be found here [link]. They include updates to these optimizations, talks about different tools, and so forth.

deathly809
    Just FYI, OP is using [cusp](https://github.com/cusplibrary/cusplibrary), just hasn't bothered to explain it here. See [here](https://devtalk.nvidia.com/default/topic/762460/cuda-setup-and-installation/compiling-options-for-vs2010/) and [here](https://devtalk.nvidia.com/default/topic/762462/cuda-programming-and-performance/bad-performance-using-cusp-conjugate-gradient-/). – Robert Crovella Jul 19 '14 at 20:45
  • I saw the tag, but I tried to give whatever information I could given how little info he gave me. Maybe he will give some feedback? (doubtful). – deathly809 Jul 19 '14 at 20:51
  • It's a good answer, I upvoted. I just wanted to give you some background info. The other possibly useful piece of info is that OP is using a GTX760 which is cc3.0 (GK104 based) and therefore not particularly the best option for high double precision throughput. But the sparse matrix activity is often more bandwidth bound than compute bound, so DP vs. SP may not matter much. – Robert Crovella Jul 19 '14 at 20:55
  • Thanks. Usually people just downvote me and don't give me any feedback. It's a tough crowd. – deathly809 Jul 19 '14 at 20:57
  • Thank you guys! I've just tried CUSP CG with single precision and double precision. For a 1000*1000 Poisson problem, DP took some 7s while SP took 6s. Maybe I should pay more attention to data transfer, etc. – Oakleaf Jul 21 '14 at 21:44