
I'm doing my PhD research in A.I. and I've gotten to the part where I have to start using CUDA libraries for my testing platform. I've played with CUDA before, and I have a basic understanding of how GPGPU works, etc., but I am troubled by the floating-point precision.

Looking at the GTX 680 I see FP64 at 1/24 of FP32 throughput, whereas the Tesla has full FP64 at 1.31 TFLOPS. I understand very well that one is a gaming card while the other is a professional card.

The reason I am asking is simple: I cannot afford a Tesla, but I may be able to get two GTX 680s. While the main target is to have as many CUDA cores and as much memory as possible, float precision may become a problem.

My questions are:

  1. How much of a compromise is the lower float precision in gaming GPUs?
  2. Isn't 1/24 of 32-bit float precision too small, especially compared to the previous Fermi's 1/8 FP32?
  3. Is there a risk of wrong computation results due to the lower float precision? I.e., in SVMs, VSMs, matrix operations, Deep Belief Networks, etc., could I have issues with the results of the algorithms due to the smaller floating point, or does it simply mean that operations will take longer/use more memory?

Thanks!

Ælex
  • These opinion-soliciting questions are in general not a good fit for Stack Overflow. Before your question gets closed, let me state my opinion: if you can afford two GTX 680s, you can also afford a GTX Titan, where you get native FP64 speed (1/3 FP32, just as on Tesla). That saves you the pain of multi-GPU programming (unless that is what you want to learn). And it even comes close to the FP32 speed of two GTX 680s and has the other goodies of compute capability 3.5, like up to 255 registers per thread. – tera Apr 16 '13 at 01:54
  • @tera Thanks, that makes much more sense. I was looking at the 1/3 F32 of titan after I posted. And no, I don't want to get into multi-GPU programming, just importing cuda libraries. – Ælex Apr 16 '13 at 01:59

1 Answer


These are very subjective questions.

It's not entirely clear that you understand the difference between the C or C++ float and double datatypes. FP32 and FP64 refer to float and double in C or C++. The 1/8 and 1/24 numbers you refer to do not affect precision; they affect throughput. All of the GPUs you mention have some FP64 double-precision capability, so the differences come down not to capability but to performance.

It's very important for you to understand whether the codes you care about depend on double-precision floating point or not. It's not enough to say things like "matrix operations" to understand whether FP32 (float) or FP64 (double) matters.

If your codes depend on FP64 double, then those performance ratios (1/8, 1/24, etc.) will be relevant. But your codes should still run, perhaps more slowly.

You're also using some terms in a fashion that may lead to confusion. Tesla refers to NVIDIA's GPGPU family of compute products; it would be better to refer to a specific member of the Tesla family. Since you mention 1.31 TFLOPS FP64, you are presumably referring to the Tesla K20X. Note that the K20X also has a ratio between FP64 throughput and FP32 throughput (i.e., it can be even faster than 1.31 TFLOPS on FP32 codes).

If your algorithms depend on double, they will still run on any of the products you mention, and the accuracy of the results should be the same regardless of the product; however, the performance will be lower, depending on the product. If your algorithms depend on float, they will run faster than double on any given product, assuming floating-point throughput is the limiting factor.

You may also want to consider the GeForce GTX Titan. It has double-precision floating point performance that is roughly on par with Tesla K20/K20x.

Robert Crovella
  • Thank you, you just verified what I was starting to understand. The lower FP64 ratio in the GTX family affects the rate at which double precision is processed, correct? Also, yes, I am using sparse matrices of doubles, and that is the primary reason I am concerned about float precision. From both your answer and the comment above, it seems like the GTX Titan may be the best compromise between the two. – Ælex Apr 16 '13 at 02:16
  • 1
    Yes, for most members of the GeForce family, the double-precision throughput is significantly lower than various members of the Tesla family. GTX Titan is the exception. Since the principal target of GeForce is consumer graphics and gaming, which do not depend on FP64 at all, the lower FP64 throughput there does not matter. K10 on the Tesla side is also an exception in the other direction, as it has relatively low FP64 throughput. – Robert Crovella Apr 16 '13 at 02:21
  • 1
    Depending on the nature of the sparse matrix processing, the code may becomes bound by memory throughput before it become bound by DP throughput, even with the lower DP throughput of a gaming GPU. It depends on the ratio of FLOPS / bytes. – njuffa Apr 16 '13 at 04:56
  • @njuffa Are you referring to the device memory? – Ælex Apr 16 '13 at 13:37
  • 1
    Yes, the memory on the graphics card. I should have probably be clearer and said that sparse matrix code may be limited by the throughput of global memory, rather than the throughput of the floating-point units in the GPU. – njuffa Apr 17 '13 at 09:31
  • Exactly what do those throughput/performance ratios (1/8, 1/24) measure? – wip Nov 20 '13 at 03:37
  • The ratio of double-precision compute hardware to single-precision compute hardware. – Robert Crovella Nov 20 '13 at 07:03