
I have a C++ program which uses the NTL and GMP mathematical libraries for multiplication and modular arithmetic with arbitrary-length integers and polynomials. I currently compile it with the terminal command below, in line with the suggestion in the NTL documentation:

g++ -g -O2 -std=c++11 -pthread -march=native directory/filename.cpp -o directory/filename.out -lntl -lgmp -lm

Having modified the program (successfully, I think) to run on a GPU, I would now like to compile it against the same libraries as the C++ version. My CUDA program includes the same headers in its preamble as the C++ program.

I am very new to CUDA, so my question is really whether I can (or would need to) use the same flags as above with the standard nvcc compilation command below, or, more generally, how I could go about compiling my CUDA program so that it links against the NTL and GMP libraries.

nvcc directory/filename.cu -o directory/filename.out
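
For what it's worth, my naive guess (untested; the exact flags here are just my assumption that nvcc forwards linker options like -lntl to the host toolchain and that host-only options such as -march=native and -pthread have to go through -Xcompiler) would be something like:

nvcc -g -O2 -std=c++11 -Xcompiler -march=native -Xcompiler -pthread directory/filename.cu -o directory/filename.out -lntl -lgmp -lm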

If it's not possible, are there alternative CUDA libraries that would be the closest equivalent to either of those, or that would suit applications using arbitrary-length integer and polynomial arithmetic?

Thank you kindly in advance, any help is much appreciated.

  • It is generally not possible to use a library written for CPUs inside a GPU kernel, at least not efficiently, since a GPU works very differently from a CPU (from the execution model down to the actual instructions and binary code). AFAIK, GMP does not support CUDA. In fact, this is the kind of operation that should not be very efficient on a GPU, as most arbitrary-precision arithmetic algorithms are sequential. It may be faster than a CPU if you are dealing with a LOT of (relatively small) numbers. – Jérôme Richard Aug 24 '21 at 19:32
  • It's not clear from the question whether the request is to link against GMP for host code usage or for device code usage. – Robert Crovella Aug 24 '21 at 22:02
  • @JérômeRichard I have an embarrassingly parallel algorithm which I am trying to execute in parallel, not the arithmetic operations themselves, which, as you say, I can run quite effectively on a CPU. The problem is that I have lots of these arithmetic operations to do, which I believe suits the throughput of a GPU. I basically want to do multiple separate polynomial/integer multiplications concurrently, e.g. for two separate multiplications, u*v and x*y, each is done on a different thread, so two separate threads or cores for this example (a minimal sketch of this pattern is included after these comments). Does that make sense? – render3D Aug 25 '21 at 10:23
  • @RobertCrovella Apologies, I'd like to use it with NTL on the device. The multiplication operations can be done sequentially on a thread, I'd just like to try to execute multiple multiplications concurrently/in parallel. Hope that's more clear now. – render3D Aug 25 '21 at 10:27
  • The thing is, GPUs are only good when the computation can (mainly) fit the [SIMT model](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#hardware-implementation) (as opposed to CPUs). Being embarrassingly parallel is not enough (and not really required either). Here, the computation is not SIMT-friendly at all, as the numbers may have different sizes and the chunks will probably be stored non-contiguously (not to mention possible load-balancing issues with the multiplication). So it may be possible to do that, but I do not think it can be much faster than on your CPU. – Jérôme Richard Aug 25 '21 at 17:59
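
For concreteness, here is the minimal sketch of the thread-per-multiplication pattern referred to in the comments above. It uses plain unsigned long long values purely as a stand-in, since GMP/NTL types cannot be used in device code, and all kernel and variable names here are made up for illustration; it is not the asker's actual code.

```cuda
// Sketch: one thread per independent multiplication, with 64-bit
// integers standing in for the arbitrary-precision types.
#include <cstdio>

__global__ void multiplyPairs(const unsigned long long *a,
                              const unsigned long long *b,
                              unsigned long long *product,
                              int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        product[i] = a[i] * b[i];   // each thread performs one multiplication
}

int main()
{
    const int n = 2;                              // e.g. u*v and x*y
    unsigned long long hA[n] = {3, 5};
    unsigned long long hB[n] = {7, 11};
    unsigned long long hP[n];

    unsigned long long *dA, *dB, *dP;
    cudaMalloc((void **)&dA, n * sizeof(*dA));
    cudaMalloc((void **)&dB, n * sizeof(*dB));
    cudaMalloc((void **)&dP, n * sizeof(*dP));
    cudaMemcpy(dA, hA, n * sizeof(*dA), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, n * sizeof(*dB), cudaMemcpyHostToDevice);

    multiplyPairs<<<1, n>>>(dA, dB, dP, n);       // two threads, two products
    cudaMemcpy(hP, dP, n * sizeof(*hP), cudaMemcpyDeviceToHost);

    for (int i = 0; i < n; ++i)
        printf("%llu * %llu = %llu\n", hA[i], hB[i], hP[i]);

    cudaFree(dA); cudaFree(dB); cudaFree(dP);
    return 0;
}
```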

0 Answers