I am trying to use CuPy to accelerate Python functions that currently rely mostly on NumPy. I have installed CuPy on a Jetson AGX Xavier with CUDA 10.0.
The CuPy functions seem to work, but they are a lot slower than their NumPy counterparts. For example, I ran the first example from here with devastating results:
import numpy as np
import cupy as cp
import time
### Numpy and CPU
s = time.time()
x_cpu = np.ones((1000,1000,1000))
e = time.time()
print(e - s) # output: 0.9008722305297852
### CuPy and GPU
s = time.time()
x_gpu = cp.ones((1000,1000,1000))
cp.cuda.Stream.null.synchronize()
e = time.time()
print(e - s) # output: 4.973184823989868
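For what it's worth, one thing the snippet above does not separate out is one-time CUDA setup (context creation, kernel compilation) that lands in the first CuPy call. A variant that warms up first and times a second call could look like the sketch below; the helper name `timed_ones` and the NumPy fallback are my own additions, not part of the original benchmark:

```python
import time

# Fall back to NumPy when CuPy is unavailable, so the same code runs anywhere
try:
    import cupy as xp
except ImportError:
    import numpy as xp

def timed_ones(shape):
    """Time xp.ones(shape), synchronizing the GPU stream when running under CuPy."""
    s = time.time()
    x = xp.ones(shape)
    if xp.__name__ == "cupy":
        # GPU kernels launch asynchronously; wait for completion before timing
        xp.cuda.Stream.null.synchronize()
    return x, time.time() - s

# Warm-up call: absorbs one-time CUDA context/kernel setup under CuPy
timed_ones((100, 100))

# Second call measures the steady-state cost
x, t = timed_ones((100, 100))
print(t)
```

If the second call is still slower than NumPy, the gap is presumably not just startup cost.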
I also ran other functions (e.g. np.nonzero vs. cp.nonzero), but they gave similar or worse results. How is this possible?
I want to do image processing (ca. 2500x2000 greyscale/mono images) for a lane detection algorithm, and I cannot really use the CUDA functions from OpenCV for this, since the only part of my code that is implemented in their library is cv2.cuda.warpPerspective()
(and it would likely not make much sense to upload/download the image to the GPU for this step alone). Where do I go from here? Use Numba? (Probably not a good fit, since the compute-intensive parts of my algorithm mostly consist of NumPy function calls.) Implement the whole thing in C++? (I doubt my C++ code would be faster than the optimized NumPy functions.)
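To make the upload/download concern concrete: the structure I would want is one host-to-device copy per frame, all processing on the device, and one copy back. A minimal sketch of that shape, where `lane_pixels`, the threshold value, and the thresholding step itself are placeholders I made up for the real algorithm:

```python
import numpy as np

# Fall back to NumPy if CuPy is missing so the sketch stays runnable
try:
    import cupy as cp
except ImportError:
    cp = np
# np has no asnumpy(); under the fallback, np.asarray is a no-op equivalent
asnumpy = getattr(cp, "asnumpy", np.asarray)

def lane_pixels(image_host, threshold=128):
    """Upload once, keep every step on the device, download once.

    The threshold/nonzero steps are stand-ins for the actual lane
    detection pipeline.
    """
    img = cp.asarray(image_host)      # single host -> device copy
    mask = img > threshold            # elementwise op stays on the GPU
    ys, xs = cp.nonzero(mask)         # as does the nonzero scan
    return asnumpy(ys), asnumpy(xs)   # single device -> host copy

frame = np.zeros((2000, 2500), dtype=np.uint8)
frame[1000, 1250] = 255
ys, xs = lane_pixels(frame)
```

The question remains whether the intermediate CuPy operations themselves are fast enough on this hardware to make that structure worthwhile.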
Sidenote: CuPy was installed with pip3 install cupy
, because the recommended pip3 install cupy-cuda100
failed with this output:
ERROR: Could not find a version that satisfies the requirement cupy-cuda100
ERROR: No matching distribution found for cupy-cuda100