CPU: i7-9750 @ 2.6 GHz (with 16 GB DDR4 RAM); GPU: Nvidia GeForce GTX 1600 TI (6 GB); OS: Windows 10 64-bit
I wanted to see how fast the GPU is at basic matrix operations compared with the CPU, so I essentially followed this article: https://towardsdatascience.com/heres-how-to-use-cupy-to-make-numpy-700x-faster-4b920dda1f56. Here is my very simple code:
import numpy as np
import cupy as cp
import time

### NumPy and CPU
s = time.time()
A = np.random.random([10000, 10000])
B = np.random.random([10000, 10000])
CPU = np.matmul(A, B)
CPU *= 5
e = time.time()
print(f'CPU time: {e - s: .2f}')

### CuPy and GPU
s = time.time()
C = cp.random.random([10000, 10000])
D = cp.random.random([10000, 10000])
GPU = cp.matmul(C, D)
GPU *= 5
# wait for the GPU to finish executing before stopping the timer
cp.cuda.Stream.null.synchronize()
e = time.time()
print(f'GPU time: {e - s: .2f}')
Surprisingly, it prints CPU time: 11.74 and GPU time: 12.56 (in seconds).
This really confuses me. How can the GPU be even slower than the CPU on large matrix operations? Note that I have not explicitly enabled any parallel computing (I am a beginner and am not sure whether the system turns it on for me). I did check similar questions, such as "Why is my CPU doing matrix operations faster than GPU instead?", but there the library was mxnet, whereas I am using cupy (cupy is newer and designed for GPU computing).
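To narrow this down, here is a sketch of a variant that times only the multiplication. My code above also times the random number generation and whatever one-time setup the first GPU call does, and it uses float64, which I understand consumer GeForce cards may handle much more slowly than float32. I have not confirmed that any of these explains my numbers. The helper name time_matmul, the float32 default, and the smaller n = 2000 are my own assumptions for illustration.

```python
import time
import numpy as np


def time_matmul(xp, n, dtype=np.float32, sync=None, repeats=3):
    """Return the best wall-clock time (seconds) of one n x n matmul.

    xp   : array module to use (numpy, or cupy if it is installed)
    sync : optional callable that blocks until queued device work is done
    """
    # Allocate the inputs outside the timed region.
    a = xp.random.random((n, n)).astype(dtype)
    b = xp.random.random((n, n)).astype(dtype)

    # Warm-up run: the first call may include one-time setup costs.
    xp.matmul(a, b)
    if sync is not None:
        sync()

    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        c = xp.matmul(a, b)
        c *= 5
        if sync is not None:
            sync()  # wait for the GPU before stopping the clock
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    n = 2000  # assumption: smaller than 10000 so the sketch runs quickly
    print(f"CPU time: {time_matmul(np, n):.4f}")
    try:
        import cupy as cp
    except ImportError:
        cp = None  # CuPy not installed; skip the GPU measurement
    if cp is not None:
        gpu = time_matmul(cp, n, sync=cp.cuda.Stream.null.synchronize)
        print(f"GPU time: {gpu:.4f}")
```

Passing dtype=np.float64 and n = 10000 should reproduce the matrix sizes from my original test, with the allocation and warm-up excluded from the timing.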
Can someone help? I would really appreciate it!