I'm doing matrix multiplication on a GTX1080 GPU using JCuda, version 0.8.0RC with CUDA 8.0. I load two matrices A and B into the device in row-major vector form, and read the product matrix from the device. But I'm finding that I run out of device memory earlier than I would expect. For example, if matrix A is dimensioned 100000 * 5000 = 500 million entries = 2GB worth of float values, then:
cuMemAlloc(MatrixA, 100000 * 5000 * Sizeof.FLOAT);
works fine. But if I increase the number or rows to 110000 from 100000, I get the following error on this call (which is made before the memory allocations for matrices B and C, so those are not part of the problem):
Exception in thread "main" jcuda.CudaException: CUDA_ERROR_OUT_OF_MEMORY
at jcuda.driver.JCudaDriver.checkResult(JCudaDriver.java:344)
at jcuda.driver.JCudaDriver.cuMemAlloc(JCudaDriver.java:3714)
at JCudaMatrixMultiply.main(JCudaMatrixMultiply.java:84) (my code)
The issue is that allocating a matrix of this size on the device should take only about 2.2GB, and the GTX1080 has 8GB of memory, so I don't see why I'm running out of memory. Does anyone have any thoughts on this? It's true that I'm using the JCuda 0.8.0RC with the release version of CUDA 8, but I tried downloading the RC version of CUDA 8 (8.0.27) to use with JCuda 0.8.0RC and had some problems getting it to work. If versions compatibility is likely to be the issue, however, I can try again.
Matrices of 100000 * 5000 are pretty big, of course, and I won't need to work with larger matrices for a while on my neural network project, but I would like to be confident that I can use all 8GB of memory on this new card. Thanks for any help.