
I'm trying to find out how much free memory I have on the device. To do this I call the CUDA function cuMemGetInfo from Fortran code, but it returns negative values for the amount of free memory, so something is clearly wrong. Does anyone know how I can do that? Thanks

EDIT:

Sorry, my question was not very clear. I'm using OpenACC in Fortran and I call the CUDA function cuMemGetInfo through a small C++ wrapper. I finally managed to fix the code: the problem was indeed the kind of the variables I was using. Switching to size_t (c_size_t on the Fortran side) fixed everything. This is the Fortran interface that I'm using:

interface
subroutine get_dev_mem(total,free) bind(C,name="get_dev_mem")
    use iso_c_binding
        integer(kind=c_size_t)::total,free
end subroutine get_dev_mem
end interface

and this is the CUDA code:

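#include <cuda.h>
#include <cuda_runtime.h>

extern "C" {
// Called from Fortran, which passes both arguments by reference.
void get_dev_mem(size_t& total, size_t& free)
{
    // cuMemGetInfo takes (free, total), in that order; its return code is ignored here.
    cuMemGetInfo(&free, &total);
}
}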
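To illustrate why the kind of the variables matters, here is a minimal C++ sketch. It assumes the original Fortran variables were default 32-bit integers, which the question does not actually state, and the helper names `as_int32` and `as_int64` are hypothetical. It shows what happens when a byte count above 2^31 is read through a 32-bit signed integer.

```cpp
#include <cstdint>

// Hypothetical helper: view the low 32 bits of a byte count as a signed
// 32-bit integer, which is what a default-kind Fortran integer typically is.
// Out-of-range values wrap modulo 2^32 (guaranteed since C++20, and the
// behaviour of mainstream compilers before that).
std::int32_t as_int32(std::uint64_t bytes) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(bytes));
}

// The same count viewed through a signed 64-bit integer, which is what
// integer(kind=c_size_t) provides on a typical 64-bit platform.
std::int64_t as_int64(std::uint64_t bytes) {
    return static_cast<std::int64_t>(bytes);
}
```

For a count of 3200000000 bytes, `as_int32` gives -1094967296 while `as_int64` keeps the value intact, which is consistent with the negative numbers going away once the variables were declared with `c_size_t`.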

One last question: I copied an array to the GPU and measured its size with cuMemGetInfo, then computed its size by counting the number of bytes, but the two answers differ. Why? The measured size is 3052 MB, the computed one 3051 MB. Could this 1 MB difference be the size of the array descriptor? Here is the code that I used:

integer, parameter:: long = selected_int_kind(12)
integer(kind=c_size_t) :: total, free1,free2
real(8), dimension(:),allocatable::a
integer(kind=long)::N, eight, four

allocate(a(four*N))

!some OpenACC stuff in order to init the gpu
call get_dev_mem(total,free1)

!$acc data copy(a)

call get_dev_mem(total,free2) 
print *,"size a in the gpu = ",(free1-free2)/1024/1024, " mb"
print *,"size a in theory  = ", (eight*four*N)/1024/1024, " mb"

!$acc end data
deallocate(a)
Vadim Kotov
rosilho
  • 1
    can you show a *simple* code that reproduces the error? Are you doing cuda error checking on the return code from the cuMemGetInfo call? – Robert Crovella Dec 19 '13 at 18:38
  • 1
    You mention fortran but your question is also tagged openacc. Are you using OpenACC (Fortran) or are you using CUDA Fortran ? – Robert Crovella Dec 19 '13 at 18:48
  • 2
    As @RobertCrovella has asked, could we see some code? I see `cuMemGetInfo` expects `size_t`, are you giving it them (using `iso_c_binding` and `C_SIZE_T`)? Or it could be giving you `unsigned int`s back that you are interpreting as signed integers. – Timothy Brown Dec 19 '13 at 19:58
  • Why are people so quick to downvote this question? Sure, it needs improvement, but cut the guy some slack, he/she is new on SO. – einpoklum Dec 20 '13 at 06:44
  • 1
    The reason that the allocation does not match your size calculation is that there is overhead. The overhead is in the form of allocation overhead (since allocations are not usually done in units of bytes, but in larger granular sizes such as kilobytes or higher), as well as general housekeeping. The GPU memory is used by the CUDA driver to store general housekeeping information, just as windows or linux OS use some of system memory for their housekeeping purposes. – Robert Crovella Dec 20 '13 at 23:35
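The granularity effect described in the last comment can be sketched with a little integer arithmetic. Everything specific here is an assumption made for illustration: the question does not give N, so the value below was picked only because it reproduces the 3051 MB figure, and the 1 MiB granularity is hypothetical rather than a documented CUDA constant.

```cpp
#include <cstdint>

constexpr std::uint64_t MiB = 1024ULL * 1024ULL;

// ASSUMPTION: illustrative value only; N is not stated in the question.
constexpr std::uint64_t kAssumedN = 100000000ULL;
// ASSUMPTION: hypothetical allocation granularity, not a documented figure.
constexpr std::uint64_t kAssumedGranularity = MiB;

// Bytes requested for a(four*N) of real(8): 8 * 4 * N.
constexpr std::uint64_t requested_bytes() { return 8ULL * 4ULL * kAssumedN; }

// Round `bytes` up to the next multiple of `granularity` (must be non-zero).
constexpr std::uint64_t round_up(std::uint64_t bytes, std::uint64_t granularity) {
    return (bytes + granularity - 1) / granularity * granularity;
}

// Whole MiB with truncation, matching the integer division in the Fortran prints.
constexpr std::uint64_t whole_mib(std::uint64_t bytes) { return bytes / MiB; }
```

Under these assumptions `whole_mib(requested_bytes())` is 3051, while `whole_mib(round_up(requested_bytes(), kAssumedGranularity))` is 3052. Truncating the theoretical size and rounding the real allocation up is enough to produce a 1 MB gap, before counting any driver housekeeping.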

1 Answer

Right, so, like commenters have suggested, we're not sure exactly what you're running, but filling in the missing details by guessing, here's a shot:

Most CUDA API calls return a status code (or error code if you will); this is true both in C/C++ and in Fortran, as we can see in the Portland Group's CUDA Fortran Manual:

Most of the runtime API routines are integer functions that return an error code; they return a value of zero if the call was successful, and a nonzero value if there was an error. To interpret the error codes, refer to “Error Handling,” on page 48.

This is the case for cudaMemGetInfo() specifically:

integer function cudaMemGetInfo( free, total )
    integer(kind=cuda_count_kind) :: free, total

The two integers for free and total are cuda_count_kind, which if I am not mistaken are effectively unsigned... anyway, I would guess that what you're getting is an error code. Have a look at the Error Handling section on page 48 of the manual.

einpoklum