Allocation of large pinned memory chunk using CUDA with Java

Question

I'm adding GPU computation to a program that is already written in Java, using the JCuda bindings. I need fast host-to-device transfers of arrays that are sometimes relatively large. To use streams, I have to use pinned memory. The problem is that when I try to allocate more than about 600 MB of pinned host memory, I get a "CUDA_ERROR_OUT_OF_MEMORY" exception. This is the code I used to test how much pinned memory is available:

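    import static jcuda.driver.JCudaDriver.*;

    import java.nio.ByteOrder;
    import java.nio.FloatBuffer;
    import java.util.Arrays;

    import jcuda.Pointer;
    import jcuda.Sizeof;
    import jcuda.driver.CUcontext;
    import jcuda.driver.CUdevice;
    import jcuda.driver.JCudaDriver;

    // The class name is illustrative; the original post showed only main().
    public class PinnedMemoryTest {
        public static void main(String[] args) {
            // Init GPU: make JCuda throw exceptions instead of returning error codes
            JCudaDriver.setExceptionsEnabled(true);

            // Initialize the device and create device context
            cuInit(0);
            CUdevice device = new CUdevice();
            cuDeviceGet(device, 0);
            CUcontext context = new CUcontext();
            cuCtxCreate(context, 0, device);

            Pointer p = new Pointer();

            int Kb = 1024;
            int Mb = 1024 * Kb;
            int sequenceSize = 172 * Mb; // number of floats; times 4 for bytes
            long byteSize = (long) sequenceSize * Sizeof.FLOAT;
            float[] expecteds = new float[sequenceSize];
            float[] actuals = new float[sequenceSize];
            Arrays.fill(expecteds, 3.33f);
            boolean allocated = false;
            try {
                // Allocate page-locked (pinned) host memory
                cuMemAllocHost(p, byteSize);
                allocated = true;
                FloatBuffer fb = p.getByteBuffer(0, byteSize)
                        .order(ByteOrder.nativeOrder())
                        .asFloatBuffer();

                // Write the data into pinned memory and read it back
                fb.position(0);
                fb.put(expecteds);
                fb.position(0);
                fb.get(actuals);
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                // Free only if the allocation actually succeeded
                if (allocated) {
                    cuMemFreeHost(p);
                }
            }
        }
    }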

Now, I'm aware that the OS can prevent me from using too much pinned memory, since it is non-pageable. However, I have 48 GB of physical memory (45 GB free), and I need a way to force the OS to give me more of it. Is there a way to do this (elegantly, if possible)?

EDIT: OS is 64-bit Windows 7 Professional SP1

Are you sure you are using main memory and not memory on the device? — Peter Lawrey, Sep 12 '12 at 11:04
Well, the [cuMemAllocHost()](http://developer.download.nvidia.com/compute/cuda/4_2/rel/toolkit/docs/online/group__CUDA__MEM_gdd8311286d2c2691605362c689bc64e0.html) function is used to allocate host memory, and in the example I gave I don't touch device memory. As for the OS, I'm currently on 64-bit Windows 7 Professional SP1. — djole_djole, Sep 12 '12 at 12:02
I recommend you allocate the memory by other means (say, a thin wrapper around VirtualAlloc()), then use cuMemHostRegister()/cuMemHostUnregister() to make it available to CUDA. — ArchaeaSoftware, Sep 13 '12 at 00:40
There's a known bug in NVIDIA drivers that makes it fail if it can't use memory addresses below somewhere around 2 GB, so try to limit Java's heap to maybe 1 GB with something like `java -Xmx1G`. — Samuel Audet, Sep 18 '12 at 08:02
@ArchaeaSoftware I've tried VirtualAlloc() and cudaHostRegister() in C++, but I still can't register more than 686 MB. Code: `void * p = VirtualAlloc( NULL, size, MEM_COMMIT|MEM_RESERVE, PAGE_READWRITE );` `cudaHostRegister(p, size, 0);` — djole_djole, Sep 18 '12 at 10:50

score 1 · Answer 1 · answered Sep 13 '12 at 19:51

Check that you are running Java in 64-bit mode. The FAQ suggests the default is 32-bit, even with the 64-bit downloads. The linked FAQ also tells you how to run in 64-bit mode; you'll need to use the 64-bit DLLs etc. too.
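One way to check this from inside the program is to print the JVM's reported data model. This is only a minimal sketch: the class name `JvmBitness` and the helper `is64Bit` are made up for illustration, and `sun.arch.data.model` is a HotSpot-specific property that other JVMs may not define, which is why the code falls back to `os.arch`.

```java
public class JvmBitness {
    /**
     * Returns true when the given property values indicate a 64-bit JVM.
     * dataModel is the value of "sun.arch.data.model" (may be null),
     * osArch is the value of "os.arch".
     */
    public static boolean is64Bit(String dataModel, String osArch) {
        if ("64".equals(dataModel)) {
            return true;
        }
        if ("32".equals(dataModel)) {
            return false;
        }
        // Fallback heuristic when the data model property is missing:
        // names such as "amd64" or "x86_64" contain "64".
        return osArch != null && osArch.contains("64");
    }

    public static void main(String[] args) {
        String dataModel = System.getProperty("sun.arch.data.model");
        String osArch = System.getProperty("os.arch");
        System.out.println("sun.arch.data.model = " + dataModel);
        System.out.println("os.arch             = " + osArch);
        System.out.println("64-bit JVM: " + is64Bit(dataModel, osArch));
    }
}
```

If this reports a 32-bit JVM, the process address space itself would limit how much memory can be pinned, regardless of how much physical RAM is installed.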

@ArchaeaSoftware's suggestion of using cuMemHostRegister()/cuMemHostUnregister() to pin smaller sections of the memory is a sensible alternative.
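As a rough illustration of pinning smaller sections, the sketch below only computes the byte ranges that would be registered one at a time. The class name `PinnedChunks`, the `Chunk` type, and the 64 MiB chunk size are hypothetical choices, not part of JCuda. The actual cuMemHostRegister()/cuMemHostUnregister() calls are left as a comment because they need a live CUDA context and a suitably aligned host buffer.

```java
import java.util.ArrayList;
import java.util.List;

public class PinnedChunks {
    /** One contiguous byte range [offset, offset + length) of the host buffer. */
    public static final class Chunk {
        public final long offset;
        public final long length;

        Chunk(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Splits totalBytes into consecutive ranges of at most chunkBytes each. */
    public static List<Chunk> split(long totalBytes, long chunkBytes) {
        if (totalBytes < 0 || chunkBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        List<Chunk> chunks = new ArrayList<>();
        for (long off = 0; off < totalBytes; off += chunkBytes) {
            chunks.add(new Chunk(off, Math.min(chunkBytes, totalBytes - off)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        long total = 172L * 1024 * 1024 * 4; // 688 MiB, the size from the question
        long chunk = 64L * 1024 * 1024;      // hypothetical chunk size
        for (Chunk c : split(total, chunk)) {
            // A real program would pin, transfer, and unpin this range here,
            // e.g. cuMemHostRegister -> async copy -> cuMemHostUnregister.
            System.out.println("offset=" + c.offset + " length=" + c.length);
        }
    }
}
```

For the 688 MiB buffer from the question and 64 MiB chunks, this yields 11 ranges, the last one 48 MiB long. Whether registering and unregistering per chunk is fast enough depends on the driver, so it would need measuring.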

score 0 · Answer 2 · answered Feb 22 '13 at 10:05

This seems to be an old question without an answer. I suspect you are not making full use of your RAM, because by default Java does not allocate much memory for the heap. You can set the JVM's minimum and maximum heap size with -Xms and -Xmx respectively, and since you are on a 64-bit architecture, add "-d64" after "-Xms" or "-Xmx".
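To see which heap limits the JVM actually picked up from these flags, something like the following can be run with the same options. It is a minimal sketch; the class name `HeapInfo` and the helper `toMiB` are made up for illustration.

```java
public class HeapInfo {
    /** Converts a byte count to whole mebibytes (rounding down). */
    public static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reflects the upper heap limit (set with -Xmx)
        System.out.println("max heap:     " + toMiB(rt.maxMemory()) + " MiB");
        // totalMemory() is the heap currently reserved by the JVM
        System.out.println("current heap: " + toMiB(rt.totalMemory()) + " MiB");
        // freeMemory() is the unused part of the currently reserved heap
        System.out.println("free in heap: " + toMiB(rt.freeMemory()) + " MiB");
    }
}
```

Note that these numbers describe the Java heap only. Memory returned by cuMemAllocHost() is native memory allocated by the CUDA driver outside that heap.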
