
I am converting a program from a regular C program to a CUDA program and want to implement a simple wrapper for malloc that allocates from one large pool.

I have 5000 threads. My block size is 1024.

Here is the buffer structure I am using to keep track of each thread's memory pool.

typedef struct buffer_t
{
    unsigned long size;
    char* current_index;
    char pool[];
} buffer_t;

I allocate the whole pool with:

cudaMalloc(&memptr, 262144*5000);

so that each thread is supposed to create a buffer in its own 262144 bytes.

Here are the functions I am using to do the allocations:

__device__ buffer_t* buffer_constructor(size_t size, void* memptr)
{
    buffer_t* buffer = (buffer_t*)memptr;
    buffer->size = size - sizeof(unsigned long) - sizeof(char*);
    buffer->current_index = buffer->pool;
    return buffer;
}
__device__ void* buffer_malloc(buffer_t* buffer, size_t size)
{
    if(size > buffer->size - (buffer->current_index - buffer->pool))
    {
        return NULL;
    }

    void* ptr = buffer->current_index;
    buffer->current_index += size;
    return ptr;
}

Each thread calls:

buffer_t* buffer = buffer_constructor(size, memptr+(tid * size));
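As a side note on the indexing: 5000 threads with a block size of 1024 need 5 blocks, which launches 5120 threads, so 120 of them have no slice in the pool. The sketch below shows one way to compute each thread's slice with a bounds guard. It assumes tid is the global thread index and that the kernel does not already guard against surplus threads; `blocks_needed`, `thread_slice` and the `HD` macro are hypothetical names, not part of the original code.

```cpp
#include <cassert>
#include <cstddef>

// Let the same source build with nvcc or with a plain host C++ compiler.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Number of blocks needed to cover num_threads (rounds up).
HD inline unsigned blocks_needed(unsigned num_threads, unsigned block_size)
{
    return (num_threads + block_size - 1) / block_size;
}

// Start of the slice owned by thread tid, or nullptr for the surplus
// threads in the last block (tid >= num_threads), which own no slice.
HD inline void* thread_slice(void* memptr, size_t tid,
                             size_t num_threads, size_t slice_size)
{
    if (tid >= num_threads)
        return nullptr;
    // Byte arithmetic is done on char*, since void* has no element size.
    return static_cast<char*>(memptr) + tid * slice_size;
}
```

In a kernel this could be called as `thread_slice(memptr, tid, 5000, 262144)`, returning early when the result is null.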

So when I run the code it just returns from the kernel at some point. When I run the debugger I get this error:

Program received signal CUDA_EXCEPTION_6, Warp Misaligned Address.
[Switching focus to CUDA kernel 0, grid 1, block (2,0,0), thread (768,0,0), device 0, sm 10, warp 24, lane 0]
0x0000000000b48428 in device_matrix_list_constructor (buffer=<optimized out>, num=<optimized out>)
    at device_matrix_list.cu:8
8               return list;

When I run memcheck I get a couple of these errors for a couple blocks:

Invalid __global__ write of size 8
=========     at 0x00000258 in /home/crafton.b/cuda_nn/device_matrix_list.cu:7:device_matrix_list_constructor(buffer_t*, unsigned int)
=========     by thread (897,0,0) in block (4,0,0)
=========     Address 0x235202a0fc is misaligned
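The reported address can be checked by hand: it ends in 0xfc, which is a multiple of 4 but not of 8, so an 8-byte write there is misaligned. The host-side sketch below just encodes that arithmetic; `is_aligned` is a hypothetical helper. One plausible explanation, not confirmed by the code shown, is an earlier buffer_malloc call whose size was a multiple of 4 but not of 8, since buffer_malloc advances current_index by exactly size.

```cpp
#include <cassert>
#include <cstdint>

// True when addr is a multiple of align (align must be a power of two).
constexpr bool is_aligned(std::uint64_t addr, std::uint64_t align)
{
    return (addr & (align - 1)) == 0;
}

// Address taken from the memcheck report above.
constexpr std::uint64_t reported = 0x235202a0fcULL;

// An 8-byte write needs an address that is a multiple of 8.
static_assert(!is_aligned(reported, 8), "not 8-byte aligned");
// The address is still a multiple of 4.
static_assert(is_aligned(reported, 4), "4-byte aligned");
```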

Any help is really appreciated. I have been struggling with this for a while now.

Brian Crafton
    The error is pretty explicit. CUDA requires that memory pointers are aligned to the word size of the type which any given thread will access. You haven't shown the actual kernel code, so it is impossible to say *how* you got this wrong, but you got it wrong. – talonmies Apr 04 '16 at 17:18
    You may wish to read [this section](http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#device-memory-accesses) of the programming guide, in particular note the requirement for naturally aligned access. – Robert Crovella Apr 04 '16 at 17:25
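To illustrate the natural-alignment requirement the comments point to, here is a minimal sketch of a bump allocator that rounds every request up to a fixed alignment, so each returned pointer is suitable for any type up to that alignment. It is not the original code: `pool_header`, `pool_init`, `pool_malloc`, `round_up` and `POOL_ALIGN` are hypothetical names, the 16-byte alignment is an assumed choice, and the header uses an offset instead of the flexible array member so it also builds as standard C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Let the same source build with nvcc or with a plain host C++ compiler.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Assumed alignment for every allocation (must be a power of two).
#define POOL_ALIGN 16

// Header stored at the start of each thread's slice. alignas pads it to
// POOL_ALIGN bytes so the pool that follows it starts aligned.
struct alignas(POOL_ALIGN) pool_header
{
    size_t size;  // usable bytes after the header
    size_t used;  // bytes handed out so far, always a multiple of POOL_ALIGN
};

// Round n up to the next multiple of align (align is a power of two).
HD inline size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

// Set up a pool in the slice [mem, mem + bytes). The slice itself must
// start on a POOL_ALIGN boundary.
HD inline pool_header* pool_init(void* mem, size_t bytes)
{
    assert(reinterpret_cast<std::uintptr_t>(mem) % POOL_ALIGN == 0);
    assert(bytes >= sizeof(pool_header));
    pool_header* h = static_cast<pool_header*>(mem);
    h->size = bytes - sizeof(pool_header);
    h->used = 0;
    return h;
}

// Hand out the next aligned chunk, or nullptr when the pool is exhausted.
HD inline void* pool_malloc(pool_header* h, size_t bytes)
{
    size_t avail = h->size - h->used;
    if (bytes > avail)
        return nullptr;
    size_t need = round_up(bytes, POOL_ALIGN);  // padded request size
    if (need > avail)
        return nullptr;
    void* p = reinterpret_cast<char*>(h) + sizeof(pool_header) + h->used;
    h->used += need;
    return p;
}
```

The trade-off is wasted padding: a 4-byte request consumes 16 bytes of the pool. A smaller alignment such as 8 would waste less if no type in the program needs more.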
