I'm facing a simple problem: all my calls to cudaMalloc fail with an out-of-memory error, even if it's just a single byte I'm allocating.
The CUDA device is available and there is also plenty of free memory (both checked with the corresponding calls).
Any idea what the problem could be?
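
For reference, a minimal sketch of the kind of check described above. This is not the original code; it assumes the CUDA runtime API, device index 0, and that "the corresponding calls" are cudaGetDeviceCount and cudaMemGetInfo. The `check` helper is a hypothetical name introduced only for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: print the CUDA error string and return false
// if a runtime call did not succeed.
static bool check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main() {
    // Is any CUDA device visible to the runtime?
    int deviceCount = 0;
    if (!check(cudaGetDeviceCount(&deviceCount), "cudaGetDeviceCount")) return 1;
    std::printf("devices: %d\n", deviceCount);
    if (deviceCount == 0) return 1;

    // Assumption: device 0 is the one being used.
    if (!check(cudaSetDevice(0), "cudaSetDevice")) return 1;

    // How much device memory does the runtime report as free?
    size_t freeBytes = 0, totalBytes = 0;
    if (!check(cudaMemGetInfo(&freeBytes, &totalBytes), "cudaMemGetInfo")) return 1;
    std::printf("free: %zu bytes, total: %zu bytes\n", freeBytes, totalBytes);

    // The failing case from the question: a one-byte allocation.
    void* ptr = nullptr;
    if (!check(cudaMalloc(&ptr, 1), "cudaMalloc")) return 1;
    check(cudaFree(ptr), "cudaFree");
    return 0;
}
```

Built with something like `nvcc repro.cu -o repro` (the file name is arbitrary), this prints the error string of the first call that fails, which shows whether the failure really originates in cudaMalloc or in an earlier call.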