
I get this error every time I try to run the application, even though it compiles fine:

pool allocator: Specified pool size too big for this device

Current file: /home/marco/Desktop/tools.c function: PTC3D line: 330

This file was compiled: -ta=tesla:cc35,cc50,cc60,cc70,cc70,cc75,cc80

The strange thing is that I have only been getting this error since I restarted the PC; before that I never got it.

I compile with:

CC = nvc

CFLAGS = -c -acc -ta=tesla:managed:cuda11.0 -Minfo=accel -w -O3 -DTEST_CASE=3

LDFLAGS = -lm -acc -ta=tesla:managed:cuda11.0

Nothing has changed in the code, so maybe it is related to the compiler. I installed a new program today and may have touched something I shouldn't have.

Steve

2 Answers

2

The message should just be a warning. The pool allocator will be bypassed and the CUDA Unified Memory API routines will be called directly for each allocation instead. You might see some performance degradation if you have a lot of small allocations, since the API calls have a relatively high overhead, but it shouldn't hurt functionality.

The default CUDA Unified Memory pool size is 1GB, though this is modifiable by setting the environment variable NVCOMPILER_ACC_POOL_SIZE. You might try setting the size to something smaller to see if it fixes the messages. Full details can be found at: https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-user-guide/index.html#acc-mem-unified
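As a minimal sketch of trying a smaller pool for a single run from a POSIX shell: the helper name `run_with_pool_size` and the binary name `./app` are hypothetical, and 100MB is only an example value.

```shell
# Run one command with a specific pool size, without changing the
# environment of the current shell session.
run_with_pool_size() {
    size="$1"
    shift
    env NVCOMPILER_ACC_POOL_SIZE="$size" "$@"
}

# Hypothetical usage: ./app stands in for the OpenACC binary.
# run_with_pool_size 100MB ./app
```

Because the variable is passed through `env`, it applies only to that command, which makes it easy to compare several sizes without editing the shell profile.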

Exactly why the message started appearing is unclear, but it's most likely hardware related, or possibly a CUDA driver issue. What device and CUDA driver are you using? Has anything changed with the hardware?

Mat Colgrove
  • Thank you, I'll try what you suggest. I'm using CUDA 11.0. I can't say whether something has changed; the only thing I did was install mpich, and I don't see why CUDA would be affected. – Steve Oct 02 '20 at 19:32
  • Sorry for the naive question. How can I set the environment variable? – Steve Oct 02 '20 at 19:43
  • Depends on your shell. In bash: "export NVCOMPILER_ACC_POOL_SIZE=100MB", in csh: "setenv NVCOMPILER_ACC_POOL_SIZE 100MB" – Mat Colgrove Oct 02 '20 at 20:21
  • Even with 100 mb the error is the same, and also with NVCOMPILER_ACC_POOL_SIZE=0 – Steve Oct 02 '20 at 21:10
  • Odd. Do you still see the same behavior if you disable the pool allocator altogether? i.e. "export NVCOMPILER_ACC_POOL_ALLOC=0" – Mat Colgrove Oct 02 '20 at 23:49
  • The message "pool allocator: Specified pool size too big for this device" no longer appears, but it still doesn't run properly and prints: "Current file: /home/marco/Desktop/tools.c function: PTC3D line: 330 This file was compiled: -ta=tesla:cc35,cc50,cc60,cc70,cc70,cc75,cc80" – Steve Oct 03 '20 at 07:00
  • I think I found the problem: typing nvcaccelinfo I get "No accelerators found", but I can't find how to solve this. – Steve Oct 03 '20 at 08:11
  • What does the 'nvidia-smi' command say? Are you able to run a simple CUDA C program? Assuming your CUDA driver is installed properly, the "No accelerators found" message means that the runtime can't find the driver runtime library "libcuda.so". You might try finding the location of libcuda.so (it is typically put in /usr/lib64 or /usr/local/lib64) and setting your LD_LIBRARY_PATH to include this directory. – Mat Colgrove Oct 05 '20 at 15:06
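
The library search suggested in the last comment could be sketched as the shell snippet below. The function name `find_libcuda` is hypothetical, and `/usr/lib/x86_64-linux-gnu` is an added assumption about where Debian/Ubuntu systems commonly place the driver library; the other two directories are the ones named in the comment.

```shell
# Search a list of directories for the CUDA driver library (libcuda.so*).
# Prints the first directory that contains it; returns 1 if none does.
find_libcuda() {
    for dir in "$@"; do
        for lib in "$dir"/libcuda.so*; do
            # If the glob matched nothing, $lib is the literal pattern
            # and the existence test fails.
            if [ -e "$lib" ]; then
                printf '%s\n' "$dir"
                return 0
            fi
        done
    done
    return 1
}

# Candidate directories; the last one is an assumption (Debian/Ubuntu layout).
if libdir=$(find_libcuda /usr/lib64 /usr/local/lib64 /usr/lib/x86_64-linux-gnu); then
    echo "libcuda.so found in $libdir"
    # Prepend the directory, keeping any existing LD_LIBRARY_PATH entries.
    export LD_LIBRARY_PATH="$libdir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
else
    echo "libcuda.so not found; the NVIDIA driver may not be installed"
fi
```

The export only affects the current session if the snippet is sourced (for example with `. ./script.sh`) rather than executed. If no directory contains the library, the driver itself is probably missing, and changing LD_LIBRARY_PATH will not help.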
0

I solved it by going to Software & Updates, opening Additional Drivers, and selecting the recommended driver: NVIDIA driver metapackage.

Steve