I'm compiling some OpenMP 4.5 code with the IBM XL C/C++ compiler with the intention of offloading some of its work to a GPU, like so:

xlc++ mycode.cpp -qsmp=omp -qreport -qoffload -std=c++11 -Wall

Compilation seems to be successful, giving me only the following messages:

mycode.cpp:
"mycode.cpp", line 284: 1586-358 (I) Loop was parallelized.
"mycode.cpp", line 293: 1586-358 (I) Loop was parallelized.
"mycode.cpp", line 309: 1586-358 (I) Loop was parallelized.
"mycode.cpp", line 324: 1586-358 (I) Loop was parallelized.
"mycode.cpp", line 126: 1586-674 (I) Remark: Simd or nested parallel directive requires OpenMP runtime
"" 1586-671 (I) GPU OpenMP Runtime is required for offloaded kernel '__xl__Z9MyCodeiii_l123_h44039046689_OL_1'

However, when I run the code, I get the following unpleasant message:

1587-169 No valid target devices available.

Using nvidia-smi, I have verified that target devices are, in fact, available:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.59                 Driver Version: 384.59                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-SXM2...  Off  | 00000002:01:00.0 Off |                    0 |
| N/A   33C    P0    29W / 300W |     10MiB / 16276MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-SXM2...  Off  | 00000003:01:00.0 Off |                    0 |
| N/A   29C    P0    30W / 300W |     10MiB / 16276MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P100-SXM2...  Off  | 00000006:01:00.0 Off |                    0 |
| N/A   31C    P0    28W / 300W |     10MiB / 16276MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P100-SXM2...  Off  | 00000007:01:00.0 Off |                    0 |
| N/A   27C    P0    29W / 300W |     10MiB / 16276MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+

My thought is that XL is somehow targeting the wrong accelerator, but I can't find an option to set this.

How can I get my code to recognize and utilize the available GPUs?

Richard

1 Answer

`-qtgtarch` specifies the GPU architectures on which the offloaded code may run. Try `-qtgtarch=auto` to have the compiler detect the architecture of device 0 on the system where the compiler itself is running. Alternatively, set the architecture manually, for example `-qtgtarch=sm_60`.

More information is available in the IBM Knowledge Center.
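To check whether a rebuilt binary really sees a GPU and offloads to it, a small diagnostic along the following lines may help. This is only a sketch: the function names `visible_device_count`, `target_ran_on_host` and `report_offload_status` and the macro `HAVE_OMP_TARGET` are made up for illustration, and the only OpenMP routines used are the standard `omp_get_num_devices()` and `omp_is_initial_device()`.

```cpp
#include <cstdio>

// HAVE_OMP_TARGET is a hypothetical helper macro: it is 1 only when the
// compiler enables OpenMP 4.0 or later, so the file still builds without OpenMP.
#if defined(_OPENMP) && _OPENMP >= 201307
#include <omp.h>
#define HAVE_OMP_TARGET 1
#else
#define HAVE_OMP_TARGET 0
#endif

// Number of non-host devices the OpenMP runtime reports
// (0 when the file is built without OpenMP support).
int visible_device_count() {
#if HAVE_OMP_TARGET
    return omp_get_num_devices();
#else
    return 0;
#endif
}

// Returns true if a target region executed on the host rather than on a device.
bool target_ran_on_host() {
    int on_host = 1;  // stays 1 if no target region is compiled at all
#if HAVE_OMP_TARGET
    // The flag is written inside the region and copied back to the host.
    #pragma omp target map(from: on_host)
    {
        on_host = omp_is_initial_device();
    }
#endif
    return on_host != 0;
}

// Prints both results; intended to be called once from main().
void report_offload_status() {
    std::printf("devices visible to OpenMP: %d\n", visible_device_count());
    std::printf("target region ran on: %s\n",
                target_ran_on_host() ? "host" : "device");
}
```

Call `report_offload_status()` from `main` and build the file with the same options as in the question plus `-qtgtarch=sm_60`. If the device count is zero, or the target region reports that it ran on the host, the runtime is probably not picking up the GPUs.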

Nicole Trudeau
  • Thanks! That seems to have worked: the `sm_60` was what I needed. I now get the unhelpful error message `1587-163 Error encountered while attempting to execute on the target device 0. The program will stop.` I don't suppose you have thoughts about how to figure out why that's coming up? – Richard Jan 22 '18 at 22:06
  • @Richard Hi Richard. This message could be issued for a variety of reasons, and without seeing some source code it will be hard to say. Would you be able to provide some reduced code? – Nicole Trudeau Jan 24 '18 at 17:09