I was able to build all the examples of the ArrayFire project (except the CUDA ones, since I have an AMD APU). However, only the ones running on the CPU work correctly; the GPU-based ones have issues.

Example:

benchmarks> ls
CMakeFiles  blas_cpu  cg_cpu  cmake_install.cmake  fft_cpu  fft_opencl  Makefile  pi_cpu

This is the CPU version:

benchmarks> ./fft_cpu 
ArrayFire v3.8.0 (CPU, 64-bit Linux, build d99887a)
[0] AMD: AMD Ryzen 7 2700U with Radeon Vega Mobile Gfx  Benchmark N-by-N 2D fft
 128 x  128:     3 Gflops
 256 x  256:     4 Gflops
 512 x  512:     4 Gflops
1024 x 1024:     4 Gflops
2048 x 2048:     5 Gflops
4096 x 4096:     5 Gflops

This is a run with the GPU version (running in verbose mode):

benchmarks> AF_PRINT_ERRORS=1 AF_JIT_KERNEL_TRACE=stdout AF_TRACE=all ./fft_opencl  
[platform][1626887645][022941] [ ../src/backend/common/DependencyModule.cpp:99 ] Attempting to load: libforge.so
[platform][1626887645][022941] [ ../src/backend/common/DependencyModule.cpp:102 ] Found: libforge.so
[platform][1626887645][022941] [ ../src/backend/opencl/device_manager.cpp:218 ] Found 1 OpenCL platforms
[platform][1626887645][022941] [ ../src/backend/opencl/device_manager.cpp:230 ] Found 1 devices on platform Clover
[platform][1626887645][022941] [ ../src/backend/opencl/device_manager.cpp:235 ] Found device AMD Radeon(TM) Vega 3 Graphics (RAVEN, DRM 3.40.0, 5.12.11-zen1-1-zen, LLVM 12.0.0) on platform Clover
[platform][1626887645][022941] [ ../src/backend/opencl/device_manager.cpp:240 ] Found 1 OpenCL devices
[platform][1626887646][022941] [ ../src/backend/opencl/device_manager.cpp:335 ] Default device: 0
ArrayFire v3.8.0 (OpenCL, 64-bit Linux, build d99887a)
[0] Clover: AMD Radeon(TM) Vega 3 Graphics (RAVEN, DRM 3.40.0, 5.12.11-zen1-1-zen, LLVM 12.0.0), 3072 MB
Benchmark N-by-N 2D fft
128 x  128: [mem][1626887646][022941] [ ../src/backend/opencl/memory.cpp:200 ] nativeAlloc: 64 KB 0x56337cfc7e50
[jit][1626887646][022941] [ ../src/backend/opencl/compile_module.cpp:254 ] {9348653917523335434  : loaded from /home/pietrom/.arrayfire/KER9348653917523335434_CL_4098_AMD_RADEON(TM)_VEGA_3_GRAPHICS_(RAVEN,_DRM_3.40.0,_5.12.11-ZEN1-1-ZEN,_LLVM_12.0.0)_AF_38.bin for AMD Radeon(TM) Vega 3 Graphics (RAVEN, DRM 3.40.0, 5.12.11-zen1-1-zen, LLVM 12.0.0) }

With the GPU version, execution hangs after the last message printed above.
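
Since the GPU run never returns, it can help to bound each attempt with a timeout and keep whatever output was printed before the hang. The following is only a minimal sketch, assuming Python 3.7+ is available; the helper name `run_with_timeout` and the 60-second limit are hypothetical, and a timeout will not protect against the runs where the whole system crashes.

```python
import os
import subprocess
import sys


def run_with_timeout(cmd, seconds, extra_env=None):
    """Run cmd and return (status, output).

    status is "finished" if the process exited on its own,
    or "timeout" if it was killed after `seconds` seconds.
    """
    env = dict(os.environ)
    env.update(extra_env or {})
    try:
        proc = subprocess.run(
            cmd,
            env=env,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=seconds,
            text=True,
        )
        return "finished", proc.stdout
    except subprocess.TimeoutExpired as exc:
        # Partial output may come back as bytes or None, depending on
        # the Python version, so normalize it to a string.
        out = exc.stdout or ""
        if isinstance(out, bytes):
            out = out.decode(errors="replace")
        return "timeout", out


# Hypothetical usage with the same environment variables as the run above:
# status, log = run_with_timeout(
#     ["./fft_opencl"], 60,
#     {"AF_PRINT_ERRORS": "1", "AF_JIT_KERNEL_TRACE": "stdout", "AF_TRACE": "all"},
# )
```

A "timeout" status together with a log that ends at the `[jit]` line would match the hang described here.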

This is another run with the GPU version:

benchmarks> AF_PRINT_ERRORS=1 AF_JIT_KERNEL_TRACE=stdout AF_TRACE=all ./fft_opencl
[platform][1627145464][006841] [ ../src/backend/common/DependencyModule.cpp:99 ] Attempting to load: libforge.so
[platform][1627145464][006841] [ ../src/backend/common/DependencyModule.cpp:102 ] Found: libforge.so
[platform][1627145464][006841] [ ../src/backend/opencl/device_manager.cpp:218 ] Found 1 OpenCL platforms
[platform][1627145464][006841] [ ../src/backend/opencl/device_manager.cpp:230 ] Found 1 devices on platform Clover
[platform][1627145464][006841] [ ../src/backend/opencl/device_manager.cpp:235 ] Found device AMD Radeon(TM) Vega 3 Graphics (RAVEN, DRM 3.40.0, 5.12.11-zen1-1-zen, LLVM 12.0.0) on platform Clover
[platform][1627145464][006841] [ ../src/backend/opencl/device_manager.cpp:240 ] Found 1 OpenCL devices
Invalid MIT-MAGIC-COOKIE-1 keyERROR: GLFW wasn't able to initalize
[platform][1627145464][006841] [ ../src/backend/opencl/device_manager.cpp:335 ] Default device: 0
ArrayFire v3.8.0 (OpenCL, 64-bit Linux, build d99887a)
[0] Clover: AMD Radeon(TM) Vega 3 Graphics (RAVEN, DRM 3.40.0, 5.12.11-zen1-1-zen, LLVM 12.0.0), 3072 MB
Benchmark N-by-N 2D fft
 128 x  128: [mem][1627145464][006841] [ ../src/backend/opencl/memory.cpp:200 ] nativeAlloc: 64 KB 0x56039668df20
[jit][1627145464][006841] [ ../src/backend/opencl/compile_module.cpp:254 ] {9348653917523335434  : loaded from /home/pietrom/.arrayfire/KER9348653917523335434_CL_4098_AMD_RADEON(TM)_VEGA_3_GRAPHICS_(RAVEN,_DRM_3.40.0,_5.12.11-ZEN1-1-ZEN,_LLVM_12.0.0)_AF_38.bin for AMD Radeon(TM) Vega 3 Graphics (RAVEN, DRM 3.40.0, 5.12.11-zen1-1-zen, LLVM 12.0.0) }

In this last run, the following error message is also printed: "Invalid MIT-MAGIC-COOKIE-1 keyERROR: GLFW wasn't able to initalize".

In some runs, the whole system crashes, sometimes showing graphical artifacts before the screen goes black.

Is this a common issue? Could I have done something wrong? Could something be missing on my system?


Here is the stack trace, obtained by interrupting the hung process with Ctrl-C in gdb:

arrayfire_tests_benchmarks> gdb -q ./fft_opencl 
Reading symbols from ./fft_opencl...
(No debugging symbols found in ./fft_opencl)
(gdb) run
Starting program: /home/pietrom/myProgs/test/arrayfire_tests_benchmarks/fft_opencl 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[New Thread 0x7fffe0e43640 (LWP 5316)]
[New Thread 0x7fffdbfff640 (LWP 5317)]
[New Thread 0x7fffdb7fe640 (LWP 5318)]
[New Thread 0x7fffdaffd640 (LWP 5319)]
[New Thread 0x7fffda7fc640 (LWP 5320)]
[New Thread 0x7fffd9ffb640 (LWP 5321)]
[New Thread 0x7fffd97fa640 (LWP 5322)]
[New Thread 0x7fffd8ff9640 (LWP 5323)]
[New Thread 0x7fffc3fff640 (LWP 5324)]
[New Thread 0x7fffc37fe640 (LWP 5325)]
[New Thread 0x7fffc2ffd640 (LWP 5326)]
[New Thread 0x7fffc27fc640 (LWP 5327)]
[New Thread 0x7fffc1ffb640 (LWP 5328)]
[New Thread 0x7fffc17fa640 (LWP 5329)]
[New Thread 0x7fffc0ff9640 (LWP 5330)]
[New Thread 0x7fff9ffff640 (LWP 5331)]
[New Thread 0x7fff9f7fe640 (LWP 5332)]
[New Thread 0x7fff9effd640 (LWP 5333)]
[New Thread 0x7fff9e7fc640 (LWP 5334)]
[New Thread 0x7fff9dffb640 (LWP 5335)]
[New Thread 0x7fff9d7fa640 (LWP 5336)]
[New Thread 0x7fff9cff9640 (LWP 5337)]
[New Thread 0x7fff83fff640 (LWP 5338)]
[New Thread 0x7fff837fe640 (LWP 5339)]
[New Thread 0x7fff82ffd640 (LWP 5340)]
[New Thread 0x7fff827fc640 (LWP 5341)]
[New Thread 0x7fff81ffb640 (LWP 5342)]
[New Thread 0x7fff817fa640 (LWP 5343)]
[New Thread 0x7fff80ff9640 (LWP 5344)]
[New Thread 0x7fff5ffff640 (LWP 5345)]
[New Thread 0x7fff5f7fe640 (LWP 5346)]
[New Thread 0x7fff5effd640 (LWP 5347)]
[New Thread 0x7fff5e7fc640 (LWP 5348)]
[New Thread 0x7fff5dffb640 (LWP 5349)]
[New Thread 0x7fff5d7fa640 (LWP 5350)]
[Thread 0x7fff5f7fe640 (LWP 5346) exited]
[Thread 0x7fff80ff9640 (LWP 5344) exited]
[Thread 0x7fff817fa640 (LWP 5343) exited]
[Thread 0x7fff81ffb640 (LWP 5342) exited]
[Thread 0x7fff827fc640 (LWP 5341) exited]
[Thread 0x7fff82ffd640 (LWP 5340) exited]
[Thread 0x7fff5ffff640 (LWP 5345) exited]
[Thread 0x7fff837fe640 (LWP 5339) exited]
[Thread 0x7fff9d7fa640 (LWP 5336) exited]
[Thread 0x7fff9dffb640 (LWP 5335) exited]
[Thread 0x7fff9e7fc640 (LWP 5334) exited]
[Thread 0x7fff9cff9640 (LWP 5337) exited]
[Thread 0x7fff9effd640 (LWP 5333) exited]
[Thread 0x7fff9f7fe640 (LWP 5332) exited]
[Thread 0x7fff9ffff640 (LWP 5331) exited]
[Thread 0x7fff83fff640 (LWP 5338) exited]
[Thread 0x7fff5d7fa640 (LWP 5350) exited]
[Thread 0x7fff5dffb640 (LWP 5349) exited]
[Thread 0x7fff5e7fc640 (LWP 5348) exited]
[Thread 0x7fff5effd640 (LWP 5347) exited]
Invalid MIT-MAGIC-COOKIE-1 keyERROR: GLFW wasn't able to initalize
ArrayFire v3.8.0 (OpenCL, 64-bit Linux, build d99887a)
[0] Clover: AMD Radeon(TM) Vega 3 Graphics (RAVEN, DRM 3.40.0, 5.12.11-zen1-1-zen, LLVM 12.0.0), 3072 MB
Benchmark N-by-N 2D fft
^C--Type <RET> for more, q to quit, c to continue without paging--

Thread 1 "fft_opencl" received signal SIGINT, Interrupt.
0x00007ffff519ae6b in ioctl () from /usr/lib/libc.so.6
(gdb) backtrace
#0  0x00007ffff519ae6b in ioctl () from /usr/lib/libc.so.6
#1  0x00007fffea1edb69 in drmIoctl () from /usr/lib/libdrm.so.2
#2  0x00007fffe1561348 in amdgpu_cs_query_fence_status () from /usr/lib/libdrm_amdgpu.so.1
#3  0x00007fffe181836e in ?? () from /usr/lib/gallium-pipe/pipe_radeonsi.so
#4  0x00007fffe17f0fc5 in ?? () from /usr/lib/gallium-pipe/pipe_radeonsi.so
#5  0x00007fffed6c10d9 in ?? () from /usr/lib/libMesaOpenCL.so.1
#6  0x00007fffed6a54e3 in ?? () from /usr/lib/libMesaOpenCL.so.1
#7  0x00007ffff68d506e in cl::CommandQueue::finish (this=<optimized out>) at include/CL/../CL/cl2.hpp:8117
#8  opencl::sync (device=0) at ../src/backend/opencl/platform.cpp:465
#9  0x00007ffff70aadaf in af_sync (device=device@entry=-1) at ../src/api/c/device.cpp:207
#10 0x00007ffff73c2552 in af::sync (device=device@entry=-1) at ../src/api/cpp/device.cpp:101
#11 0x00007ffff74007c8 in af::timeit (fn=0x555555555259 <fn()>) at ../src/api/cpp/timing.cpp:83
#12 0x00005555555553c3 in main ()
(gdb) 
Pietro
  • What am I missing here? I don't think the whole error message was copied here. The message following the `128x128` text seems to be from the JIT, and it is about some memory allocation made as part of calling the fft routine. Can you please show the whole error message? – pradeep Jul 22 '21 at 03:03
  • @pradeep - I just double checked, the message is complete. After the [jit] message there is nothing else because the process likely crashes/makes no progress. – Pietro Jul 22 '21 at 15:44
  • Can you please run the program with gdb and collect the stack trace and share it here in the description or as a file link. Hopefully that will reveal some clue. – pradeep Jul 23 '21 at 15:00
  • @pradeep - Here it is. Thank you for your help. – Pietro Jul 24 '21 at 17:21
  • The code is stuck in the `clFinish` call (`cl::CommandQueue::finish` in the stack trace). That could mean the kernel encountered an error, but in such cases the system usually doesn't crash; an error is just reported, which we pass back to the ArrayFire user. At the moment, I think it could be a driver issue, but I can't confirm that without testing this myself on the same device you have. Did you happen to run any tests in your build folder using the command `ctest -R opencl`? I am curious whether any tests pass at all. My APU is very old, about 7 years, so I don't think it would tell us much if I can't reproduce it on that one. – pradeep Jul 26 '21 at 03:57
  • `ctest -R opencl --> Test project /home/.../arrayfire_tests_benchmarks No tests were found!!!` But maybe I am running the `ctest` command from the wrong directory... – Pietro Jul 27 '21 at 11:26
  • From the `clinfo` command, OpenCL is working correctly (OpenCL 1.1 Mesa 21.1.5). – Pietro Jul 27 '21 at 11:43
  • The `ctest` command should be run from the build folder, i.e. where your CMake project files were generated. For example, if you ran `cmake ..` from `/somepath`, then `ctest` should be run from `/somepath`. A successful `clinfo` run doesn't necessarily indicate a bug-free OpenCL driver from the specific vendor; it just means the device properties are correctly populated. Unfortunately, without any stack trace or error, it is virtually impossible to understand where the program is failing, especially given that you say all OpenCL examples on that device are failing. – pradeep Jul 28 '21 at 14:02
  • Running `ctest` from the source directory (where CMakeLists.txt and the fft.cpp files are), I still get: `Test project /opt/arrayfire/share/ArrayFire/examples/benchmarks No tests were found!!!` (the exclamation marks are part of the message, not mine...) – Pietro Aug 03 '21 at 09:08
  • You should run it from your build folder, not the source folder (where CMakeLists.txt is). – pradeep Aug 04 '21 at 10:04
  • No matter where I run the `ctest` command from (I tried all subdirectories), I always get the same error message: `No tests were found!!!`, and the following directory is created: `Testing/Temporary`, containing these two files: `CTestCostData.txt, LastTest.log`; the first contains the string `---`; the second one contains the start testing and end testing timestamps. – Pietro Aug 04 '21 at 16:28
  • Perhaps tests weren't enabled at all. What is the value of `BUILD_TESTING` in the `CMakeCache.txt` file in your build folder? – pradeep Aug 06 '21 at 03:52
  • @pradeep - The `BUILD_TESTING` string is neither in my `CMakeCache.txt` file, nor in any other file in the project. – Pietro Aug 10 '21 at 09:04
  • Sorry, but I am unable to work out what's happening in your build. At this point, I think you should create a Dockerfile with all the instructions you are using to compile ArrayFire, so that we can easily replicate your issue. My best guess is that your cmake configure command didn't run properly. – pradeep Aug 11 '21 at 10:04
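
The `BUILD_TESTING` check discussed in the comments can also be scripted rather than done by eye. This is a minimal sketch, assuming the usual `KEY:TYPE=VALUE` layout of `CMakeCache.txt` with `#` and `//` comment lines; the function name `cmake_cache_value` is hypothetical, and keys that themselves contain a colon are not handled.

```python
def cmake_cache_value(cache_text, key):
    """Return the value stored for `key` in CMakeCache.txt text, or None."""
    for line in cache_text.splitlines():
        line = line.strip()
        # Skip blank lines and the two comment styles used in the cache file.
        if not line or line.startswith(("#", "//")):
            continue
        name_and_type, sep, value = line.partition("=")
        if not sep:
            continue
        # Entries look like NAME:TYPE=VALUE; drop the :TYPE part.
        name = name_and_type.split(":", 1)[0]
        if name == key:
            return value
    return None


# Hypothetical usage from the build folder:
# with open("CMakeCache.txt") as f:
#     print(cmake_cache_value(f.read(), "BUILD_TESTING"))
```

A result of `None` corresponds to the situation reported above, where the `BUILD_TESTING` entry is absent from the cache altogether.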
