
I have installed PaddlePaddle using the .deb file from https://github.com/baidu/Paddle/releases/download/V0.8.0b1/paddle-gpu-0.8.0b1-Linux.deb

I have CUDA 8.0 installed with cuDNN v5.1, without the NVIDIA Accelerated Graphics Driver, on a machine with four GTX 1080s:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Sun_Sep__4_22:14:01_CDT_2016
Cuda compilation tools, release 8.0, V8.0.44

I've set the shell variables:

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
export CUDA_HOME=/usr/local/cuda

CUDA itself is working fine, since I have run all of the NVIDIA_CUDA-8.0_Samples and they all PASSED.

The quick_start demo in Paddle/demo/quick_start also runs smoothly without throwing an error.

But when I try to run the image_classification demo from the Paddle GitHub repo, I get an invalid device function error. Is there some way to resolve this?

hl_gpu_matrix_kernel.cuh:181] Check failed: cudaSuccess == err (0 vs. 8) [hl_gpu_apply_unary_op failed] CUDA error: invalid device function

The full traceback:

~/Paddle/demo/image_classification$ bash train.sh 
I1005 14:34:51.929863 10461 Util.cpp:151] commandline: /home/ltan/Paddle/binary/bin/../opt/paddle/bin/paddle_trainer --config=vgg_16_cifar.py --dot_period=10 --log_period=100 --test_all_data_in_one_period=1 --use_gpu=1 --trainer_count=1 --num_passes=200 --save_dir=./cifar_vgg_model 
I1005 14:34:56.705898 10461 Util.cpp:126] Calling runInitFunctions
I1005 14:34:56.706171 10461 Util.cpp:139] Call runInitFunctions done.
[INFO 2016-10-05 14:34:56,918 layers.py:1620] channels=3 size=3072
[INFO 2016-10-05 14:34:56,919 layers.py:1620] output size for __conv_0__ is 32 
[INFO 2016-10-05 14:34:56,920 layers.py:1620] channels=64 size=65536
[INFO 2016-10-05 14:34:56,920 layers.py:1620] output size for __conv_1__ is 32 
[INFO 2016-10-05 14:34:56,922 layers.py:1681] output size for __pool_0__ is 16*16 
[INFO 2016-10-05 14:34:56,923 layers.py:1620] channels=64 size=16384
[INFO 2016-10-05 14:34:56,923 layers.py:1620] output size for __conv_2__ is 16 
[INFO 2016-10-05 14:34:56,924 layers.py:1620] channels=128 size=32768
[INFO 2016-10-05 14:34:56,925 layers.py:1620] output size for __conv_3__ is 16 
[INFO 2016-10-05 14:34:56,926 layers.py:1681] output size for __pool_1__ is 8*8 
[INFO 2016-10-05 14:34:56,927 layers.py:1620] channels=128 size=8192
[INFO 2016-10-05 14:34:56,927 layers.py:1620] output size for __conv_4__ is 8 
[INFO 2016-10-05 14:34:56,928 layers.py:1620] channels=256 size=16384
[INFO 2016-10-05 14:34:56,929 layers.py:1620] output size for __conv_5__ is 8 
[INFO 2016-10-05 14:34:56,930 layers.py:1620] channels=256 size=16384
[INFO 2016-10-05 14:34:56,930 layers.py:1620] output size for __conv_6__ is 8 
[INFO 2016-10-05 14:34:56,932 layers.py:1681] output size for __pool_2__ is 4*4 
[INFO 2016-10-05 14:34:56,932 layers.py:1620] channels=256 size=4096
[INFO 2016-10-05 14:34:56,933 layers.py:1620] output size for __conv_7__ is 4 
[INFO 2016-10-05 14:34:56,934 layers.py:1620] channels=512 size=8192
[INFO 2016-10-05 14:34:56,934 layers.py:1620] output size for __conv_8__ is 4 
[INFO 2016-10-05 14:34:56,936 layers.py:1620] channels=512 size=8192
[INFO 2016-10-05 14:34:56,936 layers.py:1620] output size for __conv_9__ is 4 
[INFO 2016-10-05 14:34:56,938 layers.py:1681] output size for __pool_3__ is 2*2 
[INFO 2016-10-05 14:34:56,938 layers.py:1681] output size for __pool_4__ is 1*1 
[INFO 2016-10-05 14:34:56,941 networks.py:1125] The input order is [image, label]
[INFO 2016-10-05 14:34:56,941 networks.py:1132] The output order is [__cost_0__]
I1005 14:34:56.948256 10461 Trainer.cpp:170] trainer mode: Normal
F1005 14:34:56.949136 10461 hl_gpu_matrix_kernel.cuh:181] Check failed: cudaSuccess == err (0 vs. 8) [hl_gpu_apply_unary_op failed] CUDA error: invalid device function
*** Check failure stack trace: ***
    @     0x7fa557316daa  (unknown)
    @     0x7fa557316ce4  (unknown)
    @     0x7fa5573166e6  (unknown)
    @     0x7fa557319687  (unknown)
    @           0x78a939  hl_gpu_apply_unary_op<>()
    @           0x7536bf  paddle::BaseMatrixT<>::applyUnary<>()
    @           0x7532a9  paddle::BaseMatrixT<>::applyUnary<>()
    @           0x73d82f  paddle::BaseMatrixT<>::zero()
    @           0x66d2ae  paddle::Parameter::enableType()
    @           0x669acc  paddle::parameterInitNN()
    @           0x66bd13  paddle::NeuralNetwork::init()
    @           0x679ed3  paddle::GradientMachine::create()
    @           0x6a6355  paddle::TrainerInternal::init()
    @           0x6a2697  paddle::Trainer::init()
    @           0x53a1f5  main
    @     0x7fa556522f45  (unknown)
    @           0x545ae5  (unknown)
    @              (nil)  (unknown)
/home/xxx/Paddle/binary/bin/paddle: line 81: 10461 Aborted                 (core dumped) ${DEBUGGER} $MYDIR/../opt/paddle/bin/paddle_trainer ${@:2}
No data to plot. Exiting!

According to issue #158 of the Git repo, this should have been resolved by #170, which adds support for the GTX 1080 with CUDA 8.0, but the error is still thrown when GPU functions are accessed. (Sorry, I can't add more than 2 links with low reputation.)

Does anyone know how to resolve this and install Paddle such that the image_classification demo can run?


I have also tried compiling and installing from source; the same error is thrown, while the quick_start demo still runs smoothly.
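
For reference, a quick way to double-check what architecture the cards actually are is the deviceQuery sample shipped with the CUDA 8.0 samples (the path below assumes the default samples location; adjust if yours differs). A GTX 1080 should report compute capability 6.1:

$ cd ~/NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery
$ make
$ ./deviceQuery
  # expect a line like: CUDA Capability Major/Minor version number: 6.1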

nat gillin
  • Well you have two basic options: learn C++, start picking through the source code, find the bug, and fix it. Option number two is to file a bug report and wait. – Sam Varshavchik Oct 04 '16 at 10:49
  • I don't think this is a C++ bug. The `hl_create_global_resources()` suggests that it's a CUDA-related thing. – nat gillin Oct 04 '16 at 11:15
  • I can trivially hack up some code that will crash somewhere inside `libstdc++`. It won't be a library bug. Welcome to C++. – Sam Varshavchik Oct 04 '16 at 11:37
  • Do you have a CUDA GPU in that machine? Why do you say "without the NVIDIA Accelerated Graphics Driver"? You intentionally did not install the driver? A proper CUDA install includes a verification step. Did you verify that CUDA is installed correctly by building and running some sample codes? – Robert Crovella Oct 04 '16 at 15:30
  • Yes, I have 4 GPUs on this machine. It's without the NVIDIA accelerated graphics driver because it's incompatible with the X server configuration on Ubuntu. Moreover, my machine with CUDA 7.5 works with TensorFlow, and the CUDA 8.0 sample code also runs. It's just that the `paddle-paddle` binaries don't seem to work. – nat gillin Oct 05 '16 at 00:34

2 Answers


I know nothing about paddle. However, the CUDA error is almost certainly caused by the binary you have installed not containing code for your (rather new) GTX 1080. Either find a version with support for Pascal GPUs, or build your own version from source.
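
One way to confirm this (a rough check, using the binary path taken from the log in the question; cuobjdump ships with the CUDA toolkit) is to list the device code actually embedded in the installed paddle_trainer. If it contains no sm_60/sm_61 cubin and no PTX that the driver could JIT-compile for Pascal, every kernel launch will fail with exactly this "invalid device function" error:

$ cuobjdump --list-elf /home/ltan/Paddle/binary/bin/../opt/paddle/bin/paddle_trainer   # cubins (native GPU code) embedded in the binary
$ cuobjdump --list-ptx /home/ltan/Paddle/binary/bin/../opt/paddle/bin/paddle_trainer   # PTX the driver could still JIT-compile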

talonmies

The issue is caused by the GPU architecture flags set in Paddle/cmake/flags.cmake for CUDA 8.0.

It has been solved in https://github.com/baidu/Paddle/pull/165/files by adding compute_52/sm_52 and compute_60/sm_60 to the list of target architectures.
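
In terms of the compiler invocation, that change boils down to passing extra -gencode pairs to nvcc when Paddle is built from source, roughly like the following (illustrative only; the exact wiring inside flags.cmake may differ):

  # Maxwell and Pascal targets added by the PR. An sm_60 cubin is
  # binary-compatible with the GTX 1080 (compute capability 6.1),
  # which is why this fixes the "invalid device function" error.
  -gencode arch=compute_52,code=sm_52
  -gencode arch=compute_60,code=sm_60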

alvas