I tried to run the following simple CUBLAS example both from the console and inside the Django framework.

"""
Demonstrates multiplication of two matrices on the GPU.
"""

import pycuda
import pycuda.gpuarray as gpuarray
import pycuda.driver as drv
import numpy as np

drv.init() #init pycuda driver
current_dev = drv.Device(0) #device we are working on
ctx = current_dev.make_context() #make a working context
ctx.push() #let context make the lead

import scikits.cuda.linalg as culinalg
import scikits.cuda.misc as cumisc
culinalg.init()

# Double precision is only supported by devices with compute
# capability >= 1.3:
import string
demo_types = [np.float32]

for t in demo_types:
    print 'Testing matrix multiplication for type ' + str(np.dtype(t))
    if np.iscomplexobj(t()):
        a = np.asarray(np.random.rand(10, 5)+1j*np.random.rand(10, 5), t)
        b = np.asarray(np.random.rand(5, 5)+1j*np.random.rand(5, 5), t)
        c = np.asarray(np.random.rand(5, 5)+1j*np.random.rand(5, 5), t)
    else:
        a = np.asarray(np.random.rand(10, 5), t)
        b = np.asarray(np.random.rand(5, 5), t)
        c = np.asarray(np.random.rand(5, 5), t)

    a_gpu = gpuarray.to_gpu(a)
    b_gpu = gpuarray.to_gpu(b)
    c_gpu = gpuarray.to_gpu(c)

    temp_gpu = culinalg.dot(a_gpu, b_gpu)
    d_gpu = culinalg.dot(temp_gpu, c_gpu)
    temp_gpu.gpudata.free()
    del(temp_gpu)
    print 'Success status: ', np.allclose(np.dot(np.dot(a, b), c) , d_gpu.get())

    print 'Testing vector multiplication for type '  + str(np.dtype(t))
    if np.iscomplexobj(t()):
        d = np.asarray(np.random.rand(5)+1j*np.random.rand(5), t)
        e = np.asarray(np.random.rand(5)+1j*np.random.rand(5), t)
    else:
        d = np.asarray(np.random.rand(5), t)
        e = np.asarray(np.random.rand(5), t)

    d_gpu = gpuarray.to_gpu(d)
    e_gpu = gpuarray.to_gpu(e)

    temp = culinalg.dot(d_gpu, e_gpu)
    print 'Success status: ', np.allclose(np.dot(d, e), temp)

ctx.pop() #deactivate again
ctx.detach() #delete it

From the console, the example ran successfully. But when I ran it inside Django (I plugged the example in as a function called from the GET handler of a URL), it crashed with a segmentation fault (core dumped).

Does anyone know what might cause this problem? The backtrace from cuda-gdb is as follows:

0  0x00007ffff782d267 in kill () from /lib/x86_64-linux-gnu/libc.so.6
1  0x000000000041f44e in ?? ()
2  0x000000000052c6d5 in PyEval_EvalFrameEx ()
3  0x000000000052cf32 in PyEval_EvalFrameEx ()
4  0x000000000055c594 in PyEval_EvalCodeEx ()
5  0x000000000052ca8d in PyEval_EvalFrameEx ()
6  0x000000000056d0aa in ?? ()
7  0x000000000052e1e6 in PyEval_EvalFrameEx ()
8  0x000000000056d0aa in ?? ()
9  0x000000000052e1e6 in PyEval_EvalFrameEx ()
10 0x000000000056d0aa in ?? ()
11 0x000000000052e1e6 in PyEval_EvalFrameEx ()
12 0x000000000055c594 in PyEval_EvalCodeEx ()
13 0x000000000052ca8d in PyEval_EvalFrameEx ()
14 0x000000000052cf32 in PyEval_EvalFrameEx ()
15 0x000000000055c594 in PyEval_EvalCodeEx ()
16 0x000000000052ca8d in PyEval_EvalFrameEx ()
17 0x000000000055c594 in PyEval_EvalCodeEx ()
18 0x00000000005b7392 in PyEval_EvalCode ()
19 0x0000000000469663 in ?? ()
20 0x00000000004699e3 in PyRun_FileExFlags ()
21 0x0000000000469f1c in PyRun_SimpleFileExFlags ()
22 0x000000000046ab81 in Py_Main ()

Thanks!

talonmies
Alex Gao
  • Is PyCUDA thread safe? – Thomas Jun 30 '14 at 02:52
  • @Thomas I guess not; that might be the cause. I tried it on Tornado and it worked. Tornado is a lighter Python web framework that does not always create a thread for the GET method. – Alex Gao Jun 30 '14 at 07:45
  • Code running on the GPU won't *ever* cause a segmentation fault on the host CPU. The backtrace is very clearly in the Python interpreter API calls. I'd start with pure numpy code, check that it works without PyCUDA or the CUDA scikit, and then progressively add features back until you find what breaks it. I have removed the CUDA tag from this question; it doesn't really have anything to do with CUDA programming, per se. – talonmies Jun 30 '14 at 07:46

1 Answer

I created a new process using subprocess to handle the CUDA calculation, and that solved the problem. The reason might be that PyCUDA is not thread safe.
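
For illustration, here is a minimal sketch of that kind of isolation. It is not my exact code: the names WORKER_SOURCE, compute and multiply_in_subprocess are made up for this example, and a pure-Python matrix product stands in for the PyCUDA / scikits.cuda calls so that the sketch runs without a GPU. The idea is that the web request handler only launches a fresh interpreter and reads back a JSON result, so the CUDA context would never be created inside a thread owned by the web framework.

```python
import json
import subprocess
import sys

# Source of the worker, which runs in its own interpreter process.
# Assumption: in real use the body of `compute` would hold the PyCUDA /
# scikits.cuda code from the question; a pure-Python matrix product
# stands in here so the sketch runs without a GPU.
WORKER_SOURCE = r"""
import json
import sys

def compute(a, b):
    # Placeholder for the GPU work (context setup, culinalg.dot, cleanup).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

payload = json.loads(sys.argv[1])
sys.stdout.write(json.dumps(compute(payload["a"], payload["b"])))
"""


def multiply_in_subprocess(a, b):
    """Run the worker in a fresh process and return its JSON-decoded result."""
    payload = json.dumps({"a": a, "b": b})
    # With `python -c SOURCE ARG`, the worker sees ARG as sys.argv[1].
    output = subprocess.check_output(
        [sys.executable, "-c", WORKER_SOURCE, payload])
    return json.loads(output.decode("utf-8"))
```

A Django view would then call something like multiply_in_subprocess(a, b) instead of touching PyCUDA directly. Passing the data as a command-line argument is only reasonable for small inputs; for larger arrays, writing to the child's stdin or to a temporary file would probably be a better fit.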

Alex Gao