Questions tagged [pycuda]

PyCUDA is a Python module that provides a comprehensive Pythonic interface to the NVIDIA CUDA GPU computing environment.

PyCUDA provides access to the NVIDIA CUDA driver API from within Python code.

The module includes interoperability with NumPy and comprehensive metaprogramming facilities for dynamically generating and JIT-compiling CUDA code from Python.

417 questions
2 votes, 2 answers

Generating single random number in pyCuda kernel

I have seen many ways to generate an array of random numbers, but I want to generate a single random number. Is there a function like rand() in C++? I don't want a series of random numbers; I just need to generate one random number inside the kernel.…
Saddam
2 votes, 1 answer

_driver.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN5boost6detail12set_tss_dataEPKvPFvPFvPvES3_ES5_S3_b

I tried to run the NVIDIA TensorRT Python samples, but got an error importing pycuda: ImportError: .../pycuda-2020.1-py3.6-linux-x86_64.egg/pycuda/_driver.cpython-36m-x86_64-linux-gnu.so: undefined symbol:…
Alex Fu
2 votes, 1 answer

Python multiprocessing with TensorRT

I am trying to use a TensorRT engine for inference in a Python class that inherits from multiprocessing. The engine works in a standalone Python script on my system, but now while integrating it into the codebase, the multiprocessing used in the…
2 votes, 1 answer

PyCUDA: GPUArray.get() returns inaccessible array

I am trying to sum up an array on the GPU and then get the result back on the host. For this, I am using the pycuda.gpuarray.sum() function. import pycuda.gpuarray a = np.array([1,2,3,4,5]) b = gpuarray.to_gpu(a) c = gpuarray.sum(b) c = c.get() print(c) …
m0bi5
2 votes, 2 answers

CUDA/PyCUDA: Which GPU is running X11?

On a Linux system with multiple GPUs, how can you determine which GPU is running X11 and which is completely free to run CUDA kernels? On a system that has a low-powered GPU to run X11 and a higher-powered GPU to run kernels, this can be determined…
dwelch91
2 votes, 1 answer

Can't install pycuda with pip

I am trying to install the PyCUDA module to run a Python script I downloaded, but installing it with pip doesn't work. I run pip install pycuda on the command line. At first, I get this: Collecting pycuda Using cached…
Julio974
2 votes, 1 answer

TensorRT multiple Threads

I am trying to use TensorRT through the Python API. I am trying to use it in multiple threads where the CUDA context is used with all the threads (everything works fine in a single thread). I am using Docker with the tensorrt:20.06-py3 image, and an onnx…
Walid Hanafy
2 votes, 1 answer

Can int variables be transferred from host to device in PyCUDA?

import pycuda.driver as cuda import pycuda.autoinit from pycuda.compiler import SourceModule import numpy as np dims=img_in.shape rows=dims[0] columns=dims[1] channels=dims[2] #To be used in CUDA Device …
2 votes, 2 answers

PyCUDA GPUArray slice-based operations

The PyCUDA documentation is a bit light on examples for those of us in the 'Non-Guru' class, but I'm wondering about the operations available for array operations on gpuarrays, i.e. if I wanted to gpuarray this…
Bolster
2 votes, 2 answers

pycuda shared memory up to device hard limit

This is an extension of the discussion here: pycuda shared memory error "pycuda._driver.LogicError: cuLaunchKernel failed: invalid value" Is there a method in pycuda that is equivalent to the following C++ API call? #define SHARED_SIZE 0x18000 // 96…
dag
2 votes, 1 answer

Using CUDA types in pyCUDA

Let us consider the CUDA code at CUDA's Mersenne Twister for an arbitrary number of threads and suppose that I want to convert it to a pyCUDA application. I know that I can use ctypes and CDLL, namely, cudart =…
Vitality
2 votes, 0 answers

What is the source code for Keras function model.fit()?

I'm building a simple neural network in Python using TensorFlow and Keras. I need to implement this code to work on a GPU, using PyCUDA. I plan on parallelizing learning across all the epochs; however, since Keras is very minimalistic, all epoch training…
jonny
2 votes, 2 answers

pycuda.driver module not found

I have installed Python 3.7.2 along with the following libraries: jupyter, pandas, numpy, pytools and pycuda. I'm working with Visual Studio Code. I'm trying to run the standard PyCUDA example: # --- PyCuda initialization import pycuda.driver as…
Vitality
2 votes, 1 answer

How to profile PyCuda code in Linux?

I have a simple (tested) PyCUDA app and am trying to profile it. I've tried NVIDIA's Compute Visual Profiler, which runs the program 11 times, then emits this error: NV_Warning: Ignoring the invalid profiler config option:…
Jeff Guy
2 votes, 1 answer

PyCuda mem_alloc initialization error

in desaturate_image redarray_gpu = cuda.mem_alloc(self.redarray.nbytes) pycuda._driver.LogicError: cuMemAlloc failed: initialization error I get the above error on this line: redarray_gpu = cuda.mem_alloc(self.redarray.nbytes) What could be…
Sachin Titus