Questions tagged [pycuda]

PyCUDA is a Python module that provides a comprehensive Pythonic interface to the NVIDIA CUDA GPU computing environment.

PyCUDA provides a Python module for accessing the NVIDIA CUDA driver API from within Python code.

The module includes interoperability with NumPy and comprehensive metaprogramming facilities for dynamically generating and JIT-compiling CUDA code from Python.
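As a rough illustration of the metaprogramming idea, the sketch below generates CUDA C source text at run time with the standard library's `string.Template`. The kernel name `scale`, the helper `render_scale_kernel`, and the `$dtype` / `$factor` placeholders are hypothetical names chosen for this example, not part of PyCUDA. The JIT-compilation step is shown only as a comment, since it assumes PyCUDA is installed and a CUDA device is available.

```python
from string import Template

# Hypothetical kernel template; $dtype and $factor are filled in at run time.
KERNEL_TEMPLATE = Template("""
__global__ void scale($dtype *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = data[i] * $factor;
    }
}
""")


def render_scale_kernel(dtype="float", factor="2.0f"):
    """Return CUDA C source for an elementwise scaling kernel."""
    return KERNEL_TEMPLATE.substitute(dtype=dtype, factor=factor)


source = render_scale_kernel()
print(source)

# Assumption: with PyCUDA installed and a CUDA device present, the generated
# string could be JIT-compiled roughly like this (not executed here):
#   import pycuda.autoinit
#   from pycuda.compiler import SourceModule
#   scale = SourceModule(source).get_function("scale")
```

Because the kernel is an ordinary Python string until it is compiled, the element type or constants can be chosen per run without maintaining several hand-written kernels.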

417 questions
3
votes
1 answer

nvcc fatal : '--ptxas-options=-v': expected a number

Getting the nvcc fatal : '--ptxas-options=-v': expected a number error when I try to build a Windows port of Faster-RCNN. You may reach the setup file (which is a Python script) directly from here. Software Environment: - CUDA v10.1 - VS 2019 -…
talha06
3
votes
2 answers

How to use Python to run pycuda in multiple processes

I have PyCUDA code that runs in a single process. Can Python's multiprocessing support running this code in multiple subprocesses? When I try, I get an error. Did I make a mistake? I tried to use python's process to…
李培鑫
3
votes
1 answer

How can I create a PyCUDA GPUArray from a gpu memory address?

I'm working with PyTorch and want to do some arithmetic on Tensor data with the help of PyCUDA. I can get a memory address of a cuda tensor t via t.data_ptr(). Can I somehow use this address and my knowledge of the size and data type to initialize a…
oarfish
3
votes
1 answer

Pycuda compilation error stderr message unreadable

My system is as follows. System environment: Windows 7 Professional, Anaconda 3, Python 3.5.4. GPU: Quadro K2200. Driver: 353.90. CUDA toolkit: 7.5. Visual Studio: Visual Studio Community 2013 (Japanese version). pycuda binary file that I used for…
Kanmani
3
votes
1 answer

CUDA 9.0 and pycuda, error: CompileError: nvcc compilation ... kernel.cu failed

import pycuda.driver as cuda import pycuda.autoinit from pycuda.compiler import SourceModule import numpy a = numpy.random.randn(4,4) a = a.astype(numpy.float32) a_gpu = cuda.mem_alloc(a.nbytes) cuda.memcpy_htod(a_gpu, a) mod =…
billinair
3
votes
1 answer

Time taken to copy matrix to host increases by how many times the matrix was used

I am benchmarking GPU matrix multiplication using PyCUDA, CUDAMat, and Numba and ran into some behavior I can't find a way to explain. I calculate the time it takes for 3 different steps independently - sending the 2 matrices to device memory,…
Frobot
3
votes
0 answers

pyCUDA can't print result

I recently used pip to install PyCUDA for Python 3.4.3. But when I test the sample code (https://documen.tician.de/pycuda/tutorial.html#getting-started), it doesn't print the result; the program ends without any error message. I can't…
3
votes
1 answer

PyCUDA kernel function

I am new to PyCUDA and was going through some of the examples on the PyCUDA website. I am trying to figure out the logic behind certain lines of code and would really appreciate it if someone explained the idea behind them. The code snippet below is from…
user2808264
3
votes
1 answer

sorting a numpy matrix on gpu

I have a large matrix (1045506 x 3) which I want to sort based on the 1st column. Since it's a NumPy matrix, I can use argsort to get the result: mat_sorted = mat[mat[:,0].argsort()]. It takes about 69ms to complete this step which seems a little to…
shashydhar
3
votes
1 answer

Using a shared variable in a function

Hi, I'm following a neural net tutorial where the author seems to be using shared variables everywhere. From my understanding, a shared variable in Theano is simply a space in memory that can be shared by the GPU and the CPU heap. Anyway, I have two…
Dr.Knowitall
3
votes
5 answers

Installing Theano with GPU on Windows 8.1 64-bit with Visual Studio 2013

This Theano installation is driving me mad :( So, I've followed the instructions in the most voted answer here because it seemed like the most similar configuration to mine and the most up-to-date version: Installing theano on Windows 8 with GPU enabled 1-…
orangejaipur
3
votes
1 answer

CUDA program gives cudaErrorIllegalAddress on sm_35 Kepler GPUs, but runs fine on other GPUs

I'm having a very weird problem with my program. Essentially I'm doing a matrix multiplication on part of a matrix. The program apparently runs fine on most cards but crashes on sm_35 Kepler (=GK110) cards. The initial program was written in…
Untom
3
votes
1 answer

Unrolling a trivially parallelizable for loop in python with CUDA

I have a for loop in python that I want to unroll onto a GPU. I imagine there has to be a simple solution but I haven't found one yet. Our function loops over elements in a numpy array and does some math storing the result in another numpy array.…
deltap
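For a loop of this shape, where each iteration reads one input element and writes one output element, the usual CUDA mapping is one thread per iteration. The pure-Python sketch below only emulates that index mapping on the CPU so the idea can be checked without a GPU. The helper `launch` and the per-element operation `square_plus_one` are hypothetical stand-ins, since the question's actual math is not shown.

```python
def launch(kernel, n, block_size, *args):
    """Emulate a 1D CUDA launch: call kernel once per (block, thread) pair."""
    # Round up so that all n elements are covered by some thread.
    grid_size = (n + block_size - 1) // block_size
    for block_idx in range(grid_size):
        for thread_idx in range(block_size):
            kernel(block_idx, block_size, thread_idx, n, *args)


def square_plus_one(block_idx, block_dim, thread_idx, n, src, dst):
    """Hypothetical per-element work; replaces the body of the original loop."""
    # Global index, as in blockIdx.x * blockDim.x + threadIdx.x on the device.
    i = block_idx * block_dim + thread_idx
    # Guard: the last block may contain threads past the end of the array.
    if i < n:
        dst[i] = src[i] * src[i] + 1.0


src = [0.0, 1.0, 2.0, 3.0, 4.0]
dst = [0.0] * len(src)
launch(square_plus_one, len(src), 4, src, dst)
print(dst)
```

On a real device the two nested loops in `launch` disappear: every (block, thread) pair runs concurrently, which is why iterations must not depend on each other.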
3
votes
1 answer

from pycuda.compiler import SourceModule

Using Python33 on Windows 8.1 with CUDA toolkit 5.5 and hardware installed, when trying to import and initialize the device with: import pycuda.driver as cuda import pycuda.autoinit from pycuda.compiler import SourceModule <--- this line causes…
Smithma01
3
votes
1 answer

PyCUDA precision of matrix multiplication code

I am trying to learn CUDA and am using PyCUDA to write simple matrix multiplication code. For two 4x4 randomly generated matrices I get the following solution: Cuda: [[ -5170.86181641 -21146.49609375 20690.02929688 -35413.9296875 ] [-18998.5 …
0b1100001