Questions tagged [pycuda]

PyCUDA is a Python module that provides a comprehensive, Pythonic interface to the NVIDIA CUDA GPU computing environment.

It gives Python code access to the NVIDIA CUDA driver API.

The module includes NumPy interoperability and comprehensive metaprogramming facilities for dynamically generating and JIT-compiling CUDA code from Python.
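
A minimal sketch of how these pieces fit together, loosely modeled on the "double all elements" example from the PyCUDA tutorial. A CUDA C kernel is compiled at runtime with `SourceModule`, and a NumPy array is passed to it through `pycuda.driver.InOut`. The kernel name `double_elements` and the helpers `launch_config` and `double_on_gpu` are hypothetical names chosen for illustration. Calling `double_on_gpu` assumes PyCUDA is installed, a CUDA-capable GPU is present, and `nvcc` is available for the JIT compilation step.

```python
import numpy as np

# CUDA C source compiled at runtime by PyCUDA's SourceModule.
# (Hypothetical kernel name, chosen for this sketch.)
KERNEL_SRC = r"""
__global__ void double_elements(float *a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)          // guard: the last block may contain spare threads
        a[i] *= 2.0f;
}
"""


def launch_config(n, threads_per_block=256):
    """Return (block, grid) tuples for a 1-D launch covering n elements."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Ceiling division, so n need not be a multiple of threads_per_block.
    blocks = (n + threads_per_block - 1) // threads_per_block
    return (threads_per_block, 1, 1), (blocks, 1)


def double_on_gpu(a):
    """Double every element of an array on the GPU (hypothetical helper)."""
    # Imported lazily so the helpers above work on machines without a GPU.
    import pycuda.autoinit  # noqa: F401  creates a context on the default device
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    # The kernel expects contiguous float32 data. If `a` already is
    # contiguous float32, this returns `a` itself, so it is modified in place.
    a = np.ascontiguousarray(a, dtype=np.float32)
    kernel = SourceModule(KERNEL_SRC).get_function("double_elements")
    block, grid = launch_config(a.size)
    # drv.InOut copies `a` to the device before the launch and back afterwards.
    kernel(drv.InOut(a), np.int32(a.size), block=block, grid=grid)
    return a
```

On a machine with a working CUDA setup, `double_on_gpu(np.arange(5, dtype=np.float32))` should return `[0, 2, 4, 6, 8]` as a float32 array. The `if (i < n)` guard is what allows an array length that is not a multiple of the block size. The grid is rounded up, and the spare threads in the last block do nothing.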

417 questions
-2 votes, 1 answer

Errors in PyCuda indexing Numpy array of integers

I am taking my first steps with PyCuda to perform some parallel computation, and I came across a behavior I do not understand. I started from the very basic tutorial on the official PyCuda website (a simple script to double all elements…
Dam
-2 votes, 1 answer

How to perform PyCUDA 4x4 matrix inversion with the same accuracy as the numpy linalg "inv" or "pinv" function

I am facing an accuracy issue with my code, which performs a number (128, 256, 512) of 4x4 matrix inversions. When I use the original version, i.e. the numpy function np.linalg.inv or np.linalg.pinv, everything works fine. Unfortunately, with the…
user1773603
-2 votes, 1 answer

CUDA: does size of input/output data have to be a multiple of the number of threads per block?

I have Python code (implementing ray tracing) that I'm running in parallel with PyCuda.
import pycuda.driver as drv
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy as np
from stl import mesh
import time
my_mesh =…
-2 votes, 1 answer

CUDA profiling - high shared transactions/access but low local replay rate

After running the Visual Profiler, guided analysis tells me that I'm memory-bound, and in particular that my shared memory accesses are poorly aligned/accessed: basically every line that accesses shared memory is marked as ~2 transactions per access.…
linkhyrule5
-2 votes, 1 answer

Iterating through a 2D array in PyCUDA

I am trying to iterate through a 2D array in PyCUDA, but I end up with repeated array values. I initially passed in a small random integer array, and that worked as expected, but when I pass in an image, I see the same values over and over again. Here…
user2808264
-2 votes, 1 answer

GPGPU performance in high-level languages

For my science fair project, I have to write a computationally intensive algorithm that is well suited to parallelization. I have read about OpenCL and CUDA, and it seems they are mainly used from C/C++. While it would not be that difficult for me to…
Elliot Gorokhovsky
-2 votes, 1 answer

Install Python module for using GPU on Windows 8.1 x64

I am having trouble installing Python modules. I want to use the GPU in a Python script, but I get errors while installing the modules. 1. I installed my graphics driver (GeForce GT 650M). 2. I installed cuda_5.5.31_winvista_win7_win8_win8.1_notebook_x64.exe. Now…
-2 votes, 1 answer

PyCuda Error in Execution

This is my PyCUDA code for rotation. I have installed the latest CUDA drivers, and I use an NVIDIA GPU with CUDA support. I have also installed the CUDA toolkit and the PyCUDA drivers. Still, I get this strange error.
import pycuda.driver as cuda
import…
user1635666
-3 votes, 1 answer

Why is the image being partially processed?

I have been writing scripts for hours, and I think I am tired and overlooking something simple. I have the following PyCUDA script:
import cv2
import numpy as np
import time
import pycuda.autoinit
import pycuda.driver as cuda
from pycuda.compiler import…
KansaiRobot
-3 votes, 1 answer

CUDA out of memory error when doing matrix multiplication using Numba

I need to multiply a matrix by its transpose, and I am running out of memory on my GPU with the error message numba.cuda.cudadrv.driver.CudaAPIError: [2] Call to cuMemAlloc results in CUDA_ERROR_OUT_OF_MEMORY. I am expecting the size of my matrix to be…
secretive
-3 votes, 1 answer

Parallel QuickSort, can someone help me?

I am trying to parallelize quicksort by splitting the list into two sublists relative to the pivot. I am having problems with the syntax and with saving the pointer to the end of the two new lists. How do I get rid of…
Mateus Silva
-5 votes, 1 answer

CUDA runtime API and dynamic kernel definition

Using the driver API precludes using the runtime API in the same application ([1]). Unfortunately, cuBLAS, cuFFT, etc. are all based on the runtime API. If one wants dynamic kernel definition as in cuModuleLoad and cuBLAS at the same time,…
melisgl