Questions tagged [pycuda]

PyCUDA is a Python module that provides a comprehensive Pythonic interface to the NVIDIA CUDA GPU computing environment.

PyCUDA exposes the NVIDIA CUDA driver API to Python code.

The module interoperates with NumPy and includes comprehensive metaprogramming facilities for dynamically generating and JIT-compiling CUDA code from Python.

417 questions
0
votes
1 answer

Getting different results every time in pycuda FFT

I am writing code that compares numpy.fft.fft2 with pycuda, but the results do not match. Additionally, the pycuda results differ every time. Data file: https://nofile.io/f/bjGRQGRVSCG/gauss.npy from pyfft.cuda import Plan import…
0
votes
2 answers

CUDA threads per block with multiple GPUs

I'm using CUDA GPU programming in a college project and was wondering: if a GPU has a maximum block size of 1024 and you have 2 GPUs, does that mean the block size is doubled? And would this affect the implementation of the program? Do you need to…
GLalor
0
votes
1 answer

PyCUDA cudaLaunchCooperativeKernel API

Is it possible to launch the cudaLaunchCooperativeKernel API with pycuda? I'm hoping to achieve grid-level synchronization with it.
FunkyOne
0
votes
1 answer

How to get and set XORWOWRandomNumberGenerator state for persistence?

My goal is to persist the full state of each iteration of a complex algorithm which also involves pseudo-random numbers generated via pycuda. In order to resume the algorithm at an arbitrary iteration and deterministically reproduce the same…
schreon
0
votes
1 answer

How to use child kernels (CUDA dynamic parallelism) using PyCUDA

My Python code has a GPU kernel function which is called multiple times in a for loop from the host, like this: for i in range: gpu_kernel_func(blocksize, grid) Since this function call requires communication between the host and the GPU device…
RaVi
0
votes
1 answer

How to use syncthreads in CUDA for a scan algorithm (Hillis-Steele)

I'm trying to implement a scan algorithm (Hillis-Steele) and I'm having some trouble understanding how to do it properly in CUDA. This is a minimal example using PyCUDA: import pycuda.driver as cuda import pycuda.autoinit import numpy as np from…
0
votes
2 answers

PyCUDA clean-up error, CUDA launch timed out error, on some machines only

I have a Python 3 program that involves the execution of a CUDA kernel. The code runs fine when I launch it in the following configuration: GeForce GTX 1080 Ti GPU, Ubuntu 16.04, CUDA version 8.0.61, NVIDIA driver version 384.111, Python version…
Amos Egel
0
votes
1 answer

How to relate kernel input data structure in CUDA kernel function with parameter input in pycuda

I am writing a CUDA kernel to convert an RGBA image to a grayscale image in PyCUDA. Here is the PyCUDA code: import numpy as np import matplotlib.pyplot as plt import pycuda.autoinit import pycuda.driver as cuda from pycuda.compiler import…
Jiadong
0
votes
1 answer

PyCUDA value copied from host to device is not correct

I intended to write a kernel in PyCUDA to generate 2D Gaussian patches. However, the values I define on the host change after I copy them to the device. Below is the code: import numpy as np import matplotlib.pyplot as plt import pycuda.driver as…
Jiadong
0
votes
1 answer

PyCUDA 2D array implementations (or working with strings)

I'm trying to work with an array of strings (words) in CUDA. I tried flattening it by creating a single string, but then to index it, I'd have to go through some of it each time a kernel runs. If there are 9000 words with a length of 6…
tamasfe
0
votes
1 answer

"Peer access" failed when using pycuda and tensorflow together

I have some Python 3 code like this: import numpy as np import pycuda.driver as cuda from pycuda.compiler import SourceModule, compile import tensorflow as tf # create device and…
0
votes
1 answer

How is my PyCUDA indexing working?

I am trying to load a 3D array into pycuda (I'm going to load images). I want each thread to handle all the channels of a single pixel using a for loop (this is an algorithmic requirement). So far I have this working: from pycuda.compiler import…
harveyslash
0
votes
0 answers

Trying to run PyCUDA I get a LogicError message

I'm trying to get started with PyCUDA and parallel computing. I'm working on a 2012 iMac (High Sierra 10.13, NVIDIA GeForce GTX). This is part of a simple tutorial I was trying: import pycuda import pycuda.driver as drv drv.init() print("%d…
0
votes
1 answer

Can I use OpenACC to system call Python function?

I want to parallelize a Python loop on the GPU, but I don't want to use PyCUDA, because I would need to do lots of things myself. I am looking for something like OpenACC in C++, but for Python, to implement simple parallelization, but there seems to be no such thing.…
hadesmajesty
0
votes
1 answer

How to get PyCUDA SourceModule to compile multiple source files containing device code?

I'm trying to use some LAPACKE functions inside a CUDA kernel to solve small systems of linear equations. I have a main source file that contains the kernel function I want to call. Inside that kernel function I want to call the LAPACKE function…
Thomas