Questions tagged [cupy]

CuPy is an implementation of a NumPy-compatible multi-dimensional array on CUDA.

About CuPy

From the CuPy homepage:

High Performance with CUDA

CuPy is an open-source matrix library accelerated with NVIDIA CUDA. It also uses CUDA-related libraries including cuBLAS, cuDNN, cuRand, cuSolver, cuSPARSE, cuFFT and NCCL to make full use of the GPU architecture.

Highly Compatible with NumPy

CuPy's interface is highly compatible with NumPy; in most cases it can be used as a drop-in replacement. All you need to do is replace numpy with cupy in your Python code. It supports various methods, indexing, data types, broadcasting and more.

CuPy consists of the core multi-dimensional array class, cupy.ndarray, and many functions on it. It supports a subset of the numpy.ndarray interface.
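
As a minimal sketch of the drop-in idea described above, the function below is written against a generic array module passed in as `xp`. The names `normalize_rows` and `xp` are illustrative and not part of CuPy. Passing `numpy` runs the code on the CPU; passing `cupy` (assuming it is installed and a CUDA GPU is available) should run the same code on the GPU, provided it only uses functions within the subset CuPy supports.

```python
import numpy as np


def normalize_rows(xp, data):
    """Scale each row of a 2-D array to unit L2 norm using array module `xp`."""
    a = xp.asarray(data, dtype=xp.float64)
    # keepdims=True keeps the norms as a column so broadcasting divides row-wise.
    norms = xp.linalg.norm(a, axis=1, keepdims=True)
    return a / norms


# CPU run with NumPy.
cpu_result = normalize_rows(np, [[3.0, 4.0], [0.0, 2.0]])
print(cpu_result)

# GPU run, assuming CuPy is installed and a CUDA device is present:
# import cupy as cp
# gpu_result = normalize_rows(cp, [[3.0, 4.0], [0.0, 2.0]])
# print(cp.asnumpy(gpu_result))  # copy the result back to host memory
```

Keeping the array module as a parameter is one way to compare both backends on the same code path. For very small inputs the GPU version may well be slower, since kernel launch and transfer overhead can dominate.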

339 questions
3 votes, 1 answer

Cupy slower than numpy when iterating through array

I have code that I want to parallelize with cupy. I thought it would be straightforward: just write "import cupy as cp", replace np. with cp. everywhere, and it would work. It does work and the code does run, but it is much slower.…
Ipulatov
3 votes, 1 answer

CuPy installation fails on Mac OS X 10.13.6 using pip

On macOS High Sierra 10.13.6 with Python 3.5.7 and CUDA 10.1, both pip3.5 install cupy-cuda101 and pip3.5 install cupy fail, with different issues. First attempt: pip3.5 install cupy-cuda101 -vvvv Collecting cupy-cuda101 1 location(s) to…
3 votes, 0 answers

cupy.cuda.cublas.CUBLASError: CUBLAS_STATUS_NOT_INITIALIZED when doing cupy matrix multiplication

I am a newbie when it comes to managing conda environments, pip, etc. When I tried to compute the dot product of two cupy arrays (matrix_V and vector_u), I encountered the following error message: vector_predict = matrix_V.dot(vector_u) File…
kail
3 votes, 2 answers

TypeError: list indices must be integers or slices, not cupy.core.core.ndarray

In object detection algorithms, Non-Maximum Suppression (NMS) is used to discard extra detection results for an object, e.g. a vehicle. Normally, horizontal bounding boxes are used in object detection algorithms and the GPU implementation of…
Majid Azimi
3 votes, 1 answer

Why does my RawKernel reducer cause cudaErrorIllegalAddress?

My goal is to write a custom reduction kernel that returns both the argmax along each row as well as the difference between the max and submax (second-largest max). I am new to CUDA and I am working with cupy. As a first step, I tried to write my…
Kyle McDonald
3 votes, 1 answer

Where is @cupy.fuse cupy python decorator documented?

I've seen some demos of @cupy.fuse which is nothing short of a miracle for GPU programming using NumPy syntax. The major problem with cupy is that each operation like adding is a full kernel launch, then kernel free. So a series of adds and…
2 votes, 1 answer

How do I force cupy to free all gpu memory after going out of scope?

I have a memory intensive gpu-based (CUDA C++ linked with cython) model to execute that has a substantial preprocessing step before running. Until now, the preprocessing step has been done on the cpu, with the results being then passed to the gpu…
dcgt1
2 votes, 1 answer

Rapids.ai / difference of computation with log between Pandas and cudf

Here is my code for comparing cudf and pandas performance: gpuDF2 = cudf.DataFrame({'col_1': np.arange(0, 10_000_000), 'col_2': np.arange(0, 10_000_000)}) pandasDF2= pd.DataFrame({'col_1':np.arange(0,10_000_000),…
fransua
2 votes, 0 answers

Using NCCL library from cupy in a cython file (.pyx)

I am trying to import the functions and definitions from https://github.com/cupy/cupy/blob/master/cupy_backends/cuda/libs/nccl.pyx into a pyx file with a cimport, but it is not working. I would like to know if it is even possible and if someone can give…
Sophia
2 votes, 2 answers

Vectorized numpy: check if point is inside sphere?

I have a numpy array centers which is N rows by 3 columns and contains the 3D coordinates of the centers of the spheres. Then another Nx1 array radii which contains the radii corresponding to the spheres. Finally, I have a third array points which are…
adamcircle
2 votes, 0 answers

how to release gpu after cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorMemoryAllocation: out of memory

After cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorMemoryAllocation: out of memory is raised in FastAPI, the GPU is not freed. How can I free the GPU? File "/app/app/core/color_retrieval.py", line 115, in _cal_distance dis = cp.subtract(mosaics,…
yyyyyyyy
2 votes, 2 answers

Failed to import cupy

After installing cupy via "pip install cupy-cuda110", I tried this in python3: import cupy as cp However, it failed: " $ python3 Python 3.8.10 (default, Nov 26 2021, 20:14:08) [GCC 9.3.0] on linux Type "help", "copyright", "credits" or…
ztan
2 votes, 1 answer

What does GPU under-utilization due to low occupancy mean?

I am using Numba and cupy for GPU programming. After switching my code from an NVIDIA V100 card to an A100, I got the following warnings: NumbaPerformanceWarning: Grid size (27) < 2 * SM count (216) will likely result in GPU under…
ZHANG Juenjie
2 votes, 1 answer

Install cupy on MacOS without GPU support

I've been making the rounds on forums trying out different ways to install cupy on macOS running on a device without an NVIDIA GPU. So far, nothing has worked. I've tried both a Homebrew install of Python 3.7 and a conda install of Python 3.7 and…
Nold
2 votes, 1 answer

cupy performs task for 48ms vs numpy for 4ms - why and how to fix it?

I am trying to use cupy to perform a task on the GPU. Here is the code: # on CPU x_cpu = np.array([1, 2, 3]) %timeit l2_cpu = np.linalg.norm(x_cpu) # on GPU x_gpu = cp.array([1, 2, 3]) %timeit l2_gpu = cp.linalg.norm(x_gpu) Here is the output: 4 µs ± 18 ns per…
DL-Newbie