Questions tagged [numba-pro]

NumbaPro - enhanced version of Numba, adding GPU support

NumbaPro is an enhanced version of Numba that adds premium features and functionality, allowing developers to rapidly create optimized code that integrates well with NumPy.

With NumbaPro, Python developers can define NumPy ufuncs and generalized ufuncs (gufuncs) in Python, which are compiled to machine code dynamically and loaded on the fly. Additionally, NumbaPro offers developers the ability to target multicore and GPU architectures with Python code for both ufuncs and general-purpose code.

For targeting the GPU, NumbaPro can either do the work automatically, optimizing the code for the GPU architecture as best it can, or you can use its CUDA-based API to write CUDA code directly in Python for ultimate control of the hardware (including thread and block identities).

Here’s a list of highlighted features:

- Portable data-parallel programming through ufuncs and gufuncs for single-core CPU, multicore CPU, and GPU
- Bindings to CUDA libraries: cuRAND, cuBLAS, cuFFT
- Python CUDA programming for maximum control of hardware resources

79 questions
3
votes
1 answer

CUDA-Python: How to launch CUDA kernel in Python (Numba 0.25)?

Could you please help me understand how to write CUDA kernels in Python? AFAIK, numba.vectorize can be run on cuda, cpu, or parallel (multi-CPU), depending on the target. But target='cuda' requires setting up CUDA kernels. The main issue is that many…
Novitoll
  • 820
  • 1
  • 9
  • 22
3
votes
0 answers

What is the main difference between Numba Pro and Theano/pyautodiff for GPU calculations?

Both Numba Pro and the Theano-based pyautodiff support conversion of Python code into GPU machine code. Theano also allows symbolic differentiation of the resulting syntax tree, but that is outside the scope of my question. My question is whether…
Mark Horvath
  • 1,136
  • 1
  • 9
  • 24
2
votes
0 answers

Numba matrix multiplication much slower than NumPy

I am implementing a simple matrix multiplication function with Numba and am finding it to be significantly slower than NumPy. In the below example, Numba is 40X slower. Is there a way to further speed up Numba? Thanks in advance for your…
user8768787
  • 21
  • 1
  • 2
2
votes
2 answers

Using numba for cosine similarity between a vector and rows in a matrix

Found this gist using numba for fast computation of cosine similarity. import numba @numba.jit(target='cpu', nopython=True) def fast_cosine(u, v): m = u.shape[0] udotv = 0 u_norm = 0 v_norm = 0 for i in range(m): if…
Kamil Sindi
  • 21,782
  • 19
  • 96
  • 120
2
votes
1 answer

A CUDA error when a large array is used as input data

I have code that does some calculation on the GPU using Python 3.5 with Numba and CUDA 8.0. When an array of size (50, 27) was input, it ran successfully and got the right result. When I changed the input data to size (200, 340), I got an error. I use shared memory in my…
S.Jan
  • 31
  • 1
  • 5
2
votes
1 answer

Anaconda Accelerate check_cuda()

What is the correct anaconda accelerate function to check cuda? With numba-pro you could use: >>> from numbapro import check_cuda numbapro:1: ImportWarning: The numbapro package is deprecated in favour of the accelerate package. Please update your…
Lundy
  • 663
  • 5
  • 19
2
votes
1 answer

CUDA/Python: conversion error for matrix operation

I'm trying to execute a very basic neighbour algorithm on a matrix using NumbaPro CUDA Python. The function: @autojit(target="gpu") def removeNeighboursMatCUDA(tmp_frame): for j in range(255): for i in range(255): if…
tillúr
  • 93
  • 8
2
votes
1 answer

NVVM_ERROR_INVALID_OPTION when using the CUDA kernel with Numbapro api

I want to execute a CUDA kernel in python using Numbapro API. I have this code: import math import numpy from numbapro import jit, cuda, int32, float32 from matplotlib import pyplot @cuda.jit('void(float32[:], float32[:], float32[:], float32[:],…
Hopobcn
  • 885
  • 10
  • 20
2
votes
1 answer

Numbapro Quickstart Guide Error

I'm trying to follow the NumbaPro quickstart guide, but I'm getting an error when following the instructions. Here is my situation: Python 2.7.6 Cuda compilation tools v5.5.0 conda 3.4.1 accelerate 1.5.0 Windows 7 Professional Nvidia GeForce…
2
votes
0 answers

casting error using the numbapro cuda extension

I'm trying to run a little device kernel function on a shared array: from numbapro import cuda, float32 @cuda.jit('void(float32[:,:],float32,float32)',device=True) def cu_calculate_distance(template, dx, dy) : side_length = template.shape[0] …
Rok
  • 613
  • 4
  • 17
1
vote
2 answers

Numba "LoweringError" for complex numbers in numpy array

I have to make a calculation using complex arrays, however when using numba to speed up the process I get an error numba.core.errors.LoweringError: Failed in nopython mode pipeline (step: nopython mode backend). Here it is a simplified version of my…
Marcos
  • 57
  • 5
1
vote
1 answer

Does Numba support built-in Python functions, e.g. `setitem`?

TypingError: Failed in nopython mode pipeline (step: nopython frontend) No implementation of function Function() found for signature: This error is encountered when trying to set all the elements less than a given…
1
vote
1 answer

Is it not possible to call an in-built function, e.g. svd, in a custom Python function that is parallelized using @njit?

The following error is obtained while trying to run my for loop involving an svd function: TypingError: Failed in nopython mode pipeline (step: nopython frontend) Untyped global name 'svd': cannot determine Numba type of The code…
1
vote
0 answers

Is there a way to pin arrays in Numba, for fast data transfer to/from device?

In Pytorch, there is an option to pin CPU arrays for fast transfer to GPU (does not seem to work for GPU -> CPU though). I am wondering if there is a way to pin Numba arrays to memory, or any alternative technique for fast transfer from CPU to GPU.…
SantoshGupta7
  • 5,607
  • 14
  • 58
  • 116
1
vote
0 answers

NUMBA: Ahead-of-Time issues

I'm trying to use numba to speed up a slow calculation. It works great with the @njit decorator, but I really need it to work as a precompiled ahead-of-time (AOT) module. Sadly, I haven't been able to get it to work. Here is the code I use to…
brook
  • 247
  • 2
  • 15