Questions tagged [numba-pro]

NumbaPro - enhanced version of Numba, adding GPU support

NumbaPro is an enhanced version of Numba which adds premium features and functionality that allow developers to rapidly create optimized code that integrates well with NumPy.

With NumbaPro, Python developers can define NumPy ufuncs and generalized ufuncs (gufuncs) in Python, which are compiled to machine code dynamically and loaded on the fly. Additionally, NumbaPro offers developers the ability to target multicore and GPU architectures with Python code for both ufuncs and general-purpose code.

For targeting the GPU, NumbaPro can either do the work automatically, doing its best to optimize the code for the GPU architecture, or it can expose a CUDA-based API for writing CUDA code directly in Python, giving ultimate control of the hardware (with thread and block identities).
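As a rough illustration of that second, explicit style, here is a sketch written against the open-source numba.cuda module rather than NumbaPro itself, with hypothetical names: a kernel indexes its work by thread and block identity.

    import numpy as np
    from numba import cuda

    @cuda.jit
    def scale(x, out, factor):
        # each thread computes its own global index from block/thread identity
        i = cuda.blockIdx.x * cuda.blockDim.x + cuda.threadIdx.x
        if i < x.size:
            out[i] = factor * x[i]

    x = np.arange(256, dtype=np.float32)
    out = np.empty_like(x)
    scale[4, 64](x, out, 2.0)   # launch configuration: 4 blocks of 64 threads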

Here’s a list of highlighted features:

- Portable data-parallel programming through ufuncs and gufuncs for single-core CPU, multicore CPU, and GPU
- Bindings to CUDA libraries: cuRAND, cuBLAS, cuFFT
- Python CUDA programming for maximum control of hardware resources
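For example, the ufunc workflow above boils down to decorating a scalar function. A minimal sketch using the open-source numba package (NumbaPro exposed the same decorator as `from numbapro import vectorize`):

    import numpy as np
    from numba import vectorize

    # A scalar kernel compiled into a NumPy ufunc; target="cuda" (on a machine
    # with a CUDA GPU) or target="parallel" switch the execution target.
    @vectorize(["float32(float32, float32)"], target="cpu")
    def axpb(a, b):
        return 2.0 * a + b

    x = np.arange(8, dtype=np.float32)
    y = np.ones(8, dtype=np.float32)
    print(axpb(x, y))   # broadcasts over whole arrays like any NumPy ufunc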

79 questions
1 vote • 1 answer

cuda code error within numbapro

import numpy import numpy as np from numbapro import cuda @cuda.autojit def foo(aryA, aryB,out): d_ary1 = cuda.to_device(aryA) d_ary2 = cuda.to_device(aryB) #dd = numpy.empty(10, dtype=np.int32) d_ary1.copy_to_host(out) griddim =…
John • 69 • 1 • 7
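For context, the usual pattern behind questions like this one, as a sketch with hypothetical names rather than the asker's code: cuda.to_device and copy_to_host are host-side calls, so they stay outside the kernel, which only indexes the device arrays it is given.

    import numpy as np
    from numba import cuda

    @cuda.jit
    def copy_kernel(src, dst):
        i = cuda.grid(1)            # absolute thread index
        if i < src.size:
            dst[i] = src[i]

    a = np.arange(10, dtype=np.int32)
    d_a = cuda.to_device(a)                 # host -> device, done on the host
    d_out = cuda.device_array_like(d_a)     # uninitialised device buffer
    copy_kernel[1, 32](d_a, d_out)          # griddim=1, blockdim=32
    out = d_out.copy_to_host()              # device -> host, also on the host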
0 votes • 1 answer

Matplotlib with Numba to try and accelerate code

How could I use Numba to accelerate the following code? When I add @njit before func1, several errors are returned. Or would there be another way to optimise/accelerate the code to reduce the overall process time, as depending on the number of…
jdr • 3 • 2
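A sketch of the usual split for this kind of question (func1's real body is unknown, so the numeric work below is hypothetical): only the numerical loop is jitted, and the matplotlib calls stay outside, since nopython mode cannot compile them.

    import numpy as np
    import matplotlib.pyplot as plt
    from numba import njit

    @njit
    def func1(n):
        out = np.empty(n)
        for i in range(n):              # pure numeric work compiles fine
            out[i] = np.sin(0.01 * i) ** 2
        return out

    y = func1(10_000)                   # compiled on first call
    plt.plot(y)                         # plotting stays in normal Python
    plt.show()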
0 votes • 1 answer

How to change array element in numba parallel computing

I hope to modify a NumPy array element during a parallel computation, using njit as follows. def get_indice(): .... return new_indice @njit() def myfunction(arr): for i in prange(100): indices = get_indice() arr[indices] +=…
Xudong • 441 • 5 • 16
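A sketch of one arrangement that does compile (the helper below is a hypothetical stand-in for the question's get_indice): the index-producing function must itself be jitted, and the scattered `+=` is only safe when iterations touch distinct elements.

    import numpy as np
    from numba import njit, prange

    @njit
    def get_indice(i):
        return i            # placeholder rule: each iteration gets its own index

    @njit(parallel=True)
    def myfunction(arr):
        for i in prange(arr.shape[0]):
            arr[get_indice(i)] += 1   # would be a data race if indices repeated
        return arr

    print(myfunction(np.zeros(100)))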
0 votes • 1 answer

Python Numba Recursive Method Implementation in Numba Class

I am currently implementing a Python class wrapped by the Numba @jitclass decorator. My problem is about writing recursive methods. I know there are ways of writing recursive methods as iterative methods as well, but in my case, I believe that…
Okan Erturk • 113 • 4
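A sketch of the common workaround (illustrated with a trivial factorial, not the asker's class): rewrite the recursive method iteratively inside the @jitclass.

    from numba import int64
    from numba.experimental import jitclass

    spec = [("n", int64)]

    @jitclass(spec)
    class Fact:
        def __init__(self, n):
            self.n = n

        def value(self):
            # iterative replacement for the recursive n! = n * (n - 1)!
            acc = 1
            for k in range(2, self.n + 1):
                acc *= k
            return acc

    print(Fact(5).value())   # 120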
0 votes • 1 answer

How to append a list of type String, Int and Float using Numba

I am using Numba to improve the speed of the loop below. Without Numba it takes 135 sec to execute and with Numba it takes 0.30 sec :) which is very fast. In the loop below I am comparing the array with a threshold of 0.85. If the condition turns out…
vivek • 61 • 1 • 1 • 8
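For background, numba's typed lists are homogeneous, so one common workaround (a sketch with hypothetical names, with the threshold taken from the excerpt) is to keep parallel typed lists, one per element type, instead of a single mixed list:

    import numpy as np
    from numba import njit
    from numba.typed import List

    @njit
    def collect(values, threshold):
        idx = List()    # element types are inferred from the first append,
        val = List()    # so each list must receive at least one element
        for i in range(values.shape[0]):
            if values[i] > threshold:
                idx.append(i)
                val.append(values[i])
        return idx, val

    print(collect(np.linspace(0.0, 1.0, 20), 0.85))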
0 votes • 0 answers

Does Numba support multiple matrix multiplication in terms of `multi_dot`?

I need to multiply three 2-D matrices of size 25x25, 25x60, and 60x60 to get a result of size 25x60 using numpy. For fast multiplication, I wanted to use from numpy.linalg import multi_dot and also tried to execute it in parallel on the GPU using…
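For reference, numpy.linalg.multi_dot is not among the NumPy functions numba supports in nopython mode, but chained 2-D matrix products are (via the @ operator / np.dot, which needs SciPy's BLAS bindings installed), so the three-matrix product from the question can be written directly. A sketch with the sizes given:

    import numpy as np
    from numba import njit

    @njit
    def chain_product(A, B, C):
        # (A @ B) @ C; for these shapes both association orders cost the same
        return (A @ B) @ C

    A = np.random.rand(25, 25)
    B = np.random.rand(25, 60)
    C = np.random.rand(60, 60)
    print(chain_product(A, B, C).shape)   # (25, 60)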
0 votes • 2 answers

Performance of parallel computing is lower than non-parallel computing in Python

I just wrote an example that works on a list with Numba, as below, both with and without parallelism: Parallel @njit(parallel=True) def evaluate(): n = 1000000 a = [0]*n sum = 0 for i in prange(n): a[i] = i*i for i in prange(n): …
Freelancer • 837 • 6 • 21
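A sketch of how such comparisons are usually made fair (hypothetical function names): compile both variants once with a warm-up call, then time them; for bodies this cheap, the parallel version can still lose because thread scheduling overhead dominates the arithmetic.

    import time
    import numpy as np
    from numba import njit, prange

    @njit
    def evaluate_serial(n):
        a = np.empty(n, dtype=np.int64)
        s = 0
        for i in range(n):
            a[i] = i * i
            s += a[i]
        return s

    @njit(parallel=True)
    def evaluate_parallel(n):
        a = np.empty(n, dtype=np.int64)
        s = 0
        for i in prange(n):
            a[i] = i * i
            s += a[i]        # recognised by numba as a parallel reduction
        return s

    n = 1_000_000
    evaluate_serial(n); evaluate_parallel(n)        # warm-up: JIT compile both
    for name, f in (("serial", evaluate_serial), ("parallel", evaluate_parallel)):
        t0 = time.perf_counter()
        f(n)
        print(name, time.perf_counter() - t0)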
0 votes • 2 answers

How to use shared memory and global memory, and is it possible to use shared memory as an intermediate stage in a calculation?

I am trying to write code in numba cuda. I saw a lot of examples that deal with device memory and shared memory separately. I got stuck and confused. Can the code or the function deal with both? For example, can the code multiply numbers using…
hend • 77 • 1 • 2 • 7
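To the general question, yes: a single numba.cuda kernel can touch both kinds of memory, with shared memory acting as an intermediate staging area. A sketch (block-wise reversal of an array, chosen only to keep the kernel short):

    import numpy as np
    from numba import cuda, float32

    TPB = 32   # threads per block, also the shared-memory tile size

    @cuda.jit
    def reverse_in_blocks(src, dst):
        tile = cuda.shared.array(TPB, dtype=float32)   # shared (on-chip) buffer
        i = cuda.grid(1)                               # index into global memory
        t = cuda.threadIdx.x                           # index within the block
        if i < src.size:
            tile[t] = src[i]             # global -> shared
        cuda.syncthreads()               # make the whole tile visible to the block
        if i < src.size:
            dst[i] = tile[TPB - 1 - t]   # shared -> global, reversed per block

    a = np.arange(64, dtype=np.float32)  # 64 = 2 full blocks of 32
    out = np.zeros_like(a)
    reverse_in_blocks[2, TPB](a, out)
    print(out[:4])                       # [31. 30. 29. 28.]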
0 votes • 1 answer

How fast or slow is the Constant memory that Numba allows a device to allocate, when compared to local and shared memories?

I can't find any clarity as to the performance of the so-called constant memory referred to in the Numba documentation: https://numba.pydata.org/numba-doc/dev/cuda/memory.html#constant-memory I am curious as to what the size limits are for…
Edy Bourne • 5,679 • 13 • 53 • 101
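For reference, this is how constant memory is declared in numba.cuda (a sketch with hypothetical names): the host array must be known at compile time and is captured with cuda.const.array_like, giving a read-only array served from the GPU's constant cache (64 KB in total on current NVIDIA hardware).

    import numpy as np
    from numba import cuda

    WEIGHTS = np.array([0.25, 0.5, 0.25], dtype=np.float32)  # fixed at compile time

    @cuda.jit
    def smooth(src, dst):
        w = cuda.const.array_like(WEIGHTS)   # placed in constant memory
        i = cuda.grid(1)
        if i >= 1 and i < src.size - 1:
            dst[i] = w[0] * src[i - 1] + w[1] * src[i] + w[2] * src[i + 1]

    x = np.random.rand(1024).astype(np.float32)
    y = np.zeros_like(x)
    smooth[(x.size + 255) // 256, 256](x, y)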
0 votes • 1 answer

cuDF - groupby UDF to support datetime

I have a cuDF dataframe with the following columns: columns = ["col1", "col2", "dt"] The dt column is in the form of datetime64[ns]. I would like to write a UDF to apply to each group in this dataframe, and get the max of dt for each group. Here is what I am…
khan • 7,005 • 15 • 48 • 70
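A sketch, assuming the per-group maximum itself is the end goal (column names taken from the excerpt): cuDF's built-in groupby aggregations handle datetime64[ns] columns directly, so this particular reduction does not need a UDF.

    import cudf

    df = cudf.DataFrame({
        "col1": ["a", "a", "b"],
        "col2": [1, 2, 3],
    })
    df["dt"] = cudf.to_datetime(["2021-01-01", "2021-06-01", "2021-03-15"])

    # one row per group, holding the latest dt in that group
    max_dt = df.groupby("col1").agg({"dt": "max"})
    print(max_dt)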
0 votes • 1 answer

Replacing the njit decorator with the cuda.jit decorator

I have an Nvidia GPU, downloaded CUDA, and am trying to make use of it. Say I have this code: #@cuda.jit (Attempted fix #1) #@cuda.jit(device = True) (Attempted fix #2) #@cuda.jit(int32(int32,int32)) (Attempted fix #3) @njit def product(rho,…
Ipulatov • 175 • 4 • 11
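For orientation, the structural difference behind errors like these (a sketch with a hypothetical elementwise product, not the asker's function): a @cuda.jit kernel cannot return a value, so results are written into an output array, and the call needs a [blocks, threads] launch configuration.

    import numpy as np
    from numba import cuda, njit

    @njit
    def product_cpu(rho, theta):
        return rho * theta                 # njit: a value can be returned

    @cuda.jit
    def product_gpu(rho, theta, out):
        i = cuda.grid(1)
        if i < out.size:
            out[i] = rho[i] * theta[i]     # kernel: write into an output array

    rho = np.random.rand(1000)
    theta = np.random.rand(1000)
    cpu_result = product_cpu(rho, theta)
    out = np.zeros_like(rho)
    product_gpu[(out.size + 127) // 128, 128](rho, theta, out)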
0 votes • 1 answer

How to disable or remove numba and cuda from a Python project?

I've cloned a "PointPillars" repo for 3D detection using just a point cloud as input. But when I came to run it, I noticed it uses cuda and numba. Without any prior knowledge about these two, I'm asking if there is any way to remove or disable numba and…
Joseph Sh • 51 • 4
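The least invasive route, sketched below under the assumption that the project only needs to run (slowly) on a machine without a GPU: numba reads the NUMBA_DISABLE_JIT and NUMBA_ENABLE_CUDASIM environment variables, so the decorators can be neutralised without editing the repository.

    import os

    # Set these before numba is imported anywhere in the process.
    os.environ["NUMBA_DISABLE_JIT"] = "1"      # @jit/@njit functions run as plain Python
    os.environ["NUMBA_ENABLE_CUDASIM"] = "1"   # @cuda.jit kernels run on the CPU simulator

    import numba  # imported only after the flags are in place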
0 votes • 2 answers

Pass a list of lists into a numba function in nopython mode; `if element in list_of_list[0]` does not work

See the following minimal code: import numba list_of_list = [[1, 2], [34, 100]] @numba.njit() def test(list_of_list): if 1 in list_of_list[0]: return 'haha' test(list_of_list) This won't work, and it seems that list_of_list[0] is no…
Jiadong • 1,822 • 1 • 17 • 37
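A sketch of one workaround for that question: hand numba a regular 2-D NumPy array instead of a nested Python list, and express the membership test with array operations, which nopython mode supports.

    import numpy as np
    from numba import njit

    @njit
    def test(table):
        if np.any(table[0] == 1):   # "is 1 in the first row?"
            return 1
        return 0

    table = np.array([[1, 2], [34, 100]])
    print(test(table))              # 1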
0 votes • 1 answer

How to accelerate this function using Numba?

I was trying to optimize this function using Numba, but I am unable to do it. I think no part of the code can be accelerated. If anyone can help me with an optimized version of this, my program would become blazing fast. Please tell…
darkcodernavv • 29 • 1 • 10
0 votes • 2 answers

no module named numbapro

I ran this code I read on a CUDA Python intro page: import numpy as np from timeit import default_timer as timer from numbapro import vectorize @vectorize(["float32(float32, float32)"], target='gpu') def VectorAdd(a, b): return a + b def…
dtn34- • 321 • 3 • 11
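For reference, NumbaPro has been discontinued and its vectorize decorator now lives in the open-source numba package, where the GPU target is spelled 'cuda'. A sketch of the same example against plain numba (requires a CUDA-capable GPU):

    import numpy as np
    from timeit import default_timer as timer
    from numba import vectorize

    @vectorize(["float32(float32, float32)"], target="cuda")
    def VectorAdd(a, b):
        return a + b

    n = 1_000_000
    a = np.ones(n, dtype=np.float32)
    b = np.ones(n, dtype=np.float32)

    start = timer()
    c = VectorAdd(a, b)
    print("elapsed:", timer() - start, "s, c[:3] =", c[:3])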