
I just wrote an example comparing Numba's parallel and non-parallel execution on a list, as below:

Parallel

from numba import njit, prange

@njit(parallel=True)
def evaluate():
  n = 1000000
  a = [0]*n
  sum = 0
  for i in prange(n):
    a[i] = i*i
  for i in prange(n):
    sum += a[i]
  return sum

No parallel

def evaluate2():
  n = 1000000
  a = [0]*n
  sum = 0
  for i in range(n):
    a[i] = i*i
  for i in range(n):
    sum += a[i]
  return sum

and compared the evaluation times:

t.tic()
print(evaluate())
t.toc()

result: 333332833333500000 Elapsed time is 0.233338 seconds.

t.tic()
print(evaluate2())
t.toc()

result: 333332833333500000 Elapsed time is 0.195136 seconds.

The full code is available on Colab.
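The `t.tic()`/`t.toc()` calls above presumably come from a timer helper such as pytictoc; a self-contained sketch of the same measurement using only the standard library's `time.perf_counter` would be:

```python
import time

def evaluate2():
    # Serial version, mirroring the question's code
    n = 1_000_000
    a = [0] * n
    total = 0
    for i in range(n):
        a[i] = i * i
    for i in range(n):
        total += a[i]
    return total

start = time.perf_counter()
result = evaluate2()
elapsed = time.perf_counter() - start
print(result)  # 333332833333500000
print(f"Elapsed time is {elapsed:.6f} seconds.")
```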

Freelancer
  • I have never used numba before, but I read that it's best used on code that uses numpy arrays. Your example uses a list that could instead be an array, `numpy.zeros(n, dtype=int)`. The numba docs on nopython mode suggest it converts lists to an efficient non-Python object and back (reflection), which could be taking up time. Additionally, is the `a` list necessary? It looks like you could compute the sum without it, just `sum += i*i` instead. – BatWannaBe Mar 13 '21 at 14:22
  • You are measuring compilation and runtime. Just measure the second call to get the runtime. – max9111 Mar 14 '21 at 21:45

2 Answers


The answer is that the number of operations is still too small for parallelism to pay off. When I changed n to 100,000,000, the performance changed significantly.

Freelancer
  • I think you are right. Along the same lines, the operations done here are very lightweight (only multiplication and addition of small numbers). If the operations were heavier, a larger improvement from parallelization could be expected. – Gilles Ottervanger Mar 13 '21 at 16:01

I haven't tried it in Numba yet, but this is exactly what happens in MATLAB and other languages when the CPU is used for serial processing and the GPU for parallel processing. When the data is small, the CPU outperforms the GPU and parallel computing is not useful; parallel processing is efficient only once the data size exceeds a certain threshold. There are benchmarks showing where parallel processing becomes efficient, and I have read papers in which the authors put switches in their code to choose between CPU and GPU during processing. Try the same code with a large array and compare the results.

  • But I ran it on Colab, not on my computer. Does Colab run on a GPU? – Freelancer Mar 13 '21 at 14:42
  • According to the [docs](https://numba.pydata.org/numba-doc/dev/user/parallel.html), `@jit` only does parallelization on the CPU which induces much less overhead than offloading to the GPU. – Gilles Ottervanger Mar 13 '21 at 15:57