
I just wrote an example comparing Numba's parallel and non-parallel execution on a list, as below:

Parallel

from numba import njit, prange

@njit(parallel=True)
def evaluate():
  n = 1000000
  a = [0]*n
  sum = 0
  for i in prange(n):
    a[i] = i*i
  for i in prange(n):
    sum += a[i]
  return sum

No parallel

def evaluate2():
  n = 1000000
  a = [0]*n
  sum = 0
  for i in range(n):
    a[i] = i*i
  for i in range(n):
    sum += a[i]
  return sum

and compared the evaluation times:

t.tic()
print(evaluate())
t.toc()

result: 333332833333500000 Elapsed time is 0.233338 seconds.

t.tic()
print(evaluate2())
t.toc()

result: 333332833333500000 Elapsed time is 0.195136 seconds.

The full code is available on Colab.
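The `t.tic()`/`t.toc()` calls above presumably come from a timer helper such as pytictoc; a self-contained sketch of the same measurement using only the standard library's `time.perf_counter` would be:

```python
import time

def evaluate2():
    # Serial version, mirroring the question's code
    n = 1_000_000
    a = [0] * n
    total = 0
    for i in range(n):
        a[i] = i * i
    for i in range(n):
        total += a[i]
    return total

start = time.perf_counter()
result = evaluate2()
elapsed = time.perf_counter() - start
print(result)  # 333332833333500000
print(f"Elapsed time is {elapsed:.6f} seconds.")
```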

Freelancer
  • I have never used numba before, but I read that it's best used on code that uses numpy arrays. Your example uses a list that could instead be an array, `numpy.zeros(n, dtype=int)`. The numba docs on nopython mode suggest it converts lists to an efficient non-Python object and back (reflection), which could be taking up time. Additionally, is the `a` list necessary? It looks like you could compute the sum without it, just `sum += i*i` instead. – BatWannaBe Mar 13 '21 at 14:22
  • You are measuring compilation and runtime. Just measure the second call to get the runtime. – max9111 Mar 14 '21 at 21:45

2 Answers


The answer is that the number of operations is still too small for parallelism to pay off. When I changed n to 100,000,000, the performance changed significantly.

Freelancer
  • I think you are right. Along the same lines, the operations done here are very lightweight (only multiplication and addition of small numbers). If the operations were heavier, a larger improvement from parallelization could be expected. – Gilles Ottervanger Mar 13 '21 at 16:01

I haven't tried it in Numba yet, but this is exactly what happens in MATLAB and other languages when the CPU is used for serial processing and the GPU for parallel processing. When the data is small, the CPU outperforms the GPU and parallel computing is not useful; parallel processing is efficient only once the data size exceeds a certain threshold. There are benchmarks showing where parallel processing becomes efficient, and I have read papers in which the authors put switches in their code to choose between CPU and GPU during processing. Try the same code with a large array and compare the results.

  • But I ran it on Colab, not on my computer. Does Colab run on a GPU? – Freelancer Mar 13 '21 at 14:42
  • According to the [docs](https://numba.pydata.org/numba-doc/dev/user/parallel.html), `@jit` only does parallelization on the CPU which induces much less overhead than offloading to the GPU. – Gilles Ottervanger Mar 13 '21 at 15:57