
I have three versions of a function that does an element-wise comparison of two lists and outputs a count of the results. The first uses a for loop (simple function), the second uses a list comprehension, and the third uses NumPy. I expected NumPy to be much faster, especially when the list sizes are large, but I find that it is not consistently faster.

Running this on Google Colab with different array sizes gives the following results (each row is one run; runtime ratios are normalised so the simple loop is 1):

  SN   ArraySize   Simple : Optimised : Numpy
   1          25   1 : 0.43 : 2.61
   2          25   1 : 0.22 : 0.46
   3          25   1 : 0.63 : 0.29
   4          25   1 : 0.75 : 1.18
   5        2500   1 : 0.89 : 3.07
   6        2500   1 : 0.84 : 1.51
   7        2500   1 : 0.59 : 0.79
   8        2500   1 : 0.75 : 2.19
   9      250000   1 : 1.26 : 2.64
  10      250000   1 : 1.23 : 2.18
  11      250000   1 : 1.25 : 2.22
  12      250000   1 : 0.90 : 1.56
  13    25000000   1 : 1.40 : 2.25
  14    25000000   1 : 1.32 : 2.22
  15    25000000   1 : 1.29 : 2.17
  16    25000000   1 : 1.28 : 2.19

Any ideas on what's happening or what I am doing wrong?

The code:

import numpy as np
import time
import random

def solution_list_simple(a, b):
  answer = 0
  for aval, bval in zip(a, b):
    if aval > bval:
      answer += 1
  return answer
###########
def solution_list_opti(a, b):
  return [ a_ele > b_ele for a_ele, b_ele in zip(a, b) ].count(True)
###########
def solution_np(a, b):
  # Converts both lists to NumPy arrays, then does a vectorised comparison and sums the True values.
  return np.sum( np.array(a) > np.array(b) )
###########
random.seed(30)
howmanyvalue = 250000
A = [ random.randint(1, 100) for _ in range(howmanyvalue) ]
B = [ random.randint(1, 100) for _ in range(howmanyvalue) ]

## list version - simple
start_time = time.time()
print(f"")
print(f"\nanswer list simple = {solution_list_simple(A, B)}")
runtime_list_simple = time.time() - start_time
print(f"Runtime list simple = {runtime_list_simple}")

## list version - optimised
start_time = time.time()
print(f"")
print(f"\nanswer list_opti = {solution_list_opti(A, B)}")
runtime_list_opti = time.time() - start_time
print(f"Runtime list optimised = {runtime_list_opti}")

## numpy version
start_time = time.time()
print(f"")
print(f"\nanswer numpy = {solution_np(A, B)}")
end_np = time.time()
runtime_numpy = time.time() - start_time
print(f"Runtime numpy = {runtime_numpy}")

print(f"\n\nRelative ratios\nlist_simple : list_optimised : numpy = {1} : {(runtime_list_opti / runtime_list_simple):.2f} : {(runtime_numpy / runtime_list_simple):.2f}")
  • 3
    Converting a list to array takes time. – hpaulj Sep 05 '20 at 22:01
  • @hpaulj: True, but I thought the idea of a numpy array was that it is contiguous memory, as opposed to a list, so the vectorised (SIMD) operations would be much faster, especially with a large size like 25000000. Even at this huge size it is more than twice as slow. – rbewoor Sep 05 '20 at 23:06
  • 2
  • Did you time the `np.array(a)` by itself? And `np.sum( A > B )` using arrays? `numpy` compiled operations on whole arrays are indeed fast, but only if you start with arrays. – hpaulj Sep 05 '20 at 23:45
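
For reference, here is a small sketch (assuming the `A` and `B` lists from the question's code are already defined) that times the list-to-array conversion and the array comparison separately, along the lines of the comment above:

import time
import numpy as np

# Assumes A and B are the integer lists built in the question's code.
t0 = time.time()
A_arr = np.array(A)                  # list -> ndarray conversion, a Python-level O(n) pass
B_arr = np.array(B)
t1 = time.time()
count = int(np.sum(A_arr > B_arr))   # compiled, vectorised comparison and sum
t2 = time.time()

print(f"conversion time = {t1 - t0:.6f} s")
print(f"comparison time = {t2 - t1:.6f} s")
print(f"count = {count}")

On large inputs the conversion time is expected to dwarf the comparison time, which is the point the comment is making.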

1 Answer


Simple: don't include the creation of the NumPy arrays (the `np.array(a)` and `np.array(b)` conversions) in the timed section. Convert the lists to arrays before starting the timer and time only the comparison.
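
For example, a minimal sketch of the corrected timing (reusing the question's list setup; the `A_arr`/`B_arr` names are just illustrative):

import time
import random
import numpy as np

random.seed(30)
howmanyvalue = 250000
A = [ random.randint(1, 100) for _ in range(howmanyvalue) ]
B = [ random.randint(1, 100) for _ in range(howmanyvalue) ]

# Build the arrays once, outside the timed region.
A_arr = np.array(A)
B_arr = np.array(B)

start_time = time.time()
answer_np = int(np.sum(A_arr > B_arr))   # only the vectorised comparison and sum are timed
runtime_numpy = time.time() - start_time
print(f"answer numpy = {answer_np}")
print(f"Runtime numpy (arrays pre-built) = {runtime_numpy}")

Measured this way, the NumPy version should come out well ahead of both list versions at the larger sizes, because only the compiled work is inside the timer.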