6

I did an experiment in which I tried to find the time it takes to search a python list. I have a list arr with random integers. arr_s has the same elements only sorted.

arr = np.random.randint(low = 0, high = 1000, size = 500)
arr_s = sorted(arr)

Now I create a random array of integers find which has elements that I want to search in arr and arr_s.

>>> %%timeit
...:find = np.random.randint(0, 1000, 600)
...:for i in find:
...:    if i in arr:
...:        continue

[OUT]:100 loops, best of 3: 2.18 ms per loop


>>> %%timeit
...:find = np.random.randint(0, 1000, 600)
...:for i in find:
...:    if i in arr_s:
...:        continue

[OUT]:100 loops, best of 3: 5.15 ms per loop

Now I understand that I have not used any specific method to search in the sorted array (e.g. binary search). So it might be doing the standard linear search but why does it take significantly longer to search in the sorted array that in the unsorted array? I would think that it should take nearly the same time. I have tried all sorts of find array. Arrays which have integers from (0, 1000), (-1000, -100) and (-10000, 10000) the loops always take longer for the sorted array.

Shishir Pandey
  • 832
  • 1
  • 12
  • 23
  • 1
    you can perhaps find some partial answer in http://stackoverflow.com/questions/12905513/python-in-keyword-efficiency – Fredrik Pihl Sep 05 '13 at 18:51

3 Answers3

7
arr = np.random.randint(low = 0, high = 1000, size = 500)
arr_s = sorted(arr)

arr is an array. arr_s is a list. Searching an array can be handled efficiently by numpy, but searching a list requires following pointers and performing type checks. It has nothing to do with sorting.

Note: in does weird things in numpy. Using in with numpy ndarrays may be a bad idea.

Community
  • 1
  • 1
user2357112
  • 260,549
  • 28
  • 431
  • 505
  • I converted the array to list. Now they both take the same time. – Shishir Pandey Sep 05 '13 at 18:57
  • This answer is correct. Python lists are unfortunately...pretty inefficient. :\ – Shashank Sep 05 '13 at 18:57
  • 2
    Iterating over a numpy array is slow as heck because numpy has to create wrapper objects for array elements when you access them. This is one of many reasons why you should always use vectorized operations instead of loops when working with ndarrays. – user2357112 Sep 05 '13 at 20:16
0

Python lists are not like C arrays. They are not just a simple block of memory where element 1 always comes after element 0, and so on. Instead, under the hood Python is storing things in a flexible way so that you can add and remove elements of arbitrary types and move things around at will.

In this case, my guess is that the act of sorting the list changes the underlying organization, making it somewhat less efficient to access the elements.

0

I do not have an exact answer but a possible starting point is to check at the iterators used by each object.



    In [9]: it = arr.__iter__()
    In [10]: its = arr_s.__iter__()
    In [11]: type(it)
    Out[11]: iterator
    In [12]: type(its)
    Out[12]: listiterator

They apparently use two different iterators which could explain the difference in speed.