0

So I am trying to keep a list in order at all times. So whenever new data comes in, I will insert this to the 'sorted list'.

Question, why is bisect.insort much faster than my linked list implementation. I know bisect search takes O(logn), but due to the insertion to the list, it really takes O(n). Where as the linked list implementation should also be O(n). Inserting new values in the sorted linked list should also be O(n). But why is the time comparison much much slower? Is my linked list implementation not optimized?

Here's my sample code:

import timeit
import bisect
import random


# Testing parameters
NUM_ITERATION_TEST = 10
TOTAL_NUM_DATA = 10000
DATA = [random.randint(0, 1000) for x in range(TOTAL_NUM_DATA)]


class Node():
    def __init__(self, val):
        self.val = val
        self.next = None


class LinkedListIterator():
    def __init__(self, head):
        self.current = head

    def __iter__(self):
        return self

    def __next__(self):
        if not self.current:
            raise StopIteration
        else:
            val = self.current.val
            self.current = self.current.next
            return val


class LinkedList():
    def __init__(self):
        self.head = None

    def __iter__(self):
        return LinkedListIterator(self.head)

    def insert(self, val):
        new_node = Node(val)

        if self.head is None:
            self.head = new_node
            return

        curr = self.head
        if curr.val > val:
            new_node.next = curr
            self.head = new_node
            return

        while curr.next:
            if curr.next.val > val:
                break
            curr = curr.next

        new_node.next = curr.next
        curr.next = new_node


def method1(DATA):
    sorted_list = []
    for num in DATA:
        bisect.insort_right(sorted_list, num)


def method2(DATA):
    sorted_list = LinkedList()
    for num in DATA:
        sorted_list.insert(num)


if __name__ == "__main__":
    # METHOD 1
    print("Method 1 Execution Time:")
    print(timeit.timeit("test_timeit.method1(test_timeit.DATA)",
                        number=NUM_ITERATION_TEST,
                        setup="import test_timeit"))

    # METHOD 2
    print("Method 2 Execution Time:")
    print(timeit.timeit("test_timeit.method2(test_timeit.DATA)",
                        number=NUM_ITERATION_TEST,
                        setup="import test_timeit"))

The execution times are:

Method 1 Execution Time:
0.11593010000000001
Method 2 Execution Time:
33.0651346

I also tried using other implementations like sorted dicts, but nothing really beats the bisect implementation. Is there a more efficient implementation? Basically want an always sorted list of data, where I would constantly add/insert new data to the list..

user1179317
  • 2,693
  • 3
  • 34
  • 62
  • 2
    It doesn't really answer your question, but fwiw, I think bisesct search (without the insert) is O(logn) and O(nlogn). Also comparing computational complexity when one process has access to optimized c-level routines and the other doesn't isn't super-useful. I think it would be fair to say that your linked-list is *not* optimized (and can't be) in the same sense that python's list and bisect functions are. – Mark Jul 27 '20 at 16:22
  • Right, the search is O(log n), I edited my question. Yea thats true, bisect is c optimized. So its possible there is no better way to do it than bisect? – user1179317 Jul 27 '20 at 16:27
  • Python built-in methods can be written in C which gives them a huge speed advantage. – Mark Ransom Jul 27 '20 at 16:27
  • 1
    Your linked list implementation is O(n) for the search and O(1) for the insert. `bisect` is O(logn) for the search and O(n) for the insert. Close enough for most cases. I think the difference must mostly be in the speed between C and Python. – Mark Ransom Jul 27 '20 at 16:33

1 Answers1

3

Your own implementation is executed by the Python interpreter, creating and linking dynamic run-time objects with a lot of superfluous validity checks, one object creation or deletion at a time. The built-in function is optimized in C, already complied. It can allocate memory in larger chunks, manufacture new objects in a single struct mapping, avoid many of the validity checks, ...

A C-based built-in will (almost) always beat anything you can program in Python.

Prune
  • 76,765
  • 14
  • 60
  • 81
  • In particular after giving it some thought I think the O(n) C-based list insert is going to be vastly faster than the O(n) Python-based linked-list search. – Mark Ransom Jul 27 '20 at 16:49
  • Ok thanks for the clarification. Guess I'll have to use bisect then – user1179317 Jul 27 '20 at 17:02
  • I would like to note that the bisect module is not implemented in C. [The source code](https://github.com/python/cpython/blob/3.11/Lib/bisect.py) is written completely in python (link found in [the documentation](https://docs.python.org/3/library/bisect.html)). However, this python code is well optimized and uses built-in lists, which means that it will almost certainly be faster than anything we could write. – The_spider Jul 09 '23 at 19:28
  • Thanks for the updates, folks. I contacted a former colleague; my response is, unfortunately, based on my employer's Python distribution, optimized for the entire HW-FW-SW stack. My answer does not apply in general. Might one of you care to write a new, accurate answer? If so, I'll simply delete mine and upvote yours. – Prune Jul 12 '23 at 22:16