4

In Python, when I convert my set to a list, what is the algorithmic complexity of such a task? Is it merely type-casting the collection, or does it need to copy the items into a different data structure? What's happening?

I'd love to learn that the complexity was constant, like so many things in Python.

macetw

3 Answers

6

You can easily see this with a simple benchmark:

import matplotlib.pyplot as plt


# Note: the %timeit line is an IPython magic, so this needs to be run in an
# IPython session or a Jupyter notebook, not as a plain script.
x = list(range(10, 20000, 20))
y = []
for n in x:
    s = set(range(n))
    # time list(s) for each set size and keep the best run
    res = %timeit -r2 -n2 -q -o list(s)
    y.append(res.best)


plt.plot(x, y)

(plot: best time of list(s) against set size)

This clearly shows a linear relationship, modulo some noise.

(EDITED as the first version was benchmarking something different).
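
If you are not in an IPython/Jupyter session, roughly the same measurement can be taken with the standard-library timeit module. This is only a sketch of the same benchmark; the repeat/number values are arbitrary:

import timeit

import matplotlib.pyplot as plt

sizes = list(range(10, 20000, 20))
times = []
for n in sizes:
    s = set(range(n))
    # best of 2 runs of 2 calls each, converted to seconds per call
    times.append(min(timeit.repeat(lambda: list(s), repeat=2, number=2)) / 2)

plt.plot(sizes, times)
plt.show()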

norok2
  • So what's happening at 1000 and 6000? I understand that the overall shape is indicating O(N), I'm just curious what implementation detail is being shown by those steps. – Andrew Jaffe Jul 01 '20 at 15:00
  • 2
    I think the jumps are because the list implementation allocates extra space for growth, but has to reallocate the list when it reaches the limit. – Barmar Jul 01 '20 at 15:00
  • This isn't a way to measure the time complexity of an operation; time complexity is theoretical, and it only applies for "sufficiently large" n, where 20,000 may well not be "sufficiently large". This graph gives some evidence about what the time complexity *might* be, but you cannot measure the time complexity of an algorithm by actually running it and measuring with a timer. – kaya3 Jul 01 '20 at 15:20
  • @kaya3 Of course. I still find this heuristic approach helpful in understanding what kind of behavior we should expect in real-world applications. Besides, the question is rather practical. There are multiple ways of implementing containers like `set()` and `list()`, which may change in future implementations, and this simple heuristic gives insight without needing to know the internals. – norok2 Jul 01 '20 at 20:53
  • 1
    @AndrewJaffe It's the set's internal size, which quadruples at those points. – Kelly Bundy Jul 27 '20 at 20:44
  • @AndrewJaffe See plot of [time vs set size](https://colab.research.google.com/drive/19ysF2sSB4ahbyFmcneFoLjvhDmRtxWCA?usp=sharing). – Kelly Bundy Jul 27 '20 at 20:57
2

The time complexity in most cases will be O(n) where n is the size of the set, because:

  • The set is implemented as a hashtable whose underlying array size is bounded by a fixed multiple of the set's size. Iterating over the set is done by iterating over the underlying array, so it takes O(n) time.
  • Appending an item to a list takes O(1) amortized time, even if the list's underlying array is not originally allocated to be large enough for the whole set; so appending n items to an empty list takes O(n) time (a loop equivalent is sketched below).
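
Putting those two points together, converting a set to a list is conceptually just "iterate and append". The following is only a Python-level sketch of that behaviour, not CPython's actual C implementation (which builds the list directly from the set's iterator), but the complexity argument is the same:

def set_to_list(s):
    out = []
    for item in s:        # visits each of the n elements once: O(n)
        out.append(item)  # O(1) amortized per append
    return out            # overall: O(n)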

However, there is a caveat to this, which is that Python's sets have underlying array sizes based on the largest size the set object has had, not necessarily based on its current size; this is because the underlying array is not re-allocated to a smaller size when elements are removed from the set. If a set is small but used to be much larger, then iterating over it can be slower than O(n).
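
A rough way to see this caveat in practice (a sketch; the exact numbers will vary by machine and Python version): both sets below end up with 10 elements, but the second one was built with a million elements first, so its underlying table is still large and list() has to scan all of it.

import timeit

small = set(range(10))

shrunk = set(range(1_000_000))
for i in range(10, 1_000_000):
    shrunk.discard(i)  # removing elements does not shrink the underlying table

print(timeit.timeit(lambda: list(small), number=10_000))   # fast
print(timeit.timeit(lambda: list(shrunk), number=10_000))  # much slower for the same 10 elements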

kaya3
1

The complexity is linear because all references are copied to the new container. But only the references are copied, not the objects themselves, which can matter for big objects.
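
A small illustration of that point (assuming CPython, where `is` compares object identity): the new list holds references to the very same objects that were in the set, so no object data is duplicated.

x = bytes(10**6)  # one large immutable object
s = {x, 1, 2}
lst = list(s)
assert any(item is x for item in lst)  # the list refers to the same object, not a copy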

Serge Ballesta