4

In Python, when I convert my set to a list, what is the algorithmic complexity of such a task? Is it merely type-casting the collection, or does it need to copy the items into a different data structure? What's happening?

I'd love to learn that the complexity was constant, like so many things in Python.

macetw

3 Answers

6

You can easily see this with a simple benchmark:

import matplotlib.pyplot as plt


# Note: the %timeit line is an IPython magic, so this needs to be run in an
# IPython session or a Jupyter notebook, not as a plain script.
x = list(range(10, 20000, 20))
y = []
for n in x:
    s = set(range(n))
    # time list(s) for each set size and keep the best run
    res = %timeit -r2 -n2 -q -o list(s)
    y.append(res.best)


plt.plot(x, y)

(plot: best time of list(s) against set size)

This clearly shows a linear relationship, modulo some noise.

(EDITED as the first version was benchmarking something different).
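
If you are not in an IPython/Jupyter session, roughly the same measurement can be taken with the standard-library timeit module. This is only a sketch of the same benchmark; the repeat/number values are arbitrary:

import timeit

import matplotlib.pyplot as plt

sizes = list(range(10, 20000, 20))
times = []
for n in sizes:
    s = set(range(n))
    # best of 2 runs of 2 calls each, converted to seconds per call
    times.append(min(timeit.repeat(lambda: list(s), repeat=2, number=2)) / 2)

plt.plot(sizes, times)
plt.show()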

norok2
  • So what's happening at 1000 and 6000? I understand that the overall shape is indicating O(N), I'm just curious what implementation detail is being shown by those steps. – Andrew Jaffe Jul 01 '20 at 15:00
  • 2
    I think the jumps are because the list implementation allocates extra space for growth, but has to reallocate the list when it reaches the limit. – Barmar Jul 01 '20 at 15:00
  • This isn't a way to measure the time complexity of an operation; time complexity is theoretical, and it only applies for "sufficiently large" n, where 20,000 may well not be "sufficiently large". This graph gives some evidence about what the time complexity *might* be, but you cannot measure the time complexity of an algorithm by actually running it and measuring with a timer. – kaya3 Jul 01 '20 at 15:20
  • @kaya3 Of course. I still find this heuristic approach helpful in understanding what kind of behavior we should expect in real-world applications. Besides, the question is rather practical. There are multiple ways of implementing containers like `set()` and `list()`, which may change in future implementations, and this simple heuristic gives insight without needing to know the internals. – norok2 Jul 01 '20 at 20:53
  • 1
    @AndrewJaffe It's the set's internal size, which quadruples at those points. – Kelly Bundy Jul 27 '20 at 20:44
  • @AndrewJaffe See plot of [time vs set size](https://colab.research.google.com/drive/19ysF2sSB4ahbyFmcneFoLjvhDmRtxWCA?usp=sharing). – Kelly Bundy Jul 27 '20 at 20:57
2

The time complexity in most cases will be O(n) where n is the size of the set, because:

  • The set is implemented as a hashtable whose underlying array size is bounded by a fixed multiple of the set's size. Iterating over the set is done by iterating over the underlying array, so it takes O(n) time.
  • Appending an item to a list takes O(1) amortized time, even if the list's underlying array is not originally allocated to be large enough for the whole set; so appending n items to an empty list takes O(n) time (a loop equivalent is sketched below).
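
Putting those two points together, converting a set to a list is conceptually just "iterate and append". The following is only a Python-level sketch of that behaviour, not CPython's actual C implementation (which builds the list directly from the set's iterator), but the complexity argument is the same:

def set_to_list(s):
    out = []
    for item in s:        # visits each of the n elements once: O(n)
        out.append(item)  # O(1) amortized per append
    return out            # overall: O(n)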

However, there is a caveat to this, which is that Python's sets have underlying array sizes based on the largest size the set object has had, not necessarily based on its current size; this is because the underlying array is not re-allocated to a smaller size when elements are removed from the set. If a set is small but used to be much larger, then iterating over it can be slower than O(n).
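
A rough way to see this caveat in practice (a sketch; the exact numbers will vary by machine and Python version): both sets below end up with 10 elements, but the second one was built with a million elements first, so its underlying table is still large and list() has to scan all of it.

import timeit

small = set(range(10))

shrunk = set(range(1_000_000))
for i in range(10, 1_000_000):
    shrunk.discard(i)  # removing elements does not shrink the underlying table

print(timeit.timeit(lambda: list(small), number=10_000))   # fast
print(timeit.timeit(lambda: list(shrunk), number=10_000))  # much slower for the same 10 elements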

kaya3
1

The complexity is linear because all references are copied to the new container. But only the references are copied, not the objects themselves, which can matter for big objects.
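
A small illustration of that point (assuming CPython, where `is` compares object identity): the new list holds references to the very same objects that were in the set, so no object data is duplicated.

x = bytes(10**6)  # one large immutable object
s = {x, 1, 2}
lst = list(s)
assert any(item is x for item in lst)  # the list refers to the same object, not a copy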

Serge Ballesta