1

I have used during a problem this piece of code to retrieve the count of each element in a list :

nums = [1,1,3,2,3,3,4,1,1,4,2,3,3,2,1,3,4,1,2]
print([nums.count(num) for num in set(nums)])

This code works well but doesn't look like to be as efficient as this code :

from collections import Counter
nums = [1,1,3,2,3,3,4,1,1,4,2,3,3,2,1,3,4,1,2]
counter = Counter(nums)
print(counter.values())

Can someone explains to me why is the collections library is more faster than the vanilla list comprehension ?

Lukas Laudrain
  • 436
  • 3
  • 10
  • 8
    Your implementation is `O(n**2)`, `Counter` is `O(n)`. – jonrsharpe Apr 21 '22 at 14:04
  • 3
    `Counter` makes one pass through the list, updating counts as it goes. Your implementation makes a full pass through the entire list for *every* element of the list. – chepner Apr 21 '22 at 14:07
  • 1
    It's not really about the list comprehension, but the use of `nums.count`. – chepner Apr 21 '22 at 14:10

1 Answers1

4

The code with Counter is roughly equivalent to:

def counter_v1(seq):
    counts = {}
    for x in seq:
        counts[x] = counts.get(x, 0) + 1
    return counts

The code with .count is roughly equivalent to:

def count(seq, x):
    c = 0
    for y in seq:
        if y == x:
            c += 1
    return c

def counter_v2(seq):
    counts = {}
    for x in set(seq):
        counts[x] = count(seq, x)
    return counts

As you can see, function counter_v1 iterates through the sequence once, with a single for-loop. By contrast, function counter_v2 iterates through the sequence once per distinct element, with a for-loop inside another for-loop.

The runtime of counter_v1 will be roughly proportional to len(seq), whereas the runtime of counter_v2 will be roughly proportional to len(seq) * len(set(seq)), which is usually much bigger.

Stef
  • 13,242
  • 2
  • 17
  • 28