9

I have a list of lists, which looks like

listOfLists = [
    ['a','b','c','d'],
    ['a','b'],
    ['a','c'],
    ['c','c','c','c']  
 ] 

I want to count the number of lists that contain a particular element. For example, my output should be

{'a':3,'b':2,'c':3,'d':1}

As you can see, I don't need the total count of an element. In the case of "c", though its total count is 6, the output is 3 because it occurs in only 3 lists.

I am using a Counter to get the counts, as shown below.

from collections import Counter

lines_count_tags = []
for lst in listOfLists:
    # Collect the distinct elements of this list
    s = set()
    for element in lst:
        s.add(element)
    lines_count_tags.append(list(s))

# Flatten the de-duplicated lists and count each element
count = Counter([tag for tags in lines_count_tags for tag in tags])

So, when I print count, I get

{'a':3,'c':3,'b':2,'d':1}

I want to know if there's a much better way to accomplish my goal.

Mat
N_B

7 Answers

12

Use a Counter and convert each list to a set. The set will remove any duplicates from each list so that you don't count duplicate values in the same list:

>>> from collections import Counter

>>> Counter(item for lst in listOfLists for item in set(lst))
Counter({'a': 3, 'b': 2, 'c': 3, 'd': 1})

If you prefer a functional style, you can also map set over listOfLists, chain the results, and feed them to the Counter:

>>> from collections import Counter
>>> from itertools import chain

>>> Counter(chain.from_iterable(map(set, listOfLists)))
Counter({'a': 3, 'b': 2, 'c': 3, 'd': 1})

This is equivalent to the first approach, except that it may be slightly faster.
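To illustrate why the set conversion matters, here is a minimal sketch that contrasts counting every occurrence with counting once per list. The names total_counts and list_counts are hypothetical, chosen only for this illustration; the data is copied from the question.

```python
from collections import Counter

# Sample data copied from the question.
listOfLists = [
    ['a', 'b', 'c', 'd'],
    ['a', 'b'],
    ['a', 'c'],
    ['c', 'c', 'c', 'c'],
]

# Without set(): every occurrence is counted, including repeats within a list.
# (total_counts is a hypothetical name used only in this sketch.)
total_counts = Counter(item for lst in listOfLists for item in lst)

# With set(): each list contributes at most once per element.
# (list_counts is likewise a hypothetical name.)
list_counts = Counter(item for lst in listOfLists for item in set(lst))

print(total_counts['c'])  # 6 (1 + 1 + 4 occurrences)
print(list_counts['c'])   # 3 (present in three of the four lists)
```

Only the second Counter matches the output requested in the question.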

MSeifert
9

I would convert each list to a set before counting, inside a generator expression passed to Counter:

import collections
print(collections.Counter(y for x in listOfLists for y in set(x)))

result:

Counter({'a': 3, 'c': 3, 'b': 2, 'd': 1})

(That's practically what you did, but the code above avoids the explicit loops and the temporary lists.)

Jean-François Fabre
7

You can do it without a Counter, too:

result = {}
for lis in listOfLists:
    for element in set(lis):
        result[element] = result.get(element, 0) + 1
print(result)  # {'a': 3, 'c': 3, 'b': 2, 'd': 1}

It's not the most elegant, but it should be considerably faster, at least on small inputs.

zwer
5

A stylistic variation on the Counter approach, using itertools.chain.from_iterable, looks like this:

Counter(chain.from_iterable(map(set, listOfLists)))

Demo

>>> from itertools import chain
>>> from collections import Counter
>>> Counter(chain.from_iterable(map(set, listOfLists)))
Counter({'a': 3, 'b': 2, 'c': 3, 'd': 1})

Rough benchmark

%timeit Counter(item for lst in listOfLists for item in set(lst))
100000 loops, best of 3: 13.5 µs per loop

%timeit Counter(chain.from_iterable(map(set, listOfLists)))
100000 loops, best of 3: 12.4 µs per loop
miradulo
  • I get much faster execution using `itertools.chain` (~40%!) on CPython 2.7.11. Still, `Counter` + `itertools.chain` execute 4 times slower than the `raw` method I presented. – zwer Feb 17 '17 at 21:30
  • @zwer Eh, depends what input size we are discussing. My solution has more overhead, but if you increase the input size it shall be faster. That's why the benchmarking isn't all too important :) – miradulo Feb 17 '17 at 21:36
  • True that, I was just surprised at the stark difference in speed at my place, I'm not used to `itertools` actually outperforming, well, pretty much anything - they are usually the slower, but easier to read choice :D – zwer Feb 17 '17 at 21:40
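Since the comments above disagree on how the timings change with input size, here is a rough sketch for measuring both approaches on a larger input yourself. The function names count_with_counter and count_with_dict, the random input and its dimensions (2000 lists of 20 letters) are all assumptions made for this illustration, not something from the thread. No timings are shown because they vary by machine and Python version.

```python
import random
import string
import timeit
from collections import Counter
from itertools import chain


def count_with_counter(lists):
    """Counter + chain approach (hypothetical wrapper name)."""
    return Counter(chain.from_iterable(map(set, lists)))


def count_with_dict(lists):
    """Plain-dict approach (hypothetical wrapper name)."""
    result = {}
    for lst in lists:
        for element in set(lst):
            result[element] = result.get(element, 0) + 1
    return result


# Assumed test input: 2000 lists of 20 random lowercase letters each.
random.seed(0)
big_input = [
    [random.choice(string.ascii_lowercase) for _ in range(20)]
    for _ in range(2000)
]

# Sanity check: both approaches must agree before timing them.
assert dict(count_with_counter(big_input)) == count_with_dict(big_input)

for func in (count_with_counter, count_with_dict):
    seconds = timeit.timeit(lambda: func(big_input), number=20)
    print(f"{func.__name__}: {seconds:.4f} s for 20 runs")
```

Changing the number and length of the lists in big_input shows how the relative speed depends on the input.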
3

Just convert to set, flatten using itertools.chain.from_iterable and then feed into a Counter.

from collections import Counter
from itertools import chain

inp = [
    ['a','b','c','d'],
    ['a','b'],
    ['a','c'],
    ['c','c','c','c']  
 ] 


print(Counter(chain.from_iterable(map(set, inp))))
Paul Rooney
2

This approach collects the unique entries in listOfLists using a set comprehension, and then counts how many lists contain each entry using a dictionary comprehension:

A = {val for s in listOfLists for val in s}
d = {i: sum( i in j for j in listOfLists) for i in A}
print(d) # {'a': 3, 'c': 3, 'b': 2, 'd': 1}

I'll admit it's a little ugly, but it's a possible solution (and a cool use of a dictionary comprehension). You could also make this a one-liner by moving the calculation of A right into the dictionary comprehension.
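A sketch of that one-liner variant, assuming the same listOfLists as in the question (the expression is split across lines here only for readability):

```python
# Sample data copied from the question.
listOfLists = [
    ['a', 'b', 'c', 'd'],
    ['a', 'b'],
    ['a', 'c'],
    ['c', 'c', 'c', 'c'],
]

# The inner set comprehension (previously named A) yields the unique elements;
# for each one, sum() counts the lists in which it appears (True counts as 1).
d = {
    i: sum(i in j for j in listOfLists)
    for i in {val for s in listOfLists for val in s}
}
print(d)
```

Note that this scans every list once per unique element, so it does more work than the set-based Counter approaches when there are many distinct elements.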

nbryans
  • there is no need to cast your set `A` to a list again or feed the set with a list comprehension, a generation expression is better... actually you can build `A` as a set comprehension too – Copperfield Feb 17 '17 at 21:17
  • @Copperfield Thanks for your suggestion. I've made a change. – nbryans Feb 17 '17 at 21:20
2

Here is another version using loops:

listOfLists = [
    ['a','b','c','d'],
    ['a','b'],
    ['a','c'],
    ['c','c','c','c']
    ]

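final = {}
for lst in listOfLists:
    # set(lst) drops duplicates, so each list adds at most 1 per letter
    for letter in set(lst):
        if letter in final:
            final[letter] += 1
        else:
            final[letter] = 1

So create an empty dictionary called final. Then loop through each distinct letter of each list; set(lst) removes duplicates within a list, so 'c' is counted once for the last list rather than four times. If the letter does not yet exist in final as a key, create it with a value of 1. Otherwise add 1 to the value for that key.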

kjmerf