3

I have a list of lists like this:

names =  [['cat', 'fish'], ['cat'], ['fish', 'dog', 'cat'],
 ['cat', 'bird', 'fish'], ['fish', 'bird']]

I want to count the number of times each pair of names is mentioned together across the whole list. The output would look like this:

{ ['cat', 'fish']: 3, ['cat', 'dog']: 1, ['cat', 'bird']: 1,
 ['fish', 'dog']: 1, ['fish', 'bird']: 2 }

I tried :

from collections import Counter
from collections import defaultdict

co_occurences = defaultdict(Counter)
for tags in names:
    for key in tags:
        co_occurences[key].update(tags)

print(co_occurences)

but it doesn't count the co-occurrences of pairs in the main list.

mk_sch

5 Answers

3

Here it is. I use <= to test if a set is a subset of another set (sets have no order and each element appears exactly once).

import itertools
from pprint import pprint

names = [['cat', 'fish'],
         ['cat'],
         ['fish', 'dog', 'cat'],
         ['cat', 'bird', 'fish'],
         ['fish', 'bird']]

# Flatten the list and make all names unique
unique_names = set(itertools.chain.from_iterable(names))

# Get all combinations of pairs
all_pairs = list(itertools.combinations(unique_names, 2))

# Create the dictionary
result = {pair: len([x for x in names if set(pair) <= set(x)]) for pair in all_pairs}

pprint(result)

This is the output:

{('bird', 'cat'): 1,
 ('bird', 'dog'): 0,
 ('bird', 'fish'): 2,
 ('dog', 'cat'): 1,
 ('fish', 'cat'): 3,
 ('fish', 'dog'): 1}

I would suggest moving the expression len([x for x in names if set(pair) <= set(x)]), which computes the values of the dictionary, into a dedicated function.
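A minimal sketch of that refactoring. The helper name count_co_occurrences and the variable name_sets are hypothetical and not part of the original answer. The sub-lists are converted to sets once up front, and the unique names are sorted so the pair order is reproducible:

```python
import itertools

def count_co_occurrences(pair, name_sets):
    """Count how many sets in name_sets contain both names of the pair."""
    pair_set = set(pair)
    # pair_set <= s is True when pair_set is a subset of s
    return sum(1 for s in name_sets if pair_set <= s)

names = [['cat', 'fish'],
         ['cat'],
         ['fish', 'dog', 'cat'],
         ['cat', 'bird', 'fish'],
         ['fish', 'bird']]

# Convert each sub-list to a set once instead of once per pair
name_sets = [set(x) for x in names]

# Sorting makes the order of the generated pairs deterministic
unique_names = sorted(set(itertools.chain.from_iterable(names)))

result = {pair: count_co_occurrences(pair, name_sets)
          for pair in itertools.combinations(unique_names, 2)}
```

With the sample data, result[('cat', 'fish')] is 3 and result[('bird', 'dog')] is 0.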

Elmex80s
  • Great answer! I would suggest using a dedicated function in your answer, since I think it will make that part of the code more understandable. It took me a while to figure out why you were using `<=`, but once I understood, I really liked it. – Sam Mussmann Feb 16 '17 at 11:50
  • @SamMussmann Yes true, will update my answer. I would like to add I did mention to put it in a dedicated function! :-) – Elmex80s Feb 16 '17 at 11:52
3

You can get the desired result using itertools.combinations and itertools.chain:

>>> from itertools import combinations, chain

>>> names =  [['cat', 'fish'], ['cat'], ['fish', 'dog', 'cat'],
...  ['cat', 'bird', 'fish'], ['fish', 'bird']]
>>> uniques = set(chain(*names))
>>> {x: sum(1 for n in names if all(i in n for i in x))  for x in combinations(uniques, 2)}
{('fish', 'dog'): 1, ('dog', 'cat'): 1, ('bird', 'fish'): 2, ('fish', 'cat'): 3, ('bird', 'dog'): 0, ('bird', 'cat'): 1}
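The result above includes pairs that never occur together, such as ('bird', 'dog'), while the output requested in the question lists only pairs with a non-zero count. A small sketch of one way to drop the zero entries follows; the names counts and nonzero are illustrative assumptions, and the unique names are sorted only to make the pair order reproducible:

```python
from itertools import combinations, chain

names = [['cat', 'fish'], ['cat'], ['fish', 'dog', 'cat'],
         ['cat', 'bird', 'fish'], ['fish', 'bird']]

# Sorted unique names give a deterministic order for the pairs
uniques = sorted(set(chain.from_iterable(names)))

# Same counting expression as in the answer
counts = {x: sum(1 for n in names if all(i in n for i in x))
          for x in combinations(uniques, 2)}

# Keep only the pairs that occur together at least once
nonzero = {pair: c for pair, c in counts.items() if c > 0}
```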
Moinuddin Quadri
2

You could convert each list to a set and use the & operator, which computes the intersection of two sets in Python:

>>> set(['cat','dog']) & set(['cat','dog','monkey','horse','fish'])
set(['dog', 'cat'])

If the intersection equals the pair itself, both names are present, so you can use this property to get the counts you want.

import itertools

def listOccurences(item, names):
    # item is the list that you want to check, eg. ['cat','fish']
    # names contain the list of list you have.
    set_of_items = set(item) # set(['cat','fish'])
    count = 0
    for value in names:
        if set_of_items & set(value) == set_of_items:
            count+=1
    return count

names =  [['cat', 'fish'], ['cat'], ['fish', 'dog', 'cat'],['cat', 'bird', 'fish'], ['fish', 'bird']]
# Now for each of your possibilities which you can generate
# Chain flattens the list, set removes duplicates, and combinations generates all possible pairs.
permuted_values = list(itertools.combinations(set(itertools.chain.from_iterable(names)), 2))
d = {}
for v in permuted_values:
    d[str(v)] = listOccurences(v, names)
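# A list cannot be a dict key because it is unhashable, so each pair is stored
# under its string form here (the tuple v itself would also work as a key).
print(d)
# Example output; key order and the order within each pair depend on set ordering:
# {"('fish', 'dog')": 1, "('dog', 'cat')": 1, "('bird', 'fish')": 2, "('fish', 'cat')": 3, "('bird', 'dog')": 0, "('bird', 'cat')": 1}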
Sam Mussmann
Sudheesh Singanamalla
  • It would be better to create `permuted_values` programmatically, I think. Consider using `itertools.chain`, like Elmex80s's answer. – Sam Mussmann Feb 16 '17 at 11:51
  • Sam Mussmann, I completely agree, but the intention here was not to solve the permutation issue; it was to show how the counting works. I assumed that since the OP described how the result should look, the permutation part had already been solved. But yes, using itertools to chain and create unique pairs is definitely possible and recommended. It would be great if you could edit the answer. – Sudheesh Singanamalla Feb 16 '17 at 11:56
1

First compute all 2-element combinations of the distinct names. Then, for each sub-list in which both names occur, increment that pair's value in the result dictionary (l is the set of distinct names in the whole list):

from collections import defaultdict
from itertools import combinations, chain


names =  [['cat', 'fish'], ['cat'], ['fish', 'dog', 'cat'],
 ['cat', 'bird', 'fish'], ['fish', 'bird']]
l = set(chain.from_iterable(names)) # {'dog', 'bird', 'cat', 'fish'}
result = defaultdict(int)
for x in (list(combinations(l, 2))):
    for y in names:
        if((x[0] in y) and (x[1] in y)):
            result[x[0],x[1]] += 1


result # defaultdict(<class 'int'>, {('fish', 'bird'): 2, ('cat', 'dog'): 1, ('cat', 'fish'): 3, ('fish', 'dog'): 1, ('cat', 'bird'): 1})
ᴀʀᴍᴀɴ
0

The solutions listed here were too slow for my large dataset (tens of thousands). The following solution is much faster; it takes a fraction of a second.

See the Counter class documentation here:

https://docs.python.org/2/library/collections.html#collections.Counter

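import itertools
from collections import Counter

# generate combinations for each sub list separately; sorting (and
# de-duplicating) each sub list first makes ('cat', 'fish') and
# ('fish', 'cat') count as the same pair
lists_of_pairs = [list(itertools.combinations(sorted(set(sub_list)), 2)) for sub_list in names]
# flatten the lists of pairs to 1 large list of pairs
all_pairs = [pair for pairs_list in lists_of_pairs for pair in pairs_list]
# let the Counter do the rest for you
co_occurences_counts = Counter(all_pairs)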
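A self-contained sketch of this approach on the sample data from the question. The function name count_pairs is hypothetical. Two choices here are assumptions rather than part of the answer: each sub-list is de-duplicated and sorted so that a pair is always counted under one key regardless of the order in which the names appear, and the pairs are fed to the Counter directly instead of being collected in an intermediate list:

```python
from collections import Counter
from itertools import combinations

def count_pairs(list_of_lists):
    """Count how often each unordered pair appears in the same sub-list."""
    counts = Counter()
    for sub_list in list_of_lists:
        # sorted(set(...)) gives every pair one canonical key,
        # e.g. always ('cat', 'fish') and never ('fish', 'cat')
        counts.update(combinations(sorted(set(sub_list)), 2))
    return counts

names = [['cat', 'fish'], ['cat'], ['fish', 'dog', 'cat'],
         ['cat', 'bird', 'fish'], ['fish', 'bird']]

co_occurrence_counts = count_pairs(names)
print(co_occurrence_counts.most_common(1))  # [(('cat', 'fish'), 3)]
```

Only pairs that actually occur are generated, so pairs with a count of zero never appear in the result.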
Stephen Rauch
hatemfaheem