I have two lists of strings:

(Pdb) word_list1
['first', 'sentence', 'ant', 'first', 'whatever']
(Pdb) word_list2
['second', 'second', 'heck', 'anything', 'youtube', 'gmail', 'hotmail']

For each of the two lists, I want to compute the probability distribution of its words over the union of the words in both lists.

(Pdb) print list(set(word_list1) | set(word_list2))
['hotmail', 'anything', 'sentence', 'maybe', 'youtube', 'whatever', 'ant', 'second', 'heck', 'gmail', 'first']
(Pdb) len(list(set(word_list1) | set(word_list2)))
11

So, I want two vectors of length 11, one for each wordlist.

Abhishek Bhatia
  • What is your expected output ? – ZdaR Dec 17 '15 at 10:05
  • I don't know exactly what you mean by "probability distribution", but I guess it's more or less a duplicate of [how-can-i-count-the-occurrences-of-a-list-item-in-python](http://stackoverflow.com/questions/2600191/how-can-i-count-the-occurrences-of-a-list-item-in-python) – MarkusN Dec 17 '15 at 10:09

1 Answer

What you need is a dictionary with one entry per distinct word rather than a vector, and if you are looking for frequencies, use Counter instead of set operations:

from collections import Counter

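# l1 and l2 stand for word_list1 and word_list2 from the question
l1, l2 = word_list1, word_list2

# total word count; float() keeps Python 2 from doing integer division
n   = float(len(l1) + len(l2))
dic = dict(Counter(l1) + Counter(l2))

# for the first list (v is the count over both lists combined)
{k: round(v / n, 2) if k in l1 else 0 for k, v in dic.items()}

#{'ant': 0.08,
# 'anything': 0,
# 'first': 0.17,
# 'gmail': 0,
# 'heck': 0,
# 'hotmail': 0,
# 'second': 0,
# 'sentence': 0.08,
# 'whatever': 0.08,
# 'youtube': 0}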
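
If the goal is literally two aligned vectors, one per word list, here is a minimal sketch. It assumes each list should be normalized by its own length, which the question does not specify, and the `distribution` helper and the sorted `vocab` ordering are illustrative choices, not part of the code above:

```python
from collections import Counter

word_list1 = ['first', 'sentence', 'ant', 'first', 'whatever']
word_list2 = ['second', 'second', 'heck', 'anything', 'youtube', 'gmail', 'hotmail']

# Shared, sorted vocabulary so both vectors use the same index order.
vocab = sorted(set(word_list1) | set(word_list2))

def distribution(words, vocab):
    """Relative frequency of each vocab word within `words` (hypothetical helper)."""
    counts = Counter(words)      # missing words count as 0
    total = float(len(words))    # float() avoids integer division on Python 2
    return [counts[w] / total for w in vocab]

p1 = distribution(word_list1, vocab)  # vector for the first list
p2 = distribution(word_list2, vocab)  # vector for the second list
```

With the two lists as posted, `p1` and `p2` have one entry per word in `vocab`, and each sums to 1 up to floating-point rounding. Position `i` in both vectors refers to `vocab[i]`, so they can be compared element by element.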
Colonel Beauvel