I have two lists of string
:
(Pdb) word_list1
['first', 'sentence', 'ant', 'first', 'whatever']
(Pdb) word_list2
['second', 'second', 'heck', 'anything', 'youtube', 'gmail', 'hotmail']
I want to compute the probability distribution of the union of words for each of the two sets for each word.
(Pdb) print list(set(word_list1) | set(word_list2))
['hotmail', 'anything', 'sentence', 'maybe', 'youtube', 'whatever', 'ant', 'second', 'heck', 'gmail', 'first']
(Pdb) len(list(set(word_list1) | set(word_list2)))
11
So, I want two vectors of length 11, one for each wordlist.