I have a string (or a list of words). I would like to create tuples of every possible word-pair combination so that I can pass them to a Counter, which builds the dictionary and calculates the frequencies. The frequency is calculated as follows: if both words of a pair occur in a string (regardless of their order or of any other words between them), the pair's frequency for that string is 1. Even if word1 occurs 7 times and word2 occurs 3 times, the frequency of the pair (word1, word2) is still 1.
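The per-string rule above reduces to a set-membership test: a pair scores 1 if both words appear anywhere in the string, and repeats, order and distance are ignored. The following is a minimal sketch. It assumes whitespace tokenization with `str.split()`, which leaves punctuation attached to words, so "ready." and "ready" count as different words. `pair_score` is a hypothetical helper name.

```python
def pair_score(text, word1, word2):
    """Return 1 if both words occur somewhere in `text`, else 0.

    Order, distance between the words, and repeat counts are ignored.
    Tokenization is a plain whitespace split (an assumption).
    """
    words = set(text.split())
    return int(word1 in words and word2 in words)

# "coffee" occurs 7 times and "money" 3 times, yet the pair still scores 1.
text = " ".join(["coffee"] * 7 + ["money"] * 3)
print(pair_score(text, "coffee", "money"))  # 1
print(pair_score(text, "money", "coffee"))  # 1, order does not matter
print(pair_score(text, "coffee", "work"))   # 0, "work" is absent
```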
I am using loops to create tuples of all pairs, but I got stuck:
from collections import Counter

tweetList = ('I went to work but got delayed at other work and got stuck in a traffic and I went to drink some coffee but got no money and asked for money from work', 'We went to get our car but the car was not ready. We tried to expedite our car but were told it is not ready')
words = set(tweetList.split())
n = 10
for tweet in tweetList:
    for word1 in words:
        for word2 in words:
            pairW = [(word1, word2)]
    c1 = Counter(pairW for pairW in tweet)
c1.most_common(n)
However, the output is very bizarre:
[('k', 1)]
It seems that instead of words it is iterating over letters.
How can this be addressed? Should I convert the string into a list of words using split()?
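Iterating over a Python string yields its characters one at a time. If `tweetList` is a single string, `for tweet in tweetList` therefore visits characters, not words. Because `c1` is rebuilt from scratch on each pass, only the last character survives, so a string ending in "work" leaves `('k', 1)`. Calling `str.split()` with no arguments splits on runs of whitespace and returns a list of words. A tuple has no `.split()` method, so each element must be split separately. The small comparison below is a sketch using shortened sample strings:

```python
tweet = "got no money from work"

# Iterating over a str yields single characters, spaces included.
chars = [ch for ch in tweet]
print(chars[:5])  # ['g', 'o', 't', ' ', 'n']

# str.split() with no arguments splits on whitespace and yields words.
words = tweet.split()
print(words)  # ['got', 'no', 'money', 'from', 'work']

# For a tuple of tweets, split each element on its own.
tweetList = ("I went to work", "We went to get our car")
words_per_tweet = [t.split() for t in tweetList]
print(words_per_tweet)
# [['I', 'went', 'to', 'work'], ['We', 'went', 'to', 'get', 'our', 'car']]
```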
Another question: how can I avoid creating duplicate tuples such as (word1, word2) and (word2, word1)? Should I use enumerate?
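Two nested loops over the same word list behave like `itertools.product(words, repeat=2)`. They yield both orders of every pair, plus each word paired with itself. `itertools.combinations(iterable, 2)` yields each unordered pair exactly once, in input order. Sorting the unique words first gives every pair one canonical order, so the same two words always produce the same tuple in different strings. The `enumerate` idea also works: pair each word only with the words after it. A short sketch comparing the approaches:

```python
from itertools import combinations, product

words = ["work", "coffee", "money"]

# Nested loops (same as product) give both orders and self-pairs: 3 * 3 tuples.
nested = list(product(words, repeat=2))
print(len(nested))  # 9

# combinations() gives each unordered pair once; sorting fixes the order.
unique_pairs = list(combinations(sorted(words), 2))
print(unique_pairs)  # [('coffee', 'money'), ('coffee', 'work'), ('money', 'work')]

# Same result with enumerate: pair each word only with the words after it.
sorted_words = sorted(words)
by_index = [(w1, w2) for i, w1 in enumerate(sorted_words) for w2 in sorted_words[i + 1:]]
print(by_index == unique_pairs)  # True
```

Both approaches assume the word list has no repeats. Passing the words through `set()` first guarantees this.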
As output, I expect a dictionary where each key is a word pair (see the duplicate question above) and each value is the frequency of that pair across the list.
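The pieces can be combined into one Counter over the whole list. This is a minimal sketch under several assumptions: whitespace tokenization, case-sensitive matching, and punctuation left attached to words. Each string adds at most 1 to a pair, so the final value is the number of strings that contain both words. `pair_frequencies` is a hypothetical name. `Counter` is a `dict` subclass, so the result can be used directly as the requested dictionary.

```python
from collections import Counter
from itertools import combinations

def pair_frequencies(texts):
    """Count, for each unordered word pair, how many texts contain both words."""
    counts = Counter()
    for text in texts:
        # set() drops repeats so a pair is counted at most once per text;
        # sorted() gives each pair one canonical order, e.g. ('a', 'b').
        unique_words = sorted(set(text.split()))
        # Counter.update() with an iterable counts each element (each pair tuple).
        counts.update(combinations(unique_words, 2))
    return counts

tweetList = ('I went to work but got delayed at other work and got stuck in a traffic and I went to drink some coffee but got no money and asked for money from work', 'We went to get our car but the car was not ready. We tried to expedite our car but were told it is not ready')

freq = pair_frequencies(tweetList)
print(freq[('but', 'went')])  # 2: both tweets contain "but" and "went"
print(freq[('car', 'our')])   # 1: only the second tweet has both words
print(freq.most_common(10))
```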
Thank you!