I have a function that counts co-occurrences between center and context words within each review:
from collections import Counter

import nltk

# cent_vocab, cont_vocab and df are assumed to be defined elsewhere

def get_coocs(x):
    occurdict = {}
    # Pre-processing: tokenize and lowercase the review
    tokens = nltk.word_tokenize(x)
    tokenslower = list(map(str.lower, tokens))
    # Save all the nouns (center words) in the review
    allnouns = [word for word in tokenslower if word in cent_vocab]
    # Count all the verbs/adjectives (context words) in the review
    allverbs_adj = Counter(word for word in tokenslower if word in cont_vocab)
    # Create a dictionary of dictionaries: each noun maps to the context counts
    for noun in allnouns:
        occurdict[noun] = dict(allverbs_adj)
    return occurdict

coocs = df['comments'].apply(get_coocs)
My dict of dicts looks like this:
{'host': {'is': 3, 'most': 1, 'amazing': 1},
 'time': {'had': 1, 'such': 1, 'great': 1},
 'room': {'very': 2, 'professional': 1},
 'way': {'is': 3, 'recommended': 1, 'provided': 2}}
But when I try to convert it into a DataFrame, with nouns as columns, verbs/adjectives as the index, and the corresponding co-occurrence counts as values, I don't get that layout. This is the conversion function I'm using:
import pandas as pd

def cooc_dict2df(coocs):
    # orient='index' turns the outer keys into rows
    coocdf = pd.DataFrame.from_dict({i: coocs[i] for i in coocs.keys()}, orient='index')
    return coocdf
I've attempted other solutions but I still can't seem to get what I want.
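For reference, here is a minimal sketch of one way the described layout could be produced. It is not a definitive fix. It assumes `coocs` holds one nested dict per review (which is what `apply` returns) and that counts should be summed across reviews. `merge_coocs` and `coocs_to_df` are hypothetical helpers introduced only for illustration, and the toy input stands in for the real data. Passing a dict of dicts directly to the `pd.DataFrame` constructor uses the outer keys (nouns) as columns, whereas `orient='index'` would put them in the rows.

```python
from collections import Counter, defaultdict

import pandas as pd


def merge_coocs(per_review):
    """Sum per-review {noun: {context_word: count}} dicts into one nested dict."""
    merged = defaultdict(Counter)
    for review_dict in per_review:
        for noun, context_counts in review_dict.items():
            # Counter.update adds counts instead of overwriting them
            merged[noun].update(context_counts)
    return {noun: dict(counts) for noun, counts in merged.items()}


def coocs_to_df(cooc_dict):
    """Nouns become columns, context words become the index, missing pairs become 0."""
    # A dict of dicts passed to the constructor maps outer keys to columns
    return pd.DataFrame(cooc_dict).fillna(0).astype(int)


# Toy input standing in for the Series returned by df['comments'].apply(...)
per_review = [
    {'host': {'is': 2, 'amazing': 1}, 'room': {'is': 2, 'amazing': 1}},
    {'host': {'is': 1, 'most': 1}},
]

merged = merge_coocs(per_review)
coocdf = coocs_to_df(merged)
print(coocdf)
```

Because a pandas Series is iterable over its values, `merge_coocs(coocs)` should accept the Series directly under the assumption above. If the data is already a single dict of dicts, the merge step can be skipped.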