
I have a DataFrame of reviews and two lists: one of nouns and one of verbs/adjectives.

Example code:

import pandas as pd

data = {'reviews':['Very professional operation. Room is very clean and comfortable',
                    'Daniel is the most amazing host! His place is extremely clean, and he provides everything you could possibly want (comfy bed, guidebooks & maps, mini-fridge, towels, even toiletries). He is extremely friendly and helpful.',
                    'The room is very quiet, and well decorated, very clean.',
                    'He provides the room with towels, tea, coffee and a wardrobe.',
                    'Daniel is a great host. Always recomendable.',
                    'My friend and I were very satisfied with our stay in his apartment.']}

df = pd.DataFrame(data)
nouns = ['place','Amsterdam','apartment','location','host','stay','city','room','everything','time','house',
         'area','home','’','center','restaurants','centre','Great','tram','très','minutes','walk','space','neighborhood',
         'à','station','bed','experience','hosts','Thank','bien']

verbs_adj = ['was','is','great','nice','had','clean','were','recommend','stay','are','good','perfect','comfortable',
             'have','easy','be','quiet','helpful','get','beautiful',"'s",'has','est','located','un','amazing','wonderful',]

Using the DataFrame and the two lists, how can I write a function that returns a dictionary of dictionaries counting, for each noun in a review, the verbs and adjectives that co-occur with it? My ideal output would be:

Example review: 'A big restaurant served delicious food in big dishes'

>>> {'restaurant': {'big': 2, 'served': 1, 'delicious': 1}}
RDTJr

1 Answer

You could try this:

from collections import Counter
from copy import deepcopy
from pprint import pprint

data = ...
nouns = ...
verbs_adj = ...

def count_co_occurences(reviews):
    # Iterate on each review and count
    occurences_per_review = {
        f"review_{i+1}": {
            noun: dict(Counter(review.lower().split(" ")))
            for noun in nouns
            if noun in review.lower()
        }
        for i, review in enumerate(reviews)
    }
    # Remove verb_adj not found in main list
    opr = deepcopy(occurences_per_review)
    for review, occurences in opr.items():
        for noun, counts in occurences.items():
            for verb_adj in counts.keys():
                if verb_adj not in verbs_adj:
                    del occurences_per_review[review][noun][verb_adj]
    return occurences_per_review


pprint(count_co_occurences(data["reviews"]))
# Outputs
{'review_1': {'room': {'clean': 1, 'comfortable': 1, 'is': 1}},
 'review_2': {'bed': {'amazing': 1, 'is': 3},       
              'everything': {'amazing': 1, 'is': 3},
              'host': {'amazing': 1, 'is': 3},      
              'place': {'amazing': 1, 'is': 3}},    
 'review_3': {'room': {'is': 1}},
 'review_4': {'room': {}},
 'review_5': {'host': {'great': 1, 'is': 1}},       
 'review_6': {'apartment': {'stay': 1, 'were': 1},  
              'stay': {'stay': 1, 'were': 1}}} 
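
Note that splitting on `" "` leaves punctuation attached to words, which is why `'quiet'` and `'clean'` are missing from review_3 above (the tokens there are `'quiet,'` and `'clean.'`). A minimal sketch of one possible workaround, assuming regex tokenization and whole-word matching of nouns are acceptable; the helper names `tokenize` and `co_occurrences` and the shortened lists are hypothetical, not part of the original code:

```python
import re
from collections import Counter

# Shortened, hypothetical copies of the question's lists, for illustration only.
nouns = ["room", "host", "place", "bed"]
verbs_adj = ["is", "clean", "quiet", "comfortable", "amazing", "great"]


def tokenize(review):
    """Lowercase a review and return its words without punctuation."""
    # \w+ matches runs of Unicode word characters, so accented words survive,
    # but apostrophe tokens such as "'s" are not produced.
    return re.findall(r"\w+", review.lower())


def co_occurrences(review, nouns, verbs_adj):
    """Map each noun found in the review to counts of its verbs/adjectives."""
    tokens = tokenize(review)
    wanted = set(verbs_adj)  # set lookup instead of scanning a list
    counts = Counter(token for token in tokens if token in wanted)
    present = set(tokens)
    # Whole-word match: unlike a substring test, 'host' does not match 'hosts'.
    return {noun: dict(counts) for noun in nouns if noun.lower() in present}


result = co_occurrences(
    "The room is very quiet, and well decorated, very clean.", nouns, verbs_adj
)
print(result)
```

With this tokenization, review_3 should also report `'quiet'` and `'clean'` for `'room'`. Whether whole-word matching is preferable to the substring test used above depends on the data.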

Laurent
  • I've tried your code, but it just crashes. I think it may be because I have a lot of reviews in the data I'm using? – RDTJr May 28 '21 at 11:32
  • Hard to tell. It works fine with the data you provided, so try it with that first (although, in `data`, you are missing a comma after the review ending with the word 'helpful'; you should correct your post). – Laurent May 28 '21 at 13:06
  • Ah thank you, I've corrected that mistake. Your code works fine when using the example data, but it can't seem to handle the number of reviews in my dataframe or the number of words in my lists. Thank you for your help though! – RDTJr May 28 '21 at 14:10
  • It must be a very big list, then. You should post another question, as it is not related to this one. Also, please consider accepting this answer by clicking the check-mark. This indicates to the wider community that you've found a solution and gives some reputation to both the answerer and yourself. There is no obligation to do this. In any case, have a nice day. – Laurent May 28 '21 at 14:16
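
For the large-input case discussed in the comments, one option that may help is to filter while counting and to keep only one running total per noun, so that memory stays bounded by the sizes of the two lists rather than by the number of reviews. This is a sketch under the assumption that totals aggregated over all reviews are acceptable (the output then has no `review_N` level, like the question's ideal output); the function name `aggregate_co_occurrences` is hypothetical:

```python
import re
from collections import Counter, defaultdict


def aggregate_co_occurrences(reviews, nouns, verbs_adj):
    """Accumulate, per noun, verb/adjective counts from every review containing it."""
    noun_set = {noun.lower() for noun in nouns}
    wanted = {word.lower() for word in verbs_adj}
    totals = defaultdict(Counter)
    for review in reviews:
        tokens = re.findall(r"\w+", review.lower())
        # Count only the verbs/adjectives of interest, once per review.
        counts = Counter(token for token in tokens if token in wanted)
        # Add this review's counts to every noun that appears in it.
        for noun in noun_set.intersection(tokens):
            totals[noun].update(counts)
    return {noun: dict(counter) for noun, counter in totals.items()}


reviews = [
    "Daniel is a great host. Always recomendable.",
    "The room is very quiet, and well decorated, very clean.",
    "Very professional operation. Room is very clean and comfortable",
]
totals = aggregate_co_occurrences(
    reviews, ["room", "host"], ["is", "great", "clean", "quiet", "comfortable"]
)
print(totals)
```

Because the function only iterates over `reviews`, a pandas column such as `df["reviews"]` can be passed directly instead of a list.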