Given a set of labels:
labels = {'rectangle', 'square', 'triangle', 'cube'}
and a DataFrame df:
df = pd.DataFrame(['rectangle rectangle in my square cube', 'triangle circle not here', 'nothing here'], columns=['text'])
For each row, I want to count how many times each word from labels occurs in the text column, and create a new column holding the top X (say 2 or 3) most frequent labels with their counts. If two labels occur equally often, both can appear in the result (as a list, string, or dict). Rows that contain no label at all should get NaN.
Expected output:
pd.DataFrame({'text' : ['rectangle rectangle in my square cube', 'triangle circle not here', 'nothing here'], 'best_labels' : [{'rectangle' : 2, 'square' : 1, 'cube' : 1}, {'triangle' : 1}, np.nan]})
df['best_labels'] = some_function(df.text)
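One possible way to write such a function, as a minimal sketch rather than a definitive answer: count the label words per row with collections.Counter and keep every label whose count reaches the X-th highest count, so ties are retained. The names top_labels and top_n are made up for illustration. The sketch assumes words are separated by whitespace and matched case-sensitively, with no punctuation handling, and that words outside labels (such as 'circle') are not counted.

```python
from collections import Counter

import numpy as np
import pandas as pd

labels = {'rectangle', 'square', 'triangle', 'cube'}
df = pd.DataFrame(
    ['rectangle rectangle in my square cube',
     'triangle circle not here',
     'nothing here'],
    columns=['text'],
)


def top_labels(text, labels, top_n=2):
    """Hypothetical helper: count label words in one string and keep the
    labels whose count is among the top_n highest (ties included)."""
    # Assumption: plain whitespace tokenization, exact (case-sensitive) match.
    counts = Counter(word for word in text.split() if word in labels)
    if not counts:
        return np.nan  # no label occurs in this row
    # The top_n-th highest count acts as the cutoff; ties at the cutoff stay.
    ranked = sorted(counts.values(), reverse=True)
    cutoff = ranked[min(top_n, len(ranked)) - 1]
    return {word: n for word, n in counts.most_common() if n >= cutoff}


# Apply row by row; the result is an object column of dicts and NaN.
df['best_labels'] = df['text'].apply(lambda t: top_labels(t, labels, top_n=2))
print(df)
```

Because ties at the cutoff are kept, a row can return more than top_n labels; the first row here yields three labels for top_n=2. If a strict limit is preferred, `dict(counts.most_common(top_n))` would truncate instead, at the cost of breaking ties arbitrarily.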