
I hope I am not creating a duplicate lol, but I spent hours looking for something similar to my question :)

That said, I have the following input:

import pandas as pd

foo = {"Brand": ["loc doc poc",
                 "roc top mop",
                 "loc lot not",
                 "roc lot tot",
                 "loc bot sot",
                 "nap rat sat"]}

word_list = ["loc", "top", "lot"]
df = pd.DataFrame(foo)

Two desired outputs:

1. A dictionary storing how many times each word in word_list occurs

2. A new column containing the number of occurrences in each row

# Outputs:
counter_dic = {"loc": 3, "top": 1, "lot": 2}

         Brand  count
0  loc doc poc      1
1  roc top mop      1
2  loc lot not      2
3  roc lot tot      1
4  loc bot sot      1
5  nap rat sat      0

My only idea so far:

  • To count how many times a set of terms occurs, I could create a bag of words and then filter it on the dictionary keys?
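The bag-of-words idea above can be sketched with `collections.Counter`. This is a minimal sketch, assuming the terms are whole, whitespace-separated words (so `lot` would not match inside a longer token such as `lotto`).

```python
from collections import Counter

import pandas as pd

foo = {"Brand": ["loc doc poc",
                 "roc top mop",
                 "loc lot not",
                 "roc lot tot",
                 "loc bot sot",
                 "nap rat sat"]}
word_list = ["loc", "top", "lot"]
df = pd.DataFrame(foo)

# Bag of words over the whole column: split each row on whitespace, count every token.
bag = Counter(token for text in df["Brand"] for token in text.split())

# Keep only the terms of interest; a Counter returns 0 for absent terms.
counter_dic = {word: bag[word] for word in word_list}

# Per-row count: how many tokens in each row belong to word_list.
targets = set(word_list)
df["count"] = [sum(token in targets for token in text.split()) for text in df["Brand"]]

print(counter_dic)  # {'loc': 3, 'top': 1, 'lot': 2}
print(df)
```

Using a set for the membership test keeps the per-row check cheap even when word_list is long.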

If you find a duplicate question, feel free to close this one.

I checked the following:

This one is the most similar:

Check If a String Is In A Pandas DataFrame

Python Lists Finding The Number Of Times A String Occurs

Count Occurrences Of A Substring In A List Of Strings

Andrea Ciufo

2 Answers


Here is one potential solution that uses str.count to build an interim count DataFrame, which helps with both outputs.

df_counts = pd.concat([df['Brand'].str.count(x).rename(x) for x in word_list], axis=1)

Looks like:

   loc  top  lot
0    1    0    0
1    0    1    0
2    1    0    1
3    0    0    1
4    1    0    0
5    0    0    0

1 - Dictionary with the occurrences stored

df_counts.sum().to_dict()

[out]

{'loc': 3, 'top': 1, 'lot': 2}

2 - New column containing the number of occurrences for each row

df['count'] = df_counts.sum(axis=1)

[out]

         Brand  count
0  loc doc poc      1
1  roc top mop      1
2  loc lot not      2
3  roc lot tot      1
4  loc bot sot      1
5  nap rat sat      0
Chris Adams
  • I noticed that one of the words in my list is `+watt`, and it raises an error: `error: nothing to repeat at position 0`. I was thinking of removing this special character with `.replace("+","")`. Is there a better way to deal with this kind of exception? – Andrea Ciufo Mar 07 '21 at 16:41
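The error occurs because `str.count` treats its argument as a regular expression, and a leading `+` is a quantifier with nothing to repeat. Instead of stripping the `+`, one option is to escape each term with `re.escape` so it is matched literally. The sketch below additionally wraps each term in `(?<!\S)` and `(?!\S)` lookarounds so that only whole whitespace-separated tokens match; that is an assumption about what should count as a match (plain `\b` boundaries would not work here, because `+` is not a word character). The `"+watt lotto loc"` row is hypothetical test data, and `token_pattern` is a hypothetical helper.

```python
import re

import pandas as pd

# Hypothetical data: the original sample plus a row containing "+watt".
df = pd.DataFrame({"Brand": ["loc doc poc",
                             "roc top mop",
                             "loc lot not",
                             "roc lot tot",
                             "loc bot sot",
                             "nap rat sat",
                             "+watt lotto loc"]})
word_list = ["loc", "top", "lot", "+watt"]


def token_pattern(word):
    # Hypothetical helper. re.escape makes "+" (and any other metacharacter) literal;
    # the lookarounds require whitespace or a string edge on both sides,
    # so "lot" does not match inside "lotto".
    return r"(?<!\S)" + re.escape(word) + r"(?!\S)"


df_counts = pd.concat(
    [df["Brand"].str.count(token_pattern(w)).rename(w) for w in word_list],
    axis=1,
)

counter_dic = df_counts.sum().to_dict()  # {'loc': 4, 'top': 1, 'lot': 2, '+watt': 1}
df["count"] = df_counts.sum(axis=1)
print(counter_dic)
print(df)
```

If substring matches are actually wanted, `re.escape(w)` alone (without the lookarounds) is enough to fix the error.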

Here is a way to get the count into dictionary form:

df['Brand'].str.split(' ').explode().to_frame('Brand').groupby('Brand').size().loc[word_list].to_dict()
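A variant of the dictionary one-liner, sketched under the assumption that some terms in word_list may not occur in the data at all. In recent pandas versions `.loc[word_list]` raises a `KeyError` when a label is missing, while `value_counts().reindex(..., fill_value=0)` reports it as 0. The term `"xyz"` is a hypothetical missing word.

```python
import pandas as pd

df = pd.DataFrame({"Brand": ["loc doc poc",
                             "roc top mop",
                             "loc lot not",
                             "roc lot tot",
                             "loc bot sot",
                             "nap rat sat"]})
# "xyz" is a hypothetical term that never occurs in the data.
word_list = ["loc", "top", "lot", "xyz"]

# One row per token, then count each distinct token.
token_counts = df["Brand"].str.split(" ").explode().value_counts()

# reindex keeps exactly the requested terms, in word_list order, filling absent ones with 0.
counter_dic = token_counts.reindex(word_list, fill_value=0).to_dict()
print(counter_dic)  # {'loc': 3, 'top': 1, 'lot': 2, 'xyz': 0}
```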

Here is a way to get the per-row count:

df['count'] = df['Brand'].str.get_dummies(sep=' ').loc[:,word_list].sum(axis=1)
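Note that `str.get_dummies` produces one 0/1 column per distinct token, so a word repeated within a row is counted once, and `.loc[:, word_list]` requires every term to appear somewhere in the column. If repeats should count, one alternative (a sketch, not part of the original answer) explodes the tokens, tests them with `isin`, and sums back per original row. The row `"loc loc top"` is hypothetical data added to show the difference.

```python
import pandas as pd

# Hypothetical data: the last row repeats "loc" to show the difference.
df = pd.DataFrame({"Brand": ["loc doc poc",
                             "roc top mop",
                             "nap rat sat",
                             "loc loc top"]})
word_list = ["loc", "top", "lot"]

# Presence-based: get_dummies gives 1 per distinct token per row;
# reindex adds absent terms ("lot" here) as all-zero columns instead of raising.
presence = (df["Brand"].str.get_dummies(sep=" ")
            .reindex(columns=word_list, fill_value=0)
            .sum(axis=1))

# Occurrence-based: explode keeps the original row label on every token,
# so grouping by index level 0 sums the matches back per row.
occurrences = (df["Brand"].str.split(" ").explode()
               .isin(word_list)
               .astype(int)
               .groupby(level=0).sum())

df["presence"] = presence   # [1, 1, 0, 2]
df["count"] = occurrences   # [1, 1, 0, 3]
print(df)
```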
rhug123