I have an NLP project in which I would like to remove the words that appear only once in the keywords. That is, each row has a list of keywords with their frequencies.
I would like something like:
if the frequency of a word across the whole column ['keywords'] == 1, then replace it with "".
I cannot test word by word. My idea was to build a list of all the words, remove the duplicates, then compute count.sum for each word in this list and delete the words whose total is 1. But I have no idea how to do that. Any ideas? Thanks!
Here's what my data looks like:
sample.head(4)
   ID                                           keywords  age sex
0   1  fibre:16;quoi:1;dangers:1;combien:1;hightech:1...   62   F
1   2                     restaurant:1;marrakech.shtml:1   35   M
2   3  payer:1;faq:1;taxe:1;habitation:1;macron:1;qui...   45   F
3   4  rigaud:3;laurent:3;photo:11;profile:8;photopro...   46   F
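For reference, the idea described above (total each word over the whole column, then drop the words whose total is 1) could be sketched roughly as follows. This is only a sketch under assumptions: every item is taken to have the form word:count, "frequency in the whole column" is read as the sum of the counts over all rows, and the names parse_keywords, drop_rare_keywords and min_total, as well as the sample rows, are made up for illustration.

```python
from collections import Counter


def parse_keywords(cell):
    """Turn 'fibre:16;quoi:1' into [('fibre', 16), ('quoi', 1)]."""
    pairs = []
    for item in cell.split(";"):
        if not item:
            continue  # skip empty pieces, e.g. from an empty cell
        # rpartition splits on the last ':' so a word containing ':' survives
        word, _, count = item.rpartition(":")
        pairs.append((word, int(count)))
    return pairs


def drop_rare_keywords(cells, min_total=2):
    """Remove every word whose summed count over all cells is below min_total."""
    parsed = [parse_keywords(cell) for cell in cells]

    # First pass: total frequency of each word across the whole column
    totals = Counter()
    for pairs in parsed:
        for word, count in pairs:
            totals[word] += count

    # Second pass: rebuild each cell, keeping only the frequent-enough words
    return [
        ";".join(f"{word}:{count}" for word, count in pairs
                 if totals[word] >= min_total)
        for pairs in parsed
    ]


# Made-up rows in the same 'word:count;word:count' format (not the real data)
rows = ["fibre:16;quoi:1;taxe:1", "restaurant:1;taxe:1", "quoi:2;photo:11"]
cleaned = drop_rare_keywords(rows)
print(cleaned)
```

With the DataFrame shown above, this could presumably be applied as sample['keywords'] = drop_rare_keywords(sample['keywords'].tolist()), assuming the column holds the full strings and has no missing values. If "appears only once" should instead mean "occurs in a single row", the first pass would add 1 per row rather than the count.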