So I'm new to Python, and I'm looking to remove partially similar entries within the same column. For example, these are the entries in one of the columns in a dataframe:
Row 1 - "I have your Body Wash and I wonder if it contains animal ingredients. Also, which animal ingredients? I prefer not to use product with animal ingredients."
Row 2 - "This also doesn't have the ADA on there. Is this a fake toothpaste an imitation of yours?"
Row 3 - "I have your Body Wash and I wonder if it contains animal ingredients. I prefer not to use product with animal ingredients."
Row 4 - "I didn't see the ADA stamp on this box. I just want to make sure it was still safe to use?"
Row 5 - "Hello, I was just wondering if the new toothpaste is ADA approved? It doesn’t say on the packaging"
Row 6 - "Hello, I was just wondering if the new toothpaste is ADA approved? It doesn’t say on the box."
In this column, rows 1 & 3 and rows 5 & 6 are similar (partial duplicates). I want Python to recognize these as duplicates, keep the longer sentence, drop the shorter one, and export the new data to a CSV file.
Expected output:
Row 1 - "I have your Body Wash and I wonder if it contains animal ingredients. Also, which animal ingredients? I prefer not to use product with animal ingredients."
Row 2 - "This also doesn't have the ADA on there. Is this a fake toothpaste an imitation of yours?"
Row 3 - "I didn't see the ADA stamp on this box. I just want to make sure it was still safe to use?"
Row 4 - "Hello, I was just wondering if the new toothpaste is ADA approved? It doesn’t say on the packaging"
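A minimal sketch of the keep-the-longer-row logic described above, using only the standard library (`difflib` and `csv`). It visits rows from longest to shortest and keeps a row only if its `SequenceMatcher.ratio()` against every already-kept row is below a threshold. It then restores the original row order and writes a CSV. The 0.8 threshold, the helper names `is_similar` and `drop_partial_duplicates`, the filename `deduplicated.csv`, and the column header `text` are illustrative assumptions, not tuned values.

```python
import csv
from difflib import SequenceMatcher


def is_similar(a, b, threshold=0.8):
    """True if SequenceMatcher rates a and b at least `threshold` (0.0 to 1.0)."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def drop_partial_duplicates(texts, threshold=0.8):
    """Keep the longest text of each group of similar texts, in original order."""
    # Visit rows from longest to shortest, so the member of a
    # near-duplicate group that gets kept is its longest one.
    order = sorted(range(len(texts)), key=lambda i: len(texts[i]), reverse=True)
    kept = []
    for i in order:
        # Drop this row if it is too similar to any row already kept.
        if not any(is_similar(texts[i], texts[j], threshold) for j in kept):
            kept.append(i)
    # Put the surviving rows back in their original order.
    return [texts[i] for i in sorted(kept)]


rows = [
    "I have your Body Wash and I wonder if it contains animal ingredients. Also, which animal ingredients? I prefer not to use product with animal ingredients.",
    "This also doesn't have the ADA on there. Is this a fake toothpaste an imitation of yours?",
    "I have your Body Wash and I wonder if it contains animal ingredients. I prefer not to use product with animal ingredients.",
    "I didn't see the ADA stamp on this box. I just want to make sure it was still safe to use?",
    "Hello, I was just wondering if the new toothpaste is ADA approved? It doesn’t say on the packaging",
    "Hello, I was just wondering if the new toothpaste is ADA approved? It doesn’t say on the box.",
]

result = drop_partial_duplicates(rows)

# Write the surviving rows to a one-column CSV (filename and header are placeholders).
with open("deduplicated.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["text"])
    writer.writerows([text] for text in result)
```

For a dataframe column, the same function could be given `df["your_column"].tolist()` (placeholder column name), and the result written with `pandas.DataFrame({"text": result}).to_csv("deduplicated.csv", index=False)`. Comparing each row against every kept row is quadratic, which is fine for a small column but slow for a very large one.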
I tried using FuzzyWuzzy with its similarity sort function, but it didn't give me the expected output. Is there any simpler code for this?
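When a fuzzy matcher doesn't give the expected grouping, one way to diagnose it is to print every pairwise score and choose the cutoff by looking at where the scores separate. This sketch uses the standard library's `difflib.SequenceMatcher` as a stand-in for FuzzyWuzzy's scorers (its `ratio()` returns 0.0 to 1.0 rather than 0 to 100). On these six rows, the partial-duplicate pairs (1, 3) and (5, 6) should come out clearly above the unrelated pairs.

```python
from difflib import SequenceMatcher
from itertools import combinations

rows = [
    "I have your Body Wash and I wonder if it contains animal ingredients. Also, which animal ingredients? I prefer not to use product with animal ingredients.",
    "This also doesn't have the ADA on there. Is this a fake toothpaste an imitation of yours?",
    "I have your Body Wash and I wonder if it contains animal ingredients. I prefer not to use product with animal ingredients.",
    "I didn't see the ADA stamp on this box. I just want to make sure it was still safe to use?",
    "Hello, I was just wondering if the new toothpaste is ADA approved? It doesn’t say on the packaging",
    "Hello, I was just wondering if the new toothpaste is ADA approved? It doesn’t say on the box.",
]

# Score every pair of rows, keyed by 1-based row numbers; 1.0 means identical strings.
scores = {
    (i + 1, j + 1): SequenceMatcher(None, rows[i], rows[j]).ratio()
    for i, j in combinations(range(len(rows)), 2)
}

# List pairs from most to least similar to see where a cutoff naturally falls.
for (a, b), score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"rows {a} & {b}: {score:.2f}")
```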