I'm trying to filter a pandas DataFrame on a specific value in one column while also allowing for typing mistakes. I thought SequenceMatcher would be a good solution, but I don't know the best way to apply it within a DataFrame. Let's say the columns are 'number' and 'location':
import pandas as pd
df1 = pd.DataFrame([[1, 'Amsterdam'], [2, 'amsterdam'], [3, 'rotterdam'], [4, 'amstrdam'], [5, 'Berlin']], columns=['number', 'location'])
Suppose I want to filter on 'amsterdam' with a minimum similarity ratio of, say, 0.6. I would expect the output to look something like this:
[[1, 'Amsterdam'], [2, 'amsterdam'], [4, 'amstrdam']]
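As a sanity check on the threshold, here is a small sketch that prints the `SequenceMatcher` ratio of each location against 'amsterdam'. Per the difflib docs, `ratio()` is 2*M/T, where M is the number of matching characters and T is the total length of both strings, and the comparison is case-sensitive. Note that 'rotterdam' shares the block 'terdam' with 'amsterdam' and scores about 0.67, so a 0.6 cutoff would also keep row 3; something around 0.7 gives the output above. The name `target` is just an illustrative variable.

```python
from difflib import SequenceMatcher

import pandas as pd

df1 = pd.DataFrame(
    [[1, 'Amsterdam'], [2, 'amsterdam'], [3, 'rotterdam'], [4, 'amstrdam'], [5, 'Berlin']],
    columns=['number', 'location'],
)

target = 'amsterdam'

# One ratio per row: SequenceMatcher is called once for each location string
ratios = df1['location'].map(lambda loc: SequenceMatcher(None, target, loc).ratio())
print(ratios.round(3).tolist())  # [0.889, 1.0, 0.667, 0.941, 0.267]
```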
What would be the best way to get this done? I tried a direct filter like the one below, but that didn't work:
df2 = df1[SequenceMatcher(None, location, df1.location).ratio() > 0.6]
Do I first need to run apply to add a column with the ratio and then filter on that column? Or is there a smarter way?
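A likely reason the direct filter fails: `SequenceMatcher` compares two sequences as a whole and is not vectorized, so passing the entire `df1.location` Series does not produce one ratio per row. A minimal sketch, assuming a small DataFrame where a per-row Python call is acceptable, is to build the boolean mask with `Series.apply` and index with it directly; a helper column is optional. The helper `similar`, the lower-casing (so 'Amsterdam' matches exactly), and the 0.7 threshold are all assumptions for illustration.

```python
from difflib import SequenceMatcher

import pandas as pd

df1 = pd.DataFrame(
    [[1, 'Amsterdam'], [2, 'amsterdam'], [3, 'rotterdam'], [4, 'amstrdam'], [5, 'Berlin']],
    columns=['number', 'location'],
)


def similar(a: str, b: str) -> float:
    """Hypothetical helper: similarity in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


target = 'amsterdam'
threshold = 0.7  # 0.6 would also keep 'rotterdam' (ratio ~0.67)

# Option 1: boolean mask built row by row, no extra column needed
mask = df1['location'].apply(lambda loc: similar(target, loc) > threshold)
df2 = df1[mask]
print(df2)

# Option 2: keep the ratio as a column (useful for inspecting or sorting), then filter
scored = df1.assign(ratio=df1['location'].apply(lambda loc: similar(target, loc)))
df3 = scored[scored['ratio'] > threshold]
print(df3)
```

Both options keep rows 1, 2 and 4. Option 2 is the "add a column first" idea; it costs one extra column but makes it easy to see how close the rejected rows were.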