Find csv lines by word similarity

Question

I have a CSV file with thousands of lines. I would like to retrieve only the lines that contain a word similar to a specific word. With the sample file below, I expect to match lines 1, 2 and 4.

Any idea how to achieve that?

import csv
a = 'Microsoft'

with open("testing.csv") as f:
    reader = csv.reader(f, delimiter='\n')
    for row in reader:
        # Exact substring match: only catches the correctly spelled word
        if a in row[0]:
            print(row[0])

testing.csv

I like very much the Microsoft products
Me too, I like Micrsoft
I prefer Apple products
microfte here

Is that really the `csv` file? How come it's not separated by commas? - hqkhan, Dec 21 '18 at 15:19

cody · Accepted Answer · 2018-12-21T15:38:53.747

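The fuzzywuzzy library is suitable for this. Given your test data and expected results, I'm assuming case does not matter, so I uppercase both the word to compare against and the test data. Each line is split into words, and the line is kept if at least one word scores above 80 (out of 100) with fuzz.ratio: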

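from fuzzywuzzy import fuzz
import csv

# Uppercase the target so the comparison is case-insensitive
word = 'Microsoft'.upper()

with open('testing.csv') as f:
    reader = csv.reader(f, delimiter='\n')
    for row in reader:
        if not row:  # skip blank lines, which would make row[0] fail
            continue
        words = row[0].split(' ')
        # Keep the line if any single word is close enough to the target
        if max(fuzz.ratio(word, x.upper()) for x in words) > 80:
            print(row[0])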

Result:

$ python test.py
I like very much the Microsoft products
Me too, I like Micrsoft
microfte here
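
If installing fuzzywuzzy is not an option, the same idea can be sketched with only the standard library, using difflib.SequenceMatcher. This is a rough equivalent, not a drop-in replacement: the function name similar_lines and its threshold parameter are made up for illustration, the ratio here is on a 0.0 to 1.0 scale rather than 0 to 100, and scores may differ slightly from what fuzzywuzzy reports.

```python
from difflib import SequenceMatcher


def similar_lines(lines, word, threshold=0.8):
    """Return the lines that contain at least one token similar to `word`.

    Hypothetical helper for illustration; `threshold` is on a 0.0-1.0 scale.
    """
    target = word.upper()
    matches = []
    for line in lines:
        tokens = line.split()
        # Best similarity between the target and any token on this line.
        # default=0.0 covers blank lines, which have no tokens.
        best = max(
            (SequenceMatcher(None, target, tok.upper()).ratio() for tok in tokens),
            default=0.0,
        )
        if best > threshold:
            matches.append(line)
    return matches


# Same sample data as testing.csv in the question
sample = [
    "I like very much the Microsoft products",
    "Me too, I like Micrsoft",
    "I prefer Apple products",
    "microfte here",
]

result = similar_lines(sample, "Microsoft")
for line in result:
    print(line)
```

To run it against the real file, something like similar_lines(open('testing.csv').read().splitlines(), 'Microsoft') should work, since the file is really plain text with one sentence per line. You will probably need to tune the threshold for your own data.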
