I'd appreciate it if anyone could help me with the following:

I'd like to match each string in a longer list (92 elements) with its closest match from a shorter list (26 elements). With the thefuzz package I can do this in the opposite direction (matching each element of the shorter list against the longer list), but not in the direction I need.

The working version uses code taken from this example and runs as follows (difference_stations1 has 92 elements and difference_stations2 has 26):

from thefuzz import process
import pandas as pd

# minimum score a match must exceed to be kept
threshold = 20

# collects one dict per accepted match
response = []

# for each name in the shorter list, find its closest match in the longer list
for name_to_find in difference_stations2:
    # extractOne returns a (match, score) tuple when the choices are a list
    resp_match = process.extractOne(name_to_find, difference_stations1)
    if resp_match[1] > threshold:
        row = {'name': name_to_find, 'matched_name': resp_match[0], 'score': resp_match[1]}
        response.append(row)
        print(row)

results = pd.DataFrame(response)
results
name                                               matched_name                                       score
10th & Quincy St NE / Turkey Thicket Rec           7th & T St NW                                      86
Loughboro Rd & Dalecarlia Pkwy NW / Sibley Hos...  7th & T St NW                                      86
South Capitol St and Southern Ave SE               7th & T St NW                                      86
4th & O St SW                                      N Moore St & Rosslyn Metro                         86
Ridge Rd & Southern Ave SE                         Meridian High School / Haycock Rd & Leesburg Pike  86
41st St & Alabama Ave SE / Fort Davis Rec          7th & T St NW                                      86
9th & G St NW                                      19th & K St NW                                     89
...                                                ...                                                ...

This table matches each of the 26 elements of difference_stations2 to an element of difference_stations1.

What I want is the opposite: an output that matches each of the 92 elements of difference_stations1 to its closest element in difference_stations2:

name                                               matched_name   score
7th & T St NW                                      closest match  x
N Moore St & Rosslyn Metro                         closest match  x
Meridian High School / Haycock Rd & Leesburg Pike  closest match  x
19th & K St NW                                     closest match  x
...                                                ...            ...
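
To make the direction I'm after concrete, here is a minimal sketch that loops over the longer list and looks up each name in the shorter one. It is only an illustration under some assumptions: it uses difflib from the standard library as a stand-in for thefuzz (so the scores will not be the same as thefuzz's), the helper names closest_match and match_all are made up for this post, and the two lists are shortened samples of my real data.

```python
from difflib import SequenceMatcher


def closest_match(name, choices):
    """Return (best_choice, score), with score a 0-100 similarity.

    Stand-in for thefuzz's process.extractOne; difflib scores differ from thefuzz's.
    """
    def similarity(candidate):
        return SequenceMatcher(None, name.lower(), candidate.lower()).ratio()

    best = max(choices, key=similarity)
    return best, round(100 * similarity(best))


def match_all(names, choices, threshold=20):
    """Build one row per name whose best match scores above the threshold."""
    rows = []
    for name_to_find in names:
        matched_name, score = closest_match(name_to_find, choices)
        if score > threshold:
            rows.append({'name': name_to_find, 'matched_name': matched_name, 'score': score})
    return rows


# shortened sample lists (the real ones have 92 and 26 elements)
difference_stations1 = [
    '7th & T St NW',
    'N Moore St & Rosslyn Metro',
    'Meridian High School / Haycock Rd & Leesburg Pike',
    '19th & K St NW',
]
difference_stations2 = ['9th & G St NW', '4th & O St SW']

# iterate over the LONGER list and search the SHORTER one
response = match_all(difference_stations1, difference_stations2)
for row in response:
    print(row)
```

With thefuzz I'd expect the lookup inside the loop to be process.extractOne(name_to_find, difference_stations2), and pd.DataFrame(response) would then give the table above. Note that rows scoring at or below the threshold are dropped, so the result can have fewer rows than the longer list.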

Can anyone help me achieve this? THANKS!

jasbur