Questions tagged [fuzzywuzzy]

FuzzyWuzzy is a Python package to perform fuzzy string matching.
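
To make "fuzzy string matching" concrete, here is a minimal sketch that uses only the standard library's `difflib` as a stand-in. It is not FuzzyWuzzy's API: the helper names `similarity` and `best_match` are hypothetical, and FuzzyWuzzy's own scorers may differ in normalization and rounding. The comparison here is case-sensitive.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> int:
    """Return a 0-100 similarity score for two strings (hypothetical helper)."""
    # SequenceMatcher.ratio() returns a float in [0, 1]; scale it to 0-100.
    return round(100 * SequenceMatcher(None, a, b).ratio())


def best_match(query: str, choices: list[str]) -> tuple[str, int]:
    """Return the (choice, score) pair that scores highest against the query."""
    scored = ((choice, similarity(query, choice)) for choice in choices)
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    stores = ["KFC", "Mcdonalds", "Taco bell"]
    print(best_match("Mcdonald's", stores))
```

Scoring every query against every candidate this way is quadratic in the number of strings, which is why several of the questions below ask about run time and memory on large dataframes.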

522 questions
1
vote
1 answer

How to match string and arrange dataframe accordingly?

Got Input df1 and df2 df1: Subcategory_Desc Segment_Desc Flow Side Row_no APPLE APPLE LOOSE Apple Kanzi Front Row 1 APPLE APPLE LOOSE Apple Jazz Front Row 1 CITRUS …
user12345
  • 499
  • 1
  • 5
  • 21
1
vote
0 answers

Memory Error if Dataframe Rows are too many

Here is the sample dataframe: 111853 \t Authentic Restaurant 108660 \tBone Jam 57176 \tBurgers and Barrels 77583 \tDelice de France @ Bonne Bouche -…
Tenserflu
  • 520
  • 5
  • 20
1
vote
1 answer

Remove all different string on dataframe using fuzzywuzzy

I want to remove all dissimilar strings from a dataframe and retain all "similar" strings. For example, I have this data: store_name ------------ Mcdonalds KFC Burger King Mcdonald's Mcdo Taco bell The store that we need to compare above is the first…
Tenserflu
  • 520
  • 5
  • 20
1
vote
1 answer

Fastest way to detect and append duplicates base on specific column in dataframe

Here is the sample data: name age gender school Michael Z 21 Male Lasalle Lisa M 22 Female Ateneo James T 21 Male UP Michael Z. 23 Male TUP Here are the expected…
Tenserflu
  • 520
  • 5
  • 20
1
vote
1 answer

Merging dataframes based on fuzzy logic matching using fuzzywuzzy pandas

I have 2 dataframes. One dataframe (df1) contains columns like ISIN, Name, Currency, Value, % Weight, Asset type, comments and assumptions. This dataframe looks like this: df1 ISIN Name Currency Value %…
technophile_3
  • 531
  • 6
  • 21
1
vote
1 answer

Efficient way to find an approximate string match and replacing with predefined string

I need to build a NER system (Named Entity Recognition). For simplicity, I am doing it by using approximate string matching as input can contain typos and other minor modifications. I have come across some great libraries like: fuzzywuzzy or even…
hafiz031
  • 2,236
  • 3
  • 26
  • 48
1
vote
1 answer

Similarity score to compare all strings in column to first string using fuzzywuzzy

I have a dataset containing time-series of lists for a large number of objects (unit) and I need to compare, for each object, the lists to the first list for each object. To do so, I have been using fuzzywuzzy and its similarity method, but I don't…
1
vote
1 answer

Parallelize for loop in pd.concat

I need to merge two large datasets based on string columns which don't perfectly match. I have wide datasets which can help me determine the best match more accurately than string distance alone, but I first need to return several 'top matches' for…
Chad S
  • 53
  • 6
1
vote
1 answer

Using FuzzyWuzzy with pandas

I am trying to calculate the similarity between cities in my dataframe, and 1 static city name. (eventually I want to iterate through a dataframe and choose the best matching city name from that data frame, but I am testing my code on this…
1
vote
0 answers

Understanding the distance metric in company name matching using KNN

I am trying to understand the following code that I found for matching a messy list of company names to a clean list of company names. My question is how the 'Ratio' metric is calculated. It appears that the ratio is from scorer =…
JSC
  • 181
  • 2
  • 12
1
vote
2 answers

Fuzzy match for 2 lists with very similar names

I know this question has been asked in some way, so apologies. I'm trying to fuzzy match list 1 (sample_name) to list 2 (actual_name). Actual_name has significantly more names than list 1 and I keep running into fuzzy match not working well. I've…
1
vote
2 answers

Fuzzywuzzy merge on multiple columns - pandas

I've 2 dataframes: Dataframe 1: path hierarchy 0 path3 path1/path2/path3 1 path2 path1/path2 2 path6 path1/path2/path4/path5/path6 DataFrame 2: path hierarcy …
Shubham Sharma
  • 129
  • 1
  • 8
1
vote
2 answers

Is there a way to modify this code to reduce run time?

I am looking to modify this code to reduce the runtime of the fuzzywuzzy library. At present, it's taking about an hour for a dataset with 800 rows, and when I used this on a dataset with 4.5K rows, it kept running for almost 6 hours with still no result. I…
1
vote
1 answer

fuzzy wuzzy to find a match and other columns associated with match

I have a dataset that I'd like to match on address, and once I have the address match I'd also like to know the related unique id associated with it. Consider this example: df1 = Address 123 road abc lane 1 circle 7th avenue 4 high…
aero8991
  • 239
  • 1
  • 13
1
vote
1 answer

how to sort dataframe2 according to dataframe1 with fuzzywuzzy

I know this is an old question; in fact, I have seen many links related to my question: Using fuzzywuzzy to create a column of matched results in the data frame How to compare a value in one dataframe to a column in another using fuzzywuzzy ratio What's…
Titan
  • 244
  • 1
  • 4
  • 18