Questions tagged [fuzzywuzzy]

FuzzyWuzzy is a Python package to perform fuzzy string matching.
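
As a rough illustration of what "fuzzy string matching" means here, the sketch below scores two strings from 0 to 100 and picks the closest candidate from a list. It deliberately uses only the standard library's `difflib.SequenceMatcher` as a stand-in rather than FuzzyWuzzy's own API, and the helper names `similarity_score` and `best_match` are hypothetical, chosen for this example only.

```python
from difflib import SequenceMatcher


def similarity_score(a: str, b: str) -> int:
    """Return a 0-100 similarity score for two strings.

    Assumption: this is a stand-in built on difflib, not FuzzyWuzzy itself.
    """
    return round(100 * SequenceMatcher(None, a, b).ratio())


def best_match(query: str, choices: list[str]) -> str:
    """Return the candidate from `choices` that scores highest against `query`."""
    return max(choices, key=lambda choice: similarity_score(query, choice))


if __name__ == "__main__":
    names = ["stackoverflow", "internet"]
    print(similarity_score("stikoverflow", "stackoverflow"))
    print(best_match("stikoverflow", names))
```

Comparing every query against every candidate this way costs one scoring call per pair, which is why several of the questions below ask how to speed up matching across large lists.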

522 questions
7
votes
2 answers

Name Matching in python

We have a third-party 'tool' which finds similar names and assigns a similarity score between two names. I am supposed to mimic the tool's behavior as closely as possible. After searching the internet, I gave the distance method a shot. Used fuzzywuzzy…
Soumya
  • 885
  • 3
  • 14
  • 29
7
votes
2 answers

Improve fuzzywuzzy - Matching names in 2 lists

My requirement is to find matching names for 2 lists. One list has 400 names and the second list has 90000 names. I got the desired result, but the process takes more than 35 minutes. As is obvious, there are 2 for loops, so it takes O(N*N) operations which…
ashwin3086
  • 136
  • 1
  • 2
  • 8
6
votes
4 answers

How to vectorize and speed-up double for-loop for pandas dataframe when doing text similarity scoring

I have the following dataframe: d_test = { 'name' : ['South Beach', 'Dog', 'Bird', 'Ant', 'Big Dog', 'Beach', 'Dear', 'Cat'], 'cluster_number' : [1, 2, 3, 3, 2, 1, 4, 2] } df_test = pd.DataFrame(d_test) I want to identify similar names in…
illuminato
  • 1,057
  • 1
  • 11
  • 33
6
votes
2 answers

Levenshtein distance giving strange values

Here's a string T: 'men shirt team brienne funny sarcasm shirt features graphic tees mugs babywear much real passion brilliant design detailed illustration strong appreciation things creative br shop thousands designs found across different shirt…
user9343456
  • 351
  • 2
  • 11
6
votes
1 answer

Need more understanding on python fuzz partial ratio

I am using Python fuzzywuzzy on an enterprise level to match 2 strings. It works fine in most cases but gives unexpected results in the scenario below: fuzz.partial_ratio('ja rule:mesmerize','ja rule feat. ashanti:mesmerize') gives…
Sains
  • 457
  • 1
  • 7
  • 19
6
votes
1 answer

Python group similar records (strings) in dataset

I have an input table like this: In [182]: data_set Out[182]: name ID 0 stackoverflow 123 1 stikoverflow 322 2 stack, overflow 411 3 internet.com 531 4 internet 112 …
Dio
  • 97
  • 1
  • 8
6
votes
2 answers

fastest way to do fuzzy matching two strings in pandas data frame

I have two data frames with name lists: df1[name] (3000 rows) and df2[name] (64000 rows). I am using fuzzywuzzy to get the best match for df1 entries from df2 using the following code: from fuzzywuzzy import fuzz from…
6
votes
3 answers

Pandas fuzzy merge/match name column, with duplicates

I have two dataframes currently, one for donors and one for fundraisers. I'm trying to find if any fundraisers also gave donations, and if so, copy some of that information into my fundraiser dataset (donor name, email and their first donation).…
Wizuriel
  • 3,617
  • 4
  • 21
  • 26
5
votes
3 answers

Fuzzy Match columns of Different Dataframe

Background: I have 2 data frames which have no common key on which I can merge them. Both data frames have a column that contains "entity name". One contains 8000+ entities and the other close to 2000 entities. Sample Data: vendor_df= Name of Vendor …
Rahul Agarwal
  • 4,034
  • 7
  • 27
  • 51
5
votes
1 answer

Performing a fuzzy contains check

I would like to check if a keyword string is contained within a text string. This must be a fuzzy contains. My first attempt was to use the library fuzzywuzzy. This seemed to have unexpected behavior producing high match values when the strings…
Michael
  • 3,411
  • 4
  • 25
  • 56
5
votes
1 answer

string comparison for multiple values python

I have sets of data. The first (A) is a list of equipment with sophisticated names. The second (B) is a list of broader equipment categories, into which I have to group the first list using string comparisons. I'm aware this won't be…
MacAnRiogh
  • 75
  • 6
5
votes
1 answer

sklearn: Would like to extend CountVectorizer to fuzzy match against vocabulary

I was going to try using fuzzywuzzy with a tuned acceptable score parameter. Essentially, it would check if the word is in the vocabulary as-is, and if not, it would ask fuzzywuzzy to choose the best fuzzy match, and accept that for the list of tokens…
KotoroShinoto
  • 720
  • 1
  • 5
  • 9
5
votes
1 answer

Python fuzzy matching of names with only first initials

I have a case where I need to match a name from a given string to a database of names. Below I have given a very simple example of the issue that I am running into, and I am unclear as to why one case works over the other? If I'm not mistaken, the…
rahlf23
  • 8,869
  • 4
  • 24
  • 54
5
votes
1 answer

Fuzzy Wuzzy String Matching on 2 Large Data Sets Based on a Condition - python

I have 2 large data sets that I have read into Pandas DataFrames (~20K rows and ~40K rows respectively). When I try merging these two DFs outright using pandas.merge on the address field, I get a paltry number of matches compared to the number of…
Nirav
  • 53
  • 1
  • 1
  • 6
5
votes
3 answers

Searching one Python dataframe / dictionary for fuzzy matches in another dataframe

I have the following pandas dataframe with 50,000 unique rows and 20 columns (included is a snippet of the relevant columns): df1: PRODUCT_ID PRODUCT_DESCRIPTION 0 165985858958 "Fish Burger with Lettuce" 1 …
gincard
  • 1,814
  • 3
  • 16
  • 24