Highest Voted 'fuzzywuzzy' Questions

2

votes

0 answers

How to efficiently get the top 3 similar strings in a distance matrix using only one triangle section?

Consider the following Python code: from scipy.spatial.distance import pdist, squareform from fuzzywuzzy import fuzz import pandas as pd words = pd.DataFrame({'Words': ['horse', 'dog', 'food', 'hhorse', 'doggy']}) distance_matr = pdist(words,…

asked Sep 08 '21 at 21:56

Snowflake

2,869
3
22
44

2

votes

1 answer

Python: fuzzywuzzy, the output of the first value is correct, the others are NaN

I'm stuck on a very strange problem: I have two dfs and I have to match the strings of one df with the strings of the other df by similarity. The target column is the name of the television program (program_name_1 & program_name_2). In order to let him…

python pandas dataframe nan fuzzywuzzy

asked Aug 06 '21 at 11:15

Laga

23
5

2

votes

2 answers

How to combine the fuzzy function with apply(lambda x: ) function?

I have 2 dataframes df1 and df2 like this: df1: Id Name 1 Tuy Hòa 2 Kiến thụy 3 Bình Tân df2: code name A1 Tuy Hoà A2 Kiến Thụy A3 Tân Bình Now when I use merge: out_df = pd.merge(df1, df2,…

python pandas lambda apply fuzzywuzzy

asked Jun 23 '21 at 07:38

Tung Nguyen

410
3
11

2

votes

2 answers

Get most similar value from dataframe column to specific string python

I want to find the most similar value from a dataframe column to a specified string, e.g. a='book'. Let's say the dataframe looks like: df col1 wijk 00 book Wijk a test Now I want to return wijk 00 book since this is the most similar to a. I am…

python pandas fuzzywuzzy

asked Apr 26 '21 at 16:01

baqm

121
6

2

votes

1 answer

pandas: calculate fuzzywuzzy for each category separately

I have a dataset as follows, only with more rows: import pandas as pd data = {'First': ['First value','Third value','Second value','First value','Third value','Second value'], 'Second': ['the old man is here','the young girl is there', 'the old…

python-3.x pandas average categories fuzzywuzzy

asked Dec 16 '20 at 14:12

zara kolagar

881
3
15

2

votes

1 answer

How do I get additional column name information in a pandas group by / nlargest calculation?

I am comparing pairs of strings using six fuzzywuzzy ratios, and I need to output the top three scores for each pair. This line does the job: final2_df = final_df[['nameHiringOrganization', 'mesure', 'name',…

python-3.x pandas pandas-groupby fuzzywuzzy

asked Jun 05 '20 at 07:56

davidv

71
7

2

votes

1 answer

Python FuzzyWuzzy ratio: how does it work?

Inside the FuzzyWuzzy ratio description it says: The FuzzyWuzzy ratio raw score is a measure of the strings similarity as an int in the range [0, 100]. For two strings X and Y, the score is defined by int(round((2.0 * M / T) * 100)) where T is the…

python fuzzywuzzy fuzzy

asked Jun 01 '20 at 23:00

s900n

3,115
5
27
35
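The formula quoted in the question above, int(round((2.0 * M / T) * 100)), can be tried out with a short sketch. This is not the fuzzywuzzy source. It assumes that the ratio is built on the standard library's difflib.SequenceMatcher, whose documentation defines T as the total number of elements in both sequences and M as the number of matches. The helper name ratio_score and its handling of two empty strings are hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def ratio_score(x: str, y: str) -> int:
    """Illustrative re-implementation of int(round((2.0 * M / T) * 100)).

    Assumption: M and T follow the difflib.SequenceMatcher definitions.
    ratio_score is a hypothetical helper, not part of fuzzywuzzy.
    """
    total = len(x) + len(y)  # T: total number of characters in both strings
    if total == 0:
        # Assumption for this sketch only: two empty strings score 0.
        return 0
    matcher = SequenceMatcher(None, x, y)
    # M: number of matched characters, summed over all matching blocks
    matches = sum(block.size for block in matcher.get_matching_blocks())
    return int(round((2.0 * matches / total) * 100))


print(ratio_score("horse", "hhorse"))  # M = 5, T = 11 -> 91
print(ratio_score("dog", "doggy"))     # M = 3, T = 8  -> 75
```

For non-empty inputs the result equals int(round(100 * SequenceMatcher(None, x, y).ratio())), since ratio() is documented as returning 2.0 * M / T. Scores from an installed fuzzywuzzy may differ if it uses a different matching backend.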

2

votes

1 answer

Group by fuzzy string matches with fuzzywuzzy and groupby

I have a dataset of random words and names and I am trying to group all of the similar words and names. So given the dataframe below: Name ID Value 0 James 1 10 1 James 2 2 …

python pandas fuzzywuzzy

asked May 26 '20 at 15:29

DrakeMurdoch

765
11
26

2

votes

1 answer

python fuzzywuzzy fuzzy matching - exclude terms

I am fairly new to Python and have been using fuzzywuzzy to do some fuzzy matching with success. I am wondering, however, if there is a way to exclude terms from the algorithm? Generic terms can often be matched to a ton of options, and I would like to…

python pandas fuzzywuzzy

asked Apr 08 '20 at 19:02

Patrick Williams

35
2

2

votes

1 answer

Basic question - iterating through pandas dataframe column using a function

I am struggling with the basics. I have just one column with names in a pandas dataframe and I want to compare strings for potential duplicates using 3-4 functions from the fuzzywuzzy library. So I want to check the first name against the rest of the column…

python pandas fuzzywuzzy

asked Mar 01 '20 at 14:02

cnns

151
7

2

votes

1 answer

Using fuzzy wuzzy to match names (Issue!) Not performing as expected?

I want to match names appropriately, but as can be seen below, the result is not the match I wanted. Is there any way to get around this? I just want Mr Mark Longfield to be preferred over Mr Laurence Boode, as it is more likely to be the correct match. from…

python fuzzywuzzy

asked Feb 29 '20 at 14:13

user11357465

2

votes

1 answer

Fuzzy match columns and merge/join dataframes

I am trying to merge 2 dataframes, each with multiple columns, based on matching values in one column of each. This code from @Erfan does a great job fuzzy matching the target columns, but is there a way to carry the rest of the columns…

python pandas merge fuzzywuzzy

asked Feb 24 '20 at 16:16

pyproper

53
6

2

votes

1 answer

How to compare row by row in a dataframe

I have a data frame that has a name and the URL ID of the name. For example: Abc 123 Abc.com 123 Def 345 Pqr 123 PQR.com 123 Here, due to a data extraction error, different names sometimes have the same ID. I want…

python-3.x pandas dataframe group-by fuzzywuzzy

asked Feb 15 '20 at 05:24

asspsss

103
1
1
8

2

votes

1 answer

Fuzzy matching from string candidate list

I've got a list of company names that I am trying to parse from a large number of PDF documents. I've forced the PDFs through Apache Tika to extract the raw text, and I've got the list of 200 companies read in. I'm stuck trying to use some…

python python-3.x spacy apache-tika fuzzywuzzy

asked Jan 29 '20 at 01:28

Jack McPherson

135
1
8

2

votes

1 answer

fuzzy duplicate check using python dedupe library error

I'm trying to use the Python dedupe library to perform a fuzzy duplicate check on my mock data, but I keep getting this error: {'Vendor': {0: 'ABC', 1: 'ABC', 2: 'TIM'}, 'Doc Date': {0: '5/12/2019', 1: '5/13/2019', 2: '4/15/2019'}, 'Invoice Date':…

python python-3.x fuzzywuzzy python-dedupe

asked Jan 18 '20 at 21:20

python_rok

61
1
9
