Questions tagged [fuzzywuzzy]

FuzzyWuzzy is a Python package for fuzzy string matching.

522 questions
1
vote
0 answers

Fuzzy matching in PySpark fails: TypeError: can't pickle _thread.lock objects

I have two Redshift Tables each containing a column which stores email_addresses. I am using the FuzzyWuzzy library for String Matching between these two columns. I have read about FuzzyWuzzy here and the PySpark logic here. This is my FuzzyWuzzy…
ab_padfoot
  • 63
  • 1
  • 10
1
vote
1 answer

Pandas - change next row on single column based on the fuzzy wuzzy result of comparing row[i] with row[i+1]

I have the following DataFrame (df) in pandas (this is just an example; the real DataFrame has more than 2,000 rows and more than 20 names): ID Name 1 Andrea Gonzlez 2 Andrea Glz 3 Andrea Glez 4 Lineth Arce 5 lineth a 6 lineth aerc I want to…
Init5 God
  • 11
  • 2
1
vote
0 answers

Fuzzy match string with 1 million rows

I have a database with 1 million rows, and based on a user's input I need to find the most relevant matches for them. The existing code uses the fuzzywuzzy library. A ratio between 2 strings was calculated in order to show…
Cristian Gira
  • 113
  • 2
  • 8
1
vote
1 answer

Optimizing RapidFuzz for a list with a large number of elements (e.g. 200,000)

I would like to run the piece of rapidfuzz code mentioned in this post on a list with 200,000 elements. What is the best way to optimize it to run faster on a GPU? Find fuzzy match string in a list with matching string value and…
nerd
  • 473
  • 5
  • 15
1
vote
1 answer

pandas fuzzy match on the same column but prevent matching against itself

This is a common question but I have an extra condition: how do I remove matches based on a unique ID? Or, how to prevent matching against itself? Given a dataframe: df = pd.DataFrame({'id':[1, 2, 3], 'name':['pizza','pizza…
Chuck
  • 1,061
  • 1
  • 20
  • 45
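For the self-match question above, a minimal sketch of one possible approach: when collecting candidates for a row, filter out the row's own ID before picking the best score. This is not the asker's code. It uses `difflib.SequenceMatcher` from the standard library as a stand-in for `fuzz.ratio` (the scores are not identical to fuzzywuzzy's), works on plain `(id, name)` tuples instead of a DataFrame, and the sample names and the helper names `similarity` and `best_other_match` are assumptions for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> int:
    """Return a 0-100 similarity score (stand-in for fuzz.ratio)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def best_other_match(records):
    """Map each id to (best matching id, score), never matching a record to itself."""
    result = {}
    for rec_id, name in records:
        candidates = [
            (other_id, similarity(name, other_name))
            for other_id, other_name in records
            if other_id != rec_id  # exclude the row itself by its unique ID
        ]
        # max() needs at least one candidate; a single-row input maps to None.
        result[rec_id] = max(candidates, key=lambda c: c[1]) if candidates else None
    return result


# Hypothetical rows as (id, name); with pandas: list(zip(df['id'], df['name'])).
rows = [(1, "pizza"), (2, "pizza pie"), (3, "burger")]
matches = best_other_match(rows)
```

Filtering on the ID rather than on the name means two different rows that happen to hold the same string can still match each other.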
1
vote
1 answer

How to compare strings in one column with strings in another within the same dataframe and calculate the percentage of strings matching in result columns

How can I compare strings in one column with strings in another within the same dataframe, and calculate the percentage of strings matching in result columns, as well as whether they are full matches, partial matches, or don't match at all?
1
vote
0 answers

Match keywords between two sheets and input the values into the first sheet

I have one master excel sheet that contains different names of procedures. The observations are different instances in which each variable can occur. The second sheet is one observation in which each variable has a value attached to them. Each one…
NellPaddle
  • 11
  • 2
1
vote
0 answers

Fuzzy matching between two lists with different lengths

After web scraping for a project, I realized while merging two datasets that some records were not matched because the strings are not exactly the same (example: Usop = Usopp). To overcome this problem I am using the FuzzyWuzzy library, the…
1
vote
2 answers

Drop same values in different columns by pair (drop connected components)

After applying the Levenshtein distance algorithm I get a dataframe like…
Maximiliano Vazquez
  • 196
  • 1
  • 2
  • 12
1
vote
2 answers

Comparing two strings with low/no consistency

I have two strings a = 'Test - 4567: Controlling_robotic_hand_with_Arduino_uno' b = 'Controlling robotic hand' I need to check if they match and print out the result accordingly. As b is the string I want checked in a, the result should print out…
Rav3H34rt
  • 13
  • 2
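For the two strings in the question above, a minimal sketch of one possible check: normalize separators and case on both sides, then test for plain containment. This uses only the standard library rather than fuzzywuzzy, and the helper names `normalize` and `contains_loosely` are assumptions for illustration, not part of any library.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse underscores, hyphens, and whitespace runs into single spaces."""
    return re.sub(r"[\s_\-]+", " ", text).strip().lower()


def contains_loosely(needle: str, haystack: str) -> bool:
    """True if the normalized needle occurs inside the normalized haystack."""
    return normalize(needle) in normalize(haystack)


a = 'Test - 4567: Controlling_robotic_hand_with_Arduino_uno'
b = 'Controlling robotic hand'
print("match" if contains_loosely(b, a) else "no match")
```

If exact containment after normalization turns out to be too strict (for example, when `b` may contain typos), fuzzywuzzy's `fuzz.partial_ratio` with a score threshold is one alternative to try.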
1
vote
0 answers

How to fuzzywuzzy match items in dataframe columns a and merge with table b elements?

I have a products table and a product pricing table. How would I use fuzzywuzzy matching to find the products, return the similarity score, and also add the product pricing table's items? tables…
1
vote
1 answer

Fuzzy Matching with different fuzz ratios

I have two large datasets. df1 is about 1m lines, and df2 is about 10m lines. I need to find matches for lines in df1 from df2. I have posted an original version of this question separately. See here. Well answered by @laurent but I have some…
1
vote
1 answer

Optimize the traversal of a column of a dataframe

I want to check for fuzzy duplicates in a column of the dataframe using fuzzywuzzy. In this case, I have to iterate over the rows one by one using two nested for loops. for i in df['col']: for j in df['col']: ratio = fuzz.ratio(i, j) …
Shrmn
  • 368
  • 4
  • 12
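For the nested-loop question above, a minimal sketch of one possible approach: iterate over each unordered pair once with `itertools.combinations`, which skips self-comparisons and mirrored pairs. It uses `difflib.SequenceMatcher` from the standard library as a stand-in for `fuzz.ratio` (the scores are not identical to fuzzywuzzy's). The sample values, the threshold of 80, and the helper names `similarity` and `fuzzy_duplicates` are assumptions for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> int:
    """Return a 0-100 similarity score (stand-in for fuzz.ratio)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def fuzzy_duplicates(values, threshold=80):
    """Return (a, b, score) for each unordered pair scoring >= threshold."""
    pairs = []
    for a, b in combinations(values, 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


# Hypothetical sample data; with pandas this could be df['col'].tolist().
names = ["Andrea Gonzalez", "Andrea Gonzales", "Lineth Arce"]
duplicates = fuzzy_duplicates(names)
```

This still performs n(n-1)/2 comparisons, so it roughly halves the work of the double loop rather than changing its quadratic growth.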
1
vote
2 answers

fuzzywuzzy returning single characters, not strings

I'm not sure where I'm going wrong here or why my data is coming back wrong. I'm writing this code to use fuzzywuzzy to clean bad input road names against a list of correct names, replacing each incorrect name with the closest match. It's returning all lines…
1
vote
2 answers

How to replace two for loops over a list and a dataframe in Python?

I have a dataframe and a string list: import pandas as pd from fuzzywuzzy import fuzz from fuzzywuzzy import process df = pd.DataFrame({'Name': ['PARIS', 'NEW YORK', 'MADRI', 'PARI', 'P ARIS', 'NOW YORK', …
Jane Borges
  • 552
  • 5
  • 14