Questions tagged [fuzzywuzzy]

FuzzyWuzzy is a Python package to perform fuzzy string matching.

522 questions
0
votes
1 answer

Trying to Perform Fuzzy Matching in Python

I am trying to perform a fuzzywuzzy command comparing two columns in a dataframe. I want to know if a character string from one column ('Relationship') exists in another ('CUST_NAME'), even partially. Then repeat the process for a second column…
NateO
  • 3
  • 3
0
votes
0 answers

Pandas and Fuzzy - comparing csv and mysql before overwriting

Based on the response presented in this topic [Writing to MySQL database with pandas using SQLAlchemy, to_sql], how would it be possible, with pandas and fuzzywuzzy, to compare a csv file with the data in the database (in two columns) and if it…
marcos
  • 1
  • 1
0
votes
1 answer

Python: if/else constructs inside functions

I have a function that calculates a fuzzywuzzy score for two texts: def fuzzywuzzy(text_1, text_2): scores = { 'ratio' : fuzz.ratio(tn.normalize_title(text_1),tn.normalize_title(text_2)) / 100, 'partial_ratio' :…
SaNa
  • 333
  • 1
  • 3
  • 13
0
votes
0 answers

FuzzyWuzzy search using Asian characters

The code below, from a good Samaritan, works great in English: it can find strings of text in a large document and report confidence in how well they match. But I can't figure out how to get it working with Thai characters. #!/usr/bin/python from difflib…
TinkyWinkyMD
  • 35
  • 1
  • 7
0
votes
1 answer

python - fuzzywuzzy error - object of type float has no len

I am trying to use the fuzzywuzzy library to get a similarity score between strings in 2 datasets using the fuzz.ratio function. However, I keep getting the following error: File "title_matching.py", line 29, in match =…
iammrmehul
  • 730
  • 1
  • 14
  • 35
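
One common cause of this kind of error is a non-string value, such as the float NaN that pandas uses for an empty cell, reaching the scorer. The excerpt is truncated, so whether that is the cause here is an assumption. Below is a minimal sketch of a guard, using the standard library's difflib.SequenceMatcher as a stand-in for fuzz.ratio; safe_ratio and the sample titles are hypothetical.

```python
from difflib import SequenceMatcher


def safe_ratio(a, b):
    """Return a 0-100 similarity score, or 0 when either value is not a string."""
    # Missing cells often arrive as float('nan'), which has no len().
    if not isinstance(a, str) or not isinstance(b, str):
        return 0
    return round(100 * SequenceMatcher(None, a, b).ratio())


# Hypothetical sample data with one missing value.
titles = ["The Matrix", float("nan"), "Inception"]
scores = [safe_ratio(t, "The Matrix") for t in titles]
```

Returning 0 for missing values is one design choice; dropping or filling the missing rows before scoring would work as well.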
0
votes
1 answer

appending of fuzzywuzzy process extract result into df

I have a list of company names which are not properly aligned. The data set looks like df[Name] = [Google, google, Google.inc, Google Inc., Google.com]. I have about 500,000 rows, and the names should be corrected in the best way possible. My code looks like…
Maneet Giri
  • 185
  • 3
  • 18
0
votes
1 answer

Fuzzy Matching Two Columns in the Same Dataframe Using Python

I have two datasets within the same data frame, each showing a list of companies. One dataset is from 2017 and the other is from this year. I am trying to match the two company datasets to each other and figured fuzzy matching (FuzzyWuzzy) was the…
Sam
  • 47
  • 1
  • 7
0
votes
0 answers

Print dataframe column

File1 has only 1 column: name. File2 has 3 columns: name, num1, num2. from fuzzywuzzy import fuzz from fuzzywuzzy import process import pandas as pd data1 = pd.read_csv('file1.csv') data_list1 = data_to_match['Name1'] data2 =…
usertool
  • 15
  • 4
0
votes
1 answer

How to replace a string if it matches partially (up to 90%) with the searched string in Python while working with python-docx?

I want to replace text in my Word document. I am able to replace text strings that match completely, but I also want to replace text that matches the searched string at 90%. I am using python-docx for working with Word documents. Below code…
Purva
  • 43
  • 1
  • 8
0
votes
0 answers

pandas - fuzzywuzzy - speeding loop up when doing fuzzymatching?

I am basically trying to join 2 dataframes using an approximate match. How I do this in general is listed below: have the list of strings to be matched; define a function using fuzzy's process.extract; apply this function across all rows in the 1st…
addicted
  • 2,901
  • 3
  • 28
  • 49
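
When the column being matched contains many repeated values, one possible way to cut work is to score each distinct string only once. The sketch below caches results with functools.lru_cache and uses the standard library's difflib.get_close_matches as a stand-in for process.extract; best_match, choices, and left_names are hypothetical names and data.

```python
from difflib import get_close_matches
from functools import lru_cache

# Hypothetical reference list to match against.
choices = ("Google Inc.", "Microsoft Corporation", "Apple Inc.")


@lru_cache(maxsize=None)
def best_match(name):
    """Return the closest entry in choices, or None if nothing is close enough."""
    found = get_close_matches(name, choices, n=1, cutoff=0.6)
    return found[0] if found else None


# Repeated values hit the cache instead of being scored again.
left_names = ["Google Inc", "Apple Inc", "Google Inc", "Apple Inc"]
matched = [best_match(n) for n in left_names]
```

This does not reduce the cost of scoring each unique string, so it only helps when duplicates are frequent.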
0
votes
0 answers

FuzzyWuzzy in python for matching thousands of rows

I have an Excel worksheet with 4 columns (SPI1, D, SPI2, O), sorted by SellerPartyId, i.e. SPI. I would like to see those in O which are not present in D. The number of rows in Originator Name is over 20,000 while those in DBAName is…
0
votes
0 answers

dealing with multiple similar entities in pandas dataframe

I have a dataframe with a 'Name' column. There are multiple similar entries with some inconsistencies. I want to merge them into one. I am a beginner in data analysis and came to know about the fuzzywuzzy module. I tried it in the way below: names =…
S.Dasgupta
  • 61
  • 9
0
votes
1 answer

dictionary-based fuzzy matching

I want to match the entity occurrences in SeqString. For example: dict_data = ['johnson', 'apple platform'] SeqString = 'Johnson buys a new phone which is based on Apppple Platform. Johnson very likes the Apple Platform.' Expected results: Match…
futurelj
  • 273
  • 5
  • 14
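
One way to approach the task described in this excerpt is to slide a window of as many tokens as the dictionary entry has over the text and score each window. The sketch below uses the standard library's difflib.SequenceMatcher rather than fuzzywuzzy; find_entities, the 0.8 threshold, and the simple punctuation stripping are assumptions for illustration.

```python
from difflib import SequenceMatcher


def find_entities(text, entities, threshold=0.8):
    """Return (entity, matched_text, token_index) for windows scoring >= threshold."""
    # Lowercase and strip trailing punctuation so "Platform." compares as "platform".
    tokens = [t.strip(".,").lower() for t in text.split()]
    matches = []
    for entity in entities:
        n = len(entity.split())  # window size equals the entity's token count
        for i in range(len(tokens) - n + 1):
            window = " ".join(tokens[i:i + n])
            score = SequenceMatcher(None, entity, window).ratio()
            if score >= threshold:
                matches.append((entity, window, i))
    return matches


dict_data = ['johnson', 'apple platform']
SeqString = ('Johnson buys a new phone which is based on Apppple Platform. '
             'Johnson very likes the Apple Platform.')
matches = find_entities(SeqString, dict_data)
```

The index returned is a token position, not a character offset; mapping back to character positions would need the original token spans.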
0
votes
2 answers

fuzzywuzzy string match between 3 columns

So I have a column which looks like this. name col1 col2 col3 company1 Banking Finance B&F company2 Utilities Utilities NaN company3 Transportation Pipeline…
0
votes
0 answers

How to get a column of fuzzy scores (comparing one string to a column of strings) in a dataframe, python

I have a dataframe that lists objects and their different qualities in columns. One of those columns is the objects' colors. My goal is to write a function that creates a NEW column listing the fuzzy partial_ratio scores between the color of a…
Ella B.
  • 1
  • 2