Questions tagged [fuzzywuzzy]

FuzzyWuzzy is a Python package to perform fuzzy string matching.
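
For readers new to the tag, here is a minimal sketch of what "fuzzy string matching" means in practice. It deliberately uses only the standard library's `difflib` as a stand-in, so it is not FuzzyWuzzy's own API and its scores may differ from the package's. The helper names `similarity` and `best_match` are hypothetical and chosen only for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> int:
    """Return a 0-100 similarity score for two strings.

    Stdlib stand-in for a fuzzy ratio; not FuzzyWuzzy's implementation.
    """
    return round(100 * SequenceMatcher(None, a, b).ratio())


def best_match(query: str, choices: list[str]) -> str:
    """Return the choice with the highest similarity to the query."""
    return max(choices, key=lambda choice: similarity(query, choice))


if __name__ == "__main__":
    cities = ["new york city", "berlin", "london"]
    print(similarity("new york", "new york city"))
    print(best_match("new york", cities))
```

Scores near 100 indicate near-identical strings, and a threshold on the score is the usual way to decide whether two strings count as a match.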



522 questions
1
vote
1 answer

python fuzzywuzzy ratio different output on server as on local machine

I'm using fuzzywuzzy to calculate the similarity between two strings. For example: from fuzzywuzzy import fuzz fuzz.ratio('12aefadfaeaffdafafa3','124afdefadaefadfaad') >>> 56 This is embedded into some code that I can run on my webserver (amazon…
Rob Teeuwen
  • 455
  • 5
  • 21
1
vote
1 answer

TypeError: ('expected string or bytes-like object', 'occurred at index 0') when calling process.extract

I get the following error message when I try to use process.extract from the fuzzywuzzy library on a column in a pandas DataFrame: TypeError: ('expected string or bytes-like object', 'occurred at index 0') Background I have the following sample…
SFC
  • 733
  • 2
  • 11
  • 22
1
vote
2 answers

I have installed the fuzzywuzzy module, but when I import it in a Jupyter notebook it gives a "no module found" error

I have installed the fuzzywuzzy module and I can import it in the Python shell, but when I import it in a Jupyter notebook it gives a "no module found" error. >>> from fuzzywuzzy import fuzz >>>''' ```import pandas as pd import json from fuzzywuzzy import…
Nep_tune
  • 11
  • 1
  • 5
1
vote
0 answers

Removing an unmatched string from an imported CSV file using pandas and fuzzywuzzy

Hi, I am working on removing an unmatched string from an imported CSV file using pandas and fuzzywuzzy. The inputs taken from the CSV file are as follows: ID,Title 1,The One and Only Ivan (Korean Edition) 1,The One and Only Ivan CD 1,The One and Only…
1
vote
1 answer

'utf-8' codec can't decode byte 0xb7

I run this with python3 matchtagger.py bulkmatch . to match specific words, capture the sentence, and save the output to a CSV file; all notes are in the same folder as the code. import re import click import time import os import csv import…
Mas Maz
  • 27
  • 9
1
vote
0 answers

Matching strings in a pandas dataframe using fuzzywuzzy

I have two dataframes: Instructor_Info and Operator_Info. Instructor_Info contains columns called Names and OperatorName, and Operator_Info also has a column called Names. All names in Instructor_Info have an associated name in Operator_Info. I want…
DataScience99
  • 339
  • 3
  • 10
1
vote
2 answers

Looking for a quicker way of fuzzy string matching

I am using fuzzywuzzy in Python for fuzzy string matching. I have a set of names in a list named HKCP_list, which I am matching against a pandas column iteratively to get the best possible match. Given below is the code for it: import fuzzywuzzy from…
Nirvik Banerjee
  • 335
  • 5
  • 16
1
vote
1 answer

Partial String Matching within Groups

I have data that includes a group (Area) and then also provides a name. I am trying to merge two data frames. One frame is much smaller and is the "mapping" data frame. It has one row for each Name within an Area. The other frame is much larger and…
Kskiaskd
  • 35
  • 5
1
vote
0 answers

Is there any way to check if a string "almost" contains another string?

I'm working on a project that requires me to check if string1 is almost present in string2. If yes (i.e. if it matches more than some threshold ratio, say delta), then I need to extract that matched segment from string2 and save it. string1 will…
droidmainiac
  • 198
  • 1
  • 9
1
vote
2 answers

Error with FuzzyWuzzy: StringProcessor.replace_non_letters_non_numbers_with_whitespace(s)

I cannot get the following function to run: match, match_score = process.extractOne(score, pct_dict.keys()) I get a whitespace error I cannot seem to resolve. Any idea what is causing this? What it should do: If the score is 15 it should return…
Brook Hurd
  • 37
  • 5
1
vote
1 answer

Python: Return Pandas DataFrame from FuzzyWuzzy ExtractOne()

I have two Pandas DataFrames (person names), one small (200+ rows) and another one pretty big (100k+ rows). They both have similar headers, but the big one also has a unique ID, as follows: Small: LST_NM, FRST_NM, CITY Big: LST_NM, FRST_NM, CITY,…
c_c
  • 35
  • 5
1
vote
0 answers

Replacing strings using fuzzywuzzyR

I have a large data set with city names. Many of the names are not consistent. Example: vec = c("New York", "New York City", "new York CIty", "NY", "Berlin", "BERLIn", "BERLIN", "London", "LONDEN", "Lond", "LONDON") I want to use fuzzywuzzyR to…
Banjo
  • 1,191
  • 1
  • 11
  • 28
1
vote
1 answer

Filtering a dataframe using Fuzzywuzzy keyword matches

Novice Python user here. I have a dataframe imported from a csv file which I need to search for "Alert" and "Amber" keywords from the from_data column (searching for upper, lower or a combination of both case). Here are the contents of my dataframe…
Big_Daz
  • 141
  • 1
  • 7
1
vote
1 answer

Fuzzywuzzy on subset of data based on conditions

Firstly, note I'm a Python newbie, so apologies in advance. I have however researched this for the last day or two with no luck, hence my first post here. I need to fuzzy match data based on 'Name' in a CSV file in the following…
Jack
  • 13
  • 3
1
vote
0 answers

Returning Multiple Columns from FuzzyWuzzy token_set_ratio

I am attempting to perform some fuzzy matching across two datasets containing lots of addresses. I am iterating through a list of addresses in df, and finding the 'most matching' out of another: for index,row in df.iterrows(): test_address =…