Questions tagged [fuzzywuzzy]

FuzzyWuzzy is a Python package to perform fuzzy string matching.

FuzzyWuzzy is a Python package to perform fuzzy string matching.

Useful links

522 questions
2
votes
1 answer

fuzzy match between 2 columns (Python)

I have a pandas dataframe called "df_combo" which contains columns "worker_id", "url_entrance", "company_name". I am trying to produce an output column that would tell me if the URLs in "url_entrance" column contains any word in "company_name"…
Stanleyrr
  • 858
  • 3
  • 12
  • 31
2
votes
0 answers

FuzzyWuzzy using two pandas dataframes python

I want to find the fuzz.ratio of strings that are in two dataframes. Let's say I have 2 dataframes df with columns A, B and bt_df with columns A1, B1.. I want to compare the column df['B'] and bt_df['B1'] and return the best matching score and its…
User1090
  • 859
  • 6
  • 13
  • 19
2
votes
1 answer

Python FuzzyWuzzy unexpected mismatch between fuzz.ratio and process.extractOne results

I'm working on a code that uses fuzzy string matching to match a dataframe of user inputs (dataframe of lists of strings after some cleaning) to specific words of interest. I use Python Pandas for handling dataframes and the FuzzyWuzzy package for…
2
votes
1 answer

Using fuzzywuzzy to create a column of matched results in the data frame

I'm running into a challenge with using the FuzzyWuzzy library to store all my results in a data frame column (I'm guessing it might require a loop?) I've been scratching my head over this all day, now I want to see if any of you can help me with…
RedVII
  • 483
  • 6
  • 11
2
votes
1 answer

How do to fuzzy matching on excel file using Pandas?

I have a table called account with two columns - ID & NAME. ID is a hash which is unique but NAME is a string which might have duplicates. I'm trying to write a python script to read this excel file and match 0-3 similar NAME values, but I just…
g0d
  • 57
  • 1
  • 7
2
votes
3 answers

Fuzzy search Python

I have a big sample text, for example : "The arterial high blood pressure may engage the prognosis for survival of the patient as a result of complications. TENSTATEN enters within the framework of a preventive treatment(processing). …
2
votes
2 answers

Run a query against all values within nested lists of a multi-valued dictionary

I have a 'collections.defaultdict' (see x below) that is a multi-valued dictionary. All values associated with each unique key are stored in a list. >>>x defaultdict(, {'a': ['aa', 'ab', 'ac'], 'b': ['ba', 'bc'], 'c': ['ca',…
2
votes
0 answers

build a suggestions list using fuzzywuzzy

I was building a suggestion engine from my database. I'm using fuzzywuzzy module to match user input string with existing database as shown below. if request.method == "GET": display_dict = {} user_input = request.GET["user_input"] # get…
d-coder
  • 83
  • 5
1
vote
1 answer

Python module returning errors in bash but not from IDLE

I'm a newbie programmer posting here for the first time. Any suggestions or advice would be greatly appreciated! I am working on a project that compares the contents of, say test.csv to ref.csv (both single columns containing strings of 3-4 words)…
acpigeon
  • 1,699
  • 9
  • 20
  • 30
1
vote
1 answer

How to avoid cyclic matches in fuzzywuzzy

Here is my dataframe: df = pd.DataFrame( dict(Name=['Emma Howard', 'Emma Ward', 'Emma Warner', 'Emma Wayden'], Age=[33, 34, 43, 44], Score=[90, 95, 93, 92]) ) list2 = df['Name'].tolist() I am applying fuzzywuzzy…
Kingston X
  • 65
  • 5
1
vote
0 answers

How can I detect similarity of names in the same columns

Guys I have a dataset like this: ` df = pd.DataFrame(data = ['John','gal britt','mona','diana','molly','merry','mony','molla','johnathon','dina'],\ columns = ['Name']) df ` it gives this output Name 0 John 1 gal britt 2 …
1
vote
2 answers

Fuzzywuzzy to compare two lists of strings of unequal length and save multiple similarity metrics

I'm trying to compare two lists of strings and produce similarity metrics between both lists. The lists are of unequal length, one is roughly 50,000 the other is about 3,000. But here are two MWE data frames that are similar to my data: forbes =…
user2205916
  • 3,196
  • 11
  • 54
  • 82
1
vote
1 answer

How to apply fuzzy matching across a dataframe column with multiple lists and save results in a new column

I have a similar problem to the links provided in the following references with minor differences but want the same results: Apply fuzzy matching across a dataframe column and save results in a new column Fuzzy match strings in one column and…
1
vote
1 answer

FuzzyWuzzy Specific Column in DataFrame with Condition

I have dataframe contains a lot of typo name, it have shape like this Col A Col B Col C Col D A 1 Daniel Sunday A 1 Dan Sunday A 1 Danil Sunday A 2 Charles Sunday A 2 Charls Monday B 1 Andi Sunday B 1 Andy Sunday I want to…
Arthur
  • 17
  • 6
1
vote
2 answers

Find a string having highest partial match with other strings in a list

I have a list A with strings: ['assembly eye tow top', 'tow eye bolts', 'tow eye bolts need me'] I am trying to find a string strA that has the highest partial match score with all the strings in list A. In other words, create a string that contains…
Parsh
  • 347
  • 2
  • 8