Questions tagged [fuzzywuzzy]

FuzzyWuzzy is a Python package to perform fuzzy string matching.

522 questions
4 votes, 4 answers

FuzzyWuzzy String Matching - Case Sensitivity

I'm using the FuzzyWuzzy String Matching module from SeatGeek. I find that when using the token_set_ratio search algorithm, small differences in case give wildly differing results. For example, if I am looking for the phrase "I am eating" in a…
shoi
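A minimal sketch of case normalization, assuming the goal is a case-insensitive score. It uses the standard library's `difflib.SequenceMatcher`, which fuzzywuzzy falls back to when python-Levenshtein is not installed, so scores may differ slightly from fuzzywuzzy's. `normalize` and `score` are hypothetical helpers, not fuzzywuzzy functions.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so case differences do not affect the score."""
    return " ".join(text.lower().split())


def score(a: str, b: str, ignore_case: bool = True) -> int:
    """Return a 0-100 similarity score, similar in spirit to fuzz.ratio."""
    if ignore_case:
        a, b = normalize(a), normalize(b)
    return round(100 * SequenceMatcher(None, a, b).ratio())


if __name__ == "__main__":
    print(score("I am eating", "I AM EATING", ignore_case=False))  # case-sensitive
    print(score("I am eating", "I AM EATING"))  # case-insensitive
```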
3 votes, 1 answer

Create a Fuzzy Duplicate Key to Sum Rows with Fuzzy Matches (Pandas)

So I have a table where I have identified fuzzy matches and an amount. I want to be able to summarize the amount by this common key. My data looks like…
David 54321
3 votes, 2 answers

Compare items from lists and find similarity

I would like to compare items from two lists (please see below). I am looking for similarity between the items. For example, I have this item from b_list: http://www.ilcorrieredellanotte.it which is similar to Corriere della Sera from g_list. An…
user12092724
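One way to compare a URL with a display name is to reduce both to a comparable key first. This sketch assumes the domain's main label (for example `ilcorrieredellanotte`) is the meaningful part of the URL; `url_key`, `name_key`, and `best_name_for_url` are hypothetical helpers, and `difflib.SequenceMatcher` stands in for a fuzzywuzzy scorer.

```python
from difflib import SequenceMatcher
from urllib.parse import urlparse


def url_key(url: str) -> str:
    """Reduce a URL to its main domain label, e.g. 'www.example.it' -> 'example'."""
    host = urlparse(url).hostname or ""  # hostname is already lowercased
    if host.startswith("www."):
        host = host[4:]
    # Keep the part before the last dot (a simplification that ignores
    # multi-part suffixes such as '.co.uk').
    return host.rsplit(".", 1)[0] if "." in host else host


def name_key(name: str) -> str:
    """Lowercase a display name and drop everything that is not a letter or digit."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def best_name_for_url(url, names):
    """Return (name, score) for the name whose key is most similar to the URL's key."""
    key = url_key(url)
    scored = [(n, SequenceMatcher(None, key, name_key(n)).ratio()) for n in names]
    return max(scored, key=lambda pair: pair[1])
```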
3 votes, 1 answer

Python multiprocessing against lists for fuzzywuzzy

I have two lists to match against one another. I need to match each str1 word with each list of str2 words. I have a list of 40k words in str2. I want to try using multiprocessing to make it run faster. For example: str1 = ['how', 'are',…
code_learner
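A sketch of spreading the matching over worker processes with `multiprocessing.Pool`. The helper names are hypothetical, `difflib.SequenceMatcher` stands in for a fuzzywuzzy scorer, and the sample words are illustrative. Because `choices` is bound into the task function, it is pickled with each chunk of tasks, so a larger `chunksize` reduces that overhead for a 40k-word list.

```python
from difflib import SequenceMatcher
from functools import partial
from multiprocessing import Pool


def best_match(word, choices):
    """Return (word, best_choice, score) for one query word; runs in a worker process."""
    best, score = max(
        ((c, SequenceMatcher(None, word, c).ratio()) for c in choices),
        key=lambda pair: pair[1],
    )
    return word, best, score


def match_all(words, choices, processes=1, chunksize=100):
    """Match every word against choices, optionally spreading the work over processes."""
    job = partial(best_match, choices=choices)  # partial of a top-level function is picklable
    if processes <= 1:
        return [job(w) for w in words]
    with Pool(processes) as pool:
        return pool.map(job, words, chunksize=chunksize)


if __name__ == "__main__":  # required on platforms that spawn worker processes
    str1 = ["how", "are", "you"]
    str2 = ["hello", "how", "arena", "your", "year"]
    for row in match_all(str1, str2, processes=2):
        print(row)
```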
3 votes, 1 answer

Fuzzy finding poker flop in Python

Given a list of poker flops and a str as a target: target = '5c6d2d' flops = ['5s4d3s', '6s4d2d', '6s5d3s', '6s4s2d'] I am trying to find the closest match to the target. Currently using fuzzywuzzy.process.extract, but sometimes this doesn't return…
chunpoon
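A character-level scorer treats `'5c6d2d'` as plain text, so it rewards matching characters rather than matching cards. A hedged alternative, not a fuzzywuzzy feature, is to parse each flop into (rank, suit) cards and rank candidates by shared ranks, breaking ties on identical cards. The scoring rule below is an illustrative assumption, not a standard poker metric.

```python
from collections import Counter


def parse_flop(flop: str):
    """Split a flop string such as '5c6d2d' into (rank, suit) pairs."""
    return [(flop[i], flop[i + 1]) for i in range(0, len(flop), 2)]


def flop_score(target: str, candidate: str):
    """Score = (shared ranks, shared identical cards); larger tuples are closer."""
    t, c = parse_flop(target), parse_flop(candidate)
    # Multiset intersection, so a paired rank only counts as often as it appears in both.
    shared_ranks = sum((Counter(r for r, _ in t) & Counter(r for r, _ in c)).values())
    shared_cards = len(set(t) & set(c))
    return shared_ranks, shared_cards


def closest_flops(target, flops):
    """Return flops sorted from closest to furthest; ties keep their input order."""
    return sorted(flops, key=lambda f: flop_score(target, f), reverse=True)


if __name__ == "__main__":
    target = "5c6d2d"
    flops = ["5s4d3s", "6s4d2d", "6s5d3s", "6s4s2d"]
    print(closest_flops(target, flops))
```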
3 votes, 2 answers

Python fuzzy match grouped by category

I am trying to clean data using fuzzy matching. The df looks like: category description 1 almnd 1 almond 2 choc 2 choco I want all similar descriptions within the same category to become the same one, like this: category description 1…
3 votes, 1 answer

TypeError: expected string or bytes-like object

I am running this code in Python with FuzzyWuzzy, which returns this error: TypeError: ('expected string or bytes-like object', 'occurred at index CONCAT') Is there a fast, easy way to avoid that error? My file contains some ints like 142…
Simon GIS
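The error suggests that non-string values (such as ints or missing values) reach the scorer. A minimal sketch, assuming missing cells should count as no match: convert every value to text before scoring. `to_text` and `safe_ratio` are hypothetical helpers, and `difflib.SequenceMatcher` stands in for the fuzzywuzzy scorer. In pandas, `df[col].astype(str)` is a common shortcut, although it turns NaN into the string `'nan'`.

```python
import math
from difflib import SequenceMatcher


def to_text(value) -> str:
    """Convert any cell value to a string; missing values (None/NaN) become ''."""
    if value is None:
        return ""
    if isinstance(value, float) and math.isnan(value):
        return ""
    return str(value)


def safe_ratio(a, b) -> int:
    """0-100 similarity that accepts ints, floats, None, or strings."""
    a, b = to_text(a), to_text(b)
    if not a or not b:
        return 0  # treat a missing value as no match
    return round(100 * SequenceMatcher(None, a, b).ratio())
```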
3 votes, 1 answer

Quicker way to perform fuzzy string match in pandas

Is there any way to speed up fuzzy string matching with fuzzywuzzy in pandas? I have a dataframe, extra_names, containing names that I want to fuzzy-match against another dataframe, names_df. >> extra_names.head() not_matching 0 Vij…
Aman Singh
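A sketch of two standard-library speedups, assuming `difflib.SequenceMatcher` (fuzzywuzzy's pure-Python fallback) is acceptable as the scorer. The difflib documentation notes that `SequenceMatcher` caches information about its second sequence, so the query is set once with `set_seq2()`; and `real_quick_ratio()` and `quick_ratio()` are cheap upper bounds on `ratio()` that let hopeless choices be skipped. `best_match` is a hypothetical helper and the sample names are made up.

```python
from difflib import SequenceMatcher


def best_match(query, choices, cutoff=0.0):
    """Return (choice, score) for the best choice scoring at least `cutoff`, else None."""
    matcher = SequenceMatcher()
    matcher.set_seq2(query)  # seq2 is cached, so set the shared query once
    best, best_score = None, cutoff
    for choice in choices:
        matcher.set_seq1(choice)
        # Cheap upper bounds on ratio(): skip choices that cannot reach best_score.
        if matcher.real_quick_ratio() < best_score or matcher.quick_ratio() < best_score:
            continue
        score = matcher.ratio()
        if score >= cutoff and (best is None or score > best_score):
            best, best_score = choice, score
    return None if best is None else (best, best_score)


if __name__ == "__main__":
    print(best_match("jon smith", ["john smith", "jane doe"]))
```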
3 votes, 1 answer

Fuzzywuzzy match multiple columns from different dataframes in Python

Let's say I have the following 3 dataframes: import numpy as np from fuzzywuzzy import fuzz from fuzzywuzzy import process import pandas as pd import io import csv import itertools import xlsxwriter df1 = pd.DataFrame(np.array([ [1010667747,…
ah bon
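When rows must match on several columns, one option is to score each column separately and combine the scores with weights. This sketch assumes rows are plain dicts (for example from `DataFrame.to_dict("records")`); the column names, weights, and helper names are hypothetical, and `difflib.SequenceMatcher` stands in for a fuzzywuzzy scorer.

```python
from difflib import SequenceMatcher


def field_ratio(a, b) -> float:
    """Case-insensitive similarity in [0, 1] for two cell values of any type."""
    return SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()


def row_score(row_a: dict, row_b: dict, weights: dict) -> float:
    """Weighted average of per-column similarities; weights maps column -> weight."""
    total = sum(weights.values())
    return sum(w * field_ratio(row_a[col], row_b[col]) for col, w in weights.items()) / total


def best_row(row, candidates, weights):
    """Return the candidate row with the highest weighted score."""
    return max(candidates, key=lambda cand: row_score(row, cand, weights))
```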
3 votes, 2 answers

Group strings with values in Python

I'm working on Twitter hashtags and I've already counted the number of times they appear in my csv file. My csv file looks like: GilletsJaunes, 100 Macron, 50 gilletsjaune, 20 tax, 10 Now, I would like to group together 2 terms that are close, such…
Steph
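A greedy sketch that merges similar hashtags and sums their counts, using the counts from the question. The 0.85 threshold, the lowercasing, and the `group_counts` helper are assumptions; `difflib.SequenceMatcher` stands in for a fuzzywuzzy scorer. Greedy grouping depends on visiting order, so a different threshold or order can produce different groups.

```python
from difflib import SequenceMatcher


def group_counts(counts: dict, threshold: float = 0.85) -> dict:
    """Merge keys whose lowercased forms are similar, summing their counts.

    Keys are visited from most to least frequent, so each group is named after
    its most frequent spelling. A key joins the first existing group it matches.
    """
    groups = {}
    for key, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        for label in groups:
            if SequenceMatcher(None, key.lower(), label.lower()).ratio() >= threshold:
                groups[label] += count
                break
        else:  # no existing group matched
            groups[key] = count
    return groups


if __name__ == "__main__":
    hashtags = {"GilletsJaunes": 100, "Macron": 50, "gilletsjaune": 20, "tax": 10}
    print(group_counts(hashtags))  # {'GilletsJaunes': 120, 'Macron': 50, 'tax': 10}
```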
3 votes, 1 answer

Fuzzy match strings in one column and create new dataframe using fuzzywuzzy

I have the following dataframe: df = pd.DataFrame( {'id': [1, 2, 3, 4, 5, 6], 'fruits': ['apple', 'apples', 'orange', 'apple tree', 'oranges', 'mango'] }) id fruits 0 1 apple 1 2 apples 2 3 orange 3 4 …
ah bon
3 votes, 2 answers

Fuzzy match rows in single dataframe to find duplicates in pandas and python

I stumbled across this post that I have been referencing: Apply fuzzy matching across a dataframe column and save results in a new column. The code I am referencing is in the answer section and uses fuzzywuzzy and pandas. It uses fuzzywuzzy to…
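Finding duplicates within one column amounts to scoring every unordered pair of rows once. A minimal sketch, assuming a 0.9 threshold and case-insensitive comparison; `duplicate_pairs` is a hypothetical helper and `difflib.SequenceMatcher` stands in for the fuzzywuzzy scorer used in the linked answer.

```python
from difflib import SequenceMatcher
from itertools import combinations


def duplicate_pairs(values, threshold=0.9):
    """Return (i, j, score) for every pair of positions whose values look alike.

    Each unordered pair is compared once, so the cost grows with n*(n-1)/2.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(values), 2):
        score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs
```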
3 votes, 2 answers

Finding the similar names in single large df in python using fuzzywuzzy

I am trying to figure out the best way possible to align my dataset which contains "Company Names". My dataset is about 300k rows and 3 columns. I tried many methods so far including Fuzzywuzzy using choices = ["Atlanta Falcons", "New York Jets",…
Maneet Giri
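With about 300k rows, comparing every pair means about 4.5 × 10^10 comparisons, so a common approach is blocking: only names that share a cheap key are compared. This sketch assumes the first three alphanumeric characters as the key, a hypothetical choice that misses variants differing early (here `New York Jets` and `NY Jets` land in different blocks). The helper names are hypothetical and `difflib.SequenceMatcher` stands in for a fuzzywuzzy scorer.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def block_key(name: str) -> str:
    """Hypothetical blocking key: first three alphanumeric characters, lowercased."""
    letters = "".join(ch for ch in name.lower() if ch.isalnum())
    return letters[:3]


def similar_within_blocks(names, threshold=0.85):
    """Compare only names that share a block key instead of all n*(n-1)/2 pairs."""
    blocks = defaultdict(list)
    for name in names:
        blocks[block_key(name)].append(name)
    matches = []
    for members in blocks.values():
        for a, b in combinations(members, 2):
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                matches.append((a, b, score))
    return matches
```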
3 votes, 1 answer

Fuzzywuzzy scores for sentences w/no overlapping words are higher than those with some overlap?

I am using fuzzywuzzy to calculate the similarity between two sentences. Here are some results that make no sense to me: from fuzzywuzzy import fuzz s1 = "moist tender pork loin chop" s2 = "corn bicolor" fuzz.token_sort_ratio(s1,s2) This gives me…
user3490622
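fuzzywuzzy's `token_sort_ratio` sorts the words of each string, rejoins them, and then compares the results character by character, so two strings can score above zero without sharing a single word. The sketch below approximates that pipeline with the standard library; `simple_process` only approximates fuzzywuzzy's default preprocessing, and its scores may not match fuzzywuzzy's exactly.

```python
import re
from difflib import SequenceMatcher


def simple_process(s: str) -> str:
    """Approximate default preprocessing: non-word characters -> space, lowercase."""
    return re.sub(r"\W+", " ", s).lower().strip()


def token_sort_score(s1: str, s2: str) -> int:
    """Sort the words of each string, rejoin them, then score the characters."""
    t1 = " ".join(sorted(simple_process(s1).split()))
    t2 = " ".join(sorted(simple_process(s2).split()))
    return round(100 * SequenceMatcher(None, t1, t2).ratio())


def shared_words(s1: str, s2: str) -> set:
    """Words common to both strings; token_sort_score does not require any."""
    return set(simple_process(s1).split()) & set(simple_process(s2).split())


if __name__ == "__main__":
    s1 = "moist tender pork loin chop"
    s2 = "corn bicolor"
    print(token_sort_score(s1, s2), shared_words(s1, s2))
```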
3 votes, 1 answer

How to use fuzzywuzzy's process.extract function to process a list of objects by a particular attribute?

I have created an object called Issuer, which contains a member named issuer_name. I want to take advantage of fuzzywuzzy's process.extract() function, but it only takes in a list of strings. My goal is to find matches and return the list of objects…
firstblud
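fuzzywuzzy's `process.extract` also accepts a dict of choices, in which case each result includes the dict key, so mapping an id to each `issuer_name` is one way back to the objects. Without fuzzywuzzy, the same idea can be sketched with `heapq.nlargest` and a key function over the attribute. The `Issuer` dataclass below is a minimal stand-in for the class in the question, and `extract_objects` is a hypothetical helper; `difflib.SequenceMatcher` stands in for the fuzzywuzzy scorer.

```python
import heapq
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Issuer:
    issuer_name: str  # attribute name from the question; other fields omitted


def extract_objects(query, objects, attr="issuer_name", limit=5):
    """Return up to `limit` (object, score) pairs, best first, scoring one attribute."""
    q = query.lower()

    def score(obj):
        return SequenceMatcher(None, q, getattr(obj, attr).lower()).ratio()

    scored = ((obj, score(obj)) for obj in objects)
    return heapq.nlargest(limit, scored, key=lambda pair: pair[1])
```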