Questions tagged [fuzzywuzzy]

FuzzyWuzzy is a Python package to perform fuzzy string matching.

522 questions
5
votes
1 answer

Using Python's jellyfish module to get best match (partial string matching)

I am trying to create a dictionary of some kind to append my results and get the best match using the jaro distance function. This is part of my attempt to match 2 lists and get the best matched name in both. Example: import…
BernardL
  • 5,162
  • 7
  • 28
  • 47
5
votes
1 answer

Dask: very low CPU usage and multiple threads? is this expected?

I am using dask as in how to parallelize many (fuzzy) string comparisons using apply in Pandas? Basically I do some computations (without writing anything to disk) that invoke Pandas and Fuzzywuzzy (that may not be releasing the GIL apparently, if…
ℕʘʘḆḽḘ
  • 18,566
  • 34
  • 128
  • 235
5
votes
1 answer

create new column in dataframe using fuzzywuzzy

I have a dataframe in pandas where I am using the fuzzywuzzy package in Python to match the first column in the dataframe with the second column. I have defined a function to create an output with the first column, second column and partial ratio score. But it is…
Abacus
  • 197
  • 1
  • 2
  • 6
4
votes
1 answer

Fuzzy Lookup In Python

I have two CSV files. One that contains Vendor data and one that contains Employee data. Similar to what "Fuzzy Lookup" in excel does, I'm looking to do two types of matches and output all columns from both csv files, including a new column as the…
Mystical Me
  • 137
  • 6
4
votes
2 answers

Is there a way to boost matching performance when doing string matching in Python?

I have a very large dictionary which stores large numbers of English sentences and their Spanish translations. When given a random English sentence, I intend to use Python's fuzzywuzzy library to find its closest match in the dictionary. My…
wbzy00
  • 146
  • 9
4
votes
3 answers

How to separate with commas an unpacked list of tuples

Unpacking the resulting list of tuples into comma-separated values. Using FuzzyWuzzy, I am comparing 2 files and want to output the results into a 3rd file. Building out from this SO question: Python: Keepning only the outerloop max result…
4
votes
4 answers

AttributeError: module 'fuzzywuzzy' has no attribute 'ratio'

I am trying to call the ratio() function from the fuzzywuzzy library to match two strings and get the following error message: AttributeError: module 'fuzzywuzzy' has no attribute 'ratio' Has the version changed? I tried to look for other functions…
sharp
  • 2,140
  • 9
  • 43
  • 80
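
A note on the error in this question: in fuzzywuzzy, `ratio` is defined in the `fuzz` submodule rather than on the top-level package, so it is usually reached through `from fuzzywuzzy import fuzz`. The sketch below assumes that layout. So that it runs even where fuzzywuzzy is not installed, it falls back to a stand-in built on the standard library's `difflib`; the name `similarity` is hypothetical, and the fallback's scores are not guaranteed to equal fuzzywuzzy's.

```python
from difflib import SequenceMatcher

try:
    # ratio() lives in the fuzz submodule, not on the package itself.
    from fuzzywuzzy import fuzz
    similarity = fuzz.ratio
except ImportError:
    # Hypothetical stand-in so the sketch runs without fuzzywuzzy installed.
    def similarity(a, b):
        """Return a 0-100 similarity score based on difflib."""
        return int(round(100 * SequenceMatcher(None, a, b).ratio()))

score = similarity("string one", "string two")
print(score)
```

Calling `fuzzywuzzy.ratio(...)` directly is what triggers the `AttributeError`; importing the submodule first is the part that matters here.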
4
votes
1 answer

Include a score cutoff into my Fuzzywuzzy string matching project to only include matches higher than score x

I have been recycling a bunch of code from all over the place to create a string matcher for two csv files I have. The output of my Code right now is the 3 highest matches per string. I want to additionally include cutoff below a certain match…
Tim
  • 161
  • 7
  • 24
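
One way to picture the "top 3 matches above a cutoff" idea from this question is the standard library's `difflib.get_close_matches`, which takes both a result limit and a minimum similarity. This is a stand-in, not the asker's code: the helper `best_matches`, the sample names, and the 0.8 threshold are all assumptions made for illustration, and `difflib` scores on a 0 to 1 scale rather than fuzzywuzzy's 0 to 100. To my understanding, fuzzywuzzy's `process.extractBests` exposes a comparable `score_cutoff` argument.

```python
from difflib import get_close_matches

def best_matches(query, choices, limit=3, cutoff=0.8):
    """Return up to `limit` choices whose similarity to `query` is >= cutoff."""
    # cutoff is on difflib's 0-1 scale; results come back best match first.
    return get_close_matches(query, choices, n=limit, cutoff=cutoff)

# Hypothetical sample data standing in for one column of a CSV file.
names = ["Acme Corp", "Acme Corporation", "Apex Ltd", "Zenith Inc"]
print(best_matches("Acme Corp.", names))
```

Anything scoring below the cutoff is simply left out, so a query with no good candidate yields an empty list instead of three weak matches.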
4
votes
2 answers

Multiprocessing fuzzy wuzzy string search - python

I am trying to do string matching and bring back the match id using fuzzy wuzzy in Python. My dataset is huge: dataset1 = 1.8 million records, dataset2 = 1.6 million records. What I have tried so far: first, I tried to use the record linkage package in Python,…
ds_user
  • 2,139
  • 4
  • 36
  • 71
4
votes
4 answers

All-to-All comparison of two lists in Python

I'm struggling with some performance complications. The task in hand is to extract the similarity value between two strings. For this I am using fuzzywuzzy: from fuzzywuzzy import fuzz print fuzz.ratio("string one", "string two") print…
VnC
  • 1,936
  • 16
  • 26
4
votes
1 answer

Matching 2 large csv files by Fuzzy string matching in Python

I am trying to approximately match 600,000 individuals' names (Full name) to another database that has over 87 million observations (Full name)! My first attempt with the fuzzywuzzy library was way too slow, so I decided to use the module fuzzyset…
Adrien
  • 461
  • 5
  • 19
4
votes
1 answer

Get index of python fuzzywuzzy match

I'm using Python fuzzywuzzy to find matches in a list of sentences: def getMatches(needle): return process.extract(needle, bookSentences, scorer=fuzz.token_sort_ratio, limit=3) I'm trying to print out the match plus the sentences around…
Nathan Arthur
  • 8,287
  • 7
  • 55
  • 80
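
A minimal sketch of keeping the position of each match, so that neighbouring sentences can be printed as context. It scores with the standard library's `difflib` instead of `fuzz.token_sort_ratio`, so the ranking may differ from the asker's; the function `matches_with_index` and the sample sentences are hypothetical. To my understanding, passing a dict of choices to fuzzywuzzy's `process.extract` also returns the key alongside each match, which serves the same purpose.

```python
from difflib import SequenceMatcher

def matches_with_index(needle, sentences, limit=3):
    """Return the `limit` best (index, sentence, score) triples, best first."""
    scored = [
        (i, s, int(round(100 * SequenceMatcher(None, needle.lower(), s.lower()).ratio())))
        for i, s in enumerate(sentences)
    ]
    # Sort by descending score; the sort is stable, so ties keep document order.
    return sorted(scored, key=lambda t: t[2], reverse=True)[:limit]

# Hypothetical stand-in for the bookSentences list in the question.
book_sentences = [
    "The ship left the harbor at dawn.",
    "A storm gathered over the open sea.",
    "The crew lowered the sails in a hurry.",
]
for index, sentence, score in matches_with_index("storm gathered over", book_sentences, limit=1):
    # The index lets us slice out the sentences before and after the match.
    context = book_sentences[max(index - 1, 0):index + 2]
    print(index, score, context)
```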
4
votes
1 answer

Odd behavior of to_dict

I'm building a fuzzy search program, using FuzzyWuzzy, to find matching names in a dataset. My data is in a DataFrame of about 10378 rows and len(df['Full name']) is 10378, as expected. But len(choices) is only 1695. I'm running Python 2.7.10 and…
nocoolsoft
  • 87
  • 1
  • 5
4
votes
1 answer

Fuzzy logic on big datasets using Python

My team has been stuck with running a fuzzy logic algorithm on two large datasets. The first (subset) is about 180K rows and contains names, addresses, and emails for the people that we need to match in the second (superset). The superset contains…
4
votes
4 answers

Python Comparing two lists of strings for similarities

I'm very new at Python but I thought it would be fun to make a program to sort all my downloads, but I'm having a little trouble with it. It works perfectly if my destination only has one word in it but if the destination has two words or more this…