Questions tagged [fuzzywuzzy]

FuzzyWuzzy is a Python package to perform fuzzy string matching.

522 questions
0
votes
1 answer

Fuzzy Compare between two hive columns using apache spark with scala

I am reading data from two Hive tables. The token table has the tokens that need to be matched against the input data. The input data has a description column along with other columns. I need to split the input data and compare each split element…
0
votes
1 answer

How do I add a Python module from inside conda's site-package directory to spark-submit?

I need to run a PySpark application (v1.6.3). There is the --py-files flag to add .zip, .egg, or .py files. If I had a Python package/module at /usr/anaconda2/lib/python2.7/site-packages/fuzzywuzzy, how would I include this whole module? Inside…
Jane Wayne
  • 8,205
  • 17
  • 75
  • 120
0
votes
1 answer

Return the index of a list of a fuzzywuzzy match

I have a list of 'ids': ids = [None, '20160928a', '20160929a', ... ] and another list of certain 'ids' that I found were duplicate ids using fuzzywuzzy: repeat_offenders = ['20160928a', '20161115a', '20161121a', ... ] I would like to use…
Graham Streich
  • 874
  • 3
  • 15
  • 31
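One way to recover list positions for matched ids is to score every entry and keep the indices that clear a threshold. The sketch below is illustrative only: it uses the standard library's `difflib.SequenceMatcher` as a stand-in scorer rather than fuzzywuzzy itself, the helper names `similarity` and `matching_indices` are hypothetical, and the sample lists are assumed values filled in around the ids shown in the question.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Return a 0-100 similarity score (difflib stand-in, not fuzzywuzzy)."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))

def matching_indices(ids, targets, threshold=90):
    """Map each target to the indices in ids whose score meets the threshold."""
    result = {}
    for target in targets:
        result[target] = [
            i for i, candidate in enumerate(ids)
            # Skip None placeholders, which cannot be compared as strings.
            if candidate is not None and similarity(candidate, target) >= threshold
        ]
    return result

# Assumed sample data; the lists in the question are elided.
ids = [None, '20160928a', '20160929a', '20161115a']
repeat_offenders = ['20160928a', '20161115a']

indices = matching_indices(ids, repeat_offenders, threshold=100)
print(indices)  # {'20160928a': [1], '20161115a': [3]}
```

Lowering `threshold` would also pull in near-duplicates such as `'20160929a'`, so the cutoff is a judgment call for the data at hand.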
0
votes
0 answers

How can I get SSIS fuzzy lookup to ignore token order like Python's token_sort_ratio does

My source data has the same data as the reference record, but in a different order, e.g.: 0.42345795,test address client #12 order; token@,token@ client #12 order; address, For the same input and lookup records, SSIS gave a similarity of 0.4 and…
Mythri
  • 1
  • 1
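The order-insensitive idea behind a token-sort comparison can be sketched as: normalize both strings, sort their tokens, then compare the rejoined strings. The code below is a rough approximation under stated assumptions, not fuzzywuzzy's implementation: it scores with `difflib.SequenceMatcher`, the helper names `sorted_tokens` and `token_sort_score` are hypothetical, and the sample strings are made up rather than taken from the question.

```python
import re
from difflib import SequenceMatcher

def sorted_tokens(text):
    """Lowercase, turn non-alphanumeric runs into spaces, and sort the tokens."""
    tokens = re.sub(r'[^0-9a-z]+', ' ', text.lower()).split()
    return ' '.join(sorted(tokens))

def plain_score(a, b):
    """0-100 similarity of the raw strings (difflib stand-in)."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))

def token_sort_score(a, b):
    """0-100 similarity after sorting tokens, so word order no longer matters."""
    return plain_score(sorted_tokens(a), sorted_tokens(b))

# Assumed sample strings containing the same words in a different order.
left = 'client order address'
right = 'address client order'

print(plain_score(left, right))       # penalized by the reordering
print(token_sort_score(left, right))  # 100, since the sorted tokens are identical
```

Any tool that can pre-process both columns the same way (split, sort, rejoin) before its own comparison step could apply the same trick.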
0
votes
4 answers

Extracting numbers from a string using regex in python

I have a list of urls that I would like to…
Graham Streich
  • 874
  • 3
  • 15
  • 31
0
votes
0 answers

Finding duplicate values based on condition

Below is the sample data: 1 ,ASIF JAVED IQBAL JAVED,JAVED IQBAL SO INAYATHULLAH,20170103 2 ,SYED MUSTZAR ALI MUHAMMAD ILYAS SHAH,MUHAMMAD SAFEER SO SAGHEER KHAN,20170127 3 ,AHSUN SABIR SABIR ALI,MISBAH NAVEED DO NAVEED ANJUM,20170116 4 ,RASHAD IQBAL…
0
votes
1 answer

How to use FuzzyWuzzy in Python to name match between two data frames?

I have df1 and df2. I want to use fuzzywuzzy to string match column A in df1 to column A in df2, and return an ID in column B of df2 based on a certain ratio match. For example: df1 looks like this: Name Sally sells Seashells df2 looks like…
Window
  • 87
  • 1
  • 8
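A common shape for this kind of lookup is: for each name on the left, pick the best-scoring name on the right and return its ID only if the score clears a cutoff. The sketch below makes several assumptions: it uses `difflib.SequenceMatcher` instead of fuzzywuzzy, models the two frames as a plain list and dict to stay dependency-free, and uses made-up names, IDs, and a hypothetical `best_match_id` helper.

```python
from difflib import SequenceMatcher

def score(a, b):
    """Case-insensitive 0-100 similarity (difflib stand-in, not fuzzywuzzy)."""
    return int(round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()))

def best_match_id(name, reference, min_score=80):
    """Return the ID of the best-scoring reference name, or None below min_score."""
    best_name = max(reference, key=lambda candidate: score(name, candidate))
    return reference[best_name] if score(name, best_name) >= min_score else None

# Assumed stand-ins for column A of df1 and columns A/B of df2.
df1_names = ['Sally sells', 'Seashells']
df2_lookup = {'Sally sels': 1, 'Sea shells': 2}

matched = {name: best_match_id(name, df2_lookup) for name in df1_names}
print(matched)  # {'Sally sells': 1, 'Seashells': 2}
```

With pandas, the same function could be applied to a column via `Series.apply`; note that this scores every pair, so it scales with the product of the two table sizes.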
0
votes
0 answers

Approximate name matching to merge two dataframes python

I am working with two dataframes (df1 and df2) of which I would like to merge df2 into df1 based on name matching, but between the two the names are not exactly matching (for example: 'JS Smith' may be "J.S. Smith (Jr)") and the names in df1 are in…
wingsoficarus116
  • 429
  • 5
  • 17
0
votes
2 answers

Python - String Matching using Fuzzy Wuzzy (extracting single letters as opposed to words)

OBJECTIVE Take Company B's Accounting Description (e.g. "Cash") and match it to Company A's Accounting Description (e.g. "Cash Rollup"). APPROACH Record Company A and Company B's Accounting Descriptions, place each into their own dataframes…
jonplaca
  • 797
  • 4
  • 16
  • 34
0
votes
1 answer

Slow fuzzy matching between two DataFrames

I have DataFrame A (df_cam) with cli id and origin: cli id | origin ------------------------------------ 123 | 1234 M-MKT XYZklm 05/2016 And DataFrame B (df_dict) with shortcut and campaign shortcut | …
HonzaB
  • 7,065
  • 6
  • 31
  • 42
0
votes
1 answer

cx_Freeze giving error when using fuzzywuzzy

I have built a tkinter GUI for survey entry in Python 3.4 that uses a number of packages. I then need to compile it to an executable so that I can put it on a coworker's machine (we are both on Windows 7). I've structured my setup.py to look…
Djones4822
  • 577
  • 3
  • 6
  • 23
0
votes
2 answers

fuzzywuzzy ratio of 2 columns if one column satisfies 100 percent match the best one

My data frame is Matcher = df2['Account Name'] match = if df1['Billing Country'] == df2['Billing Country'] (process.extractOne(df1['Account Name'], Matcher)) The above code is not working but I want to do the fuzzy match of account name only…
Maneet Giri
  • 185
  • 3
  • 18
0
votes
1 answer

TypeError when using FuzzyWuzzy and Pandas for string matching

I'm getting an error while using the FuzzyWuzzy library in Python 3. I'm working with CSV files also using the Pandas library. I have the following data in my CSV file: > BBL CorporationName CorporationName2 1 …
Steven
  • 824
  • 1
  • 8
  • 23
0
votes
1 answer

Identifying similar strings in a database in Python

I have a database table containing well over a million strings. Each string is a term that can vary in length from two words to five or six. ["big giant cars", "zebra videos", "hotels in rio de janeiro".......] I also have a blacklist of over…
GreenGodot
  • 6,030
  • 10
  • 37
  • 66
0
votes
0 answers

How can I improve performance on my apply() with fuzzy matching statement

I've written a function called muzz that leverages the fuzzywuzzy module to 'merge' two pandas dataframes. Works great, but the performance is pretty bad on larger frames. Please take a look at my apply() that does the extracting/scoring and let…
Bob Haffner
  • 8,235
  • 1
  • 36
  • 43