Questions tagged [fuzzywuzzy]

FuzzyWuzzy is a Python package to perform fuzzy string matching.

522 questions
0
votes
1 answer

Perform Named Entity Recognition - NLP

I am trying to learn how to perform Named Entity Recognition. I have a set of discharge summaries containing medical information about patients. I converted my unstructured data into structured data. Now, I have a DataFrame that looks like…
Alan
  • 129
  • 1
  • 13
0
votes
1 answer

Having problems using dask map_partitions with a string matching algorithm

I'm having some problems applying a text search algorithm with a parallelized dask infrastructure. I'm trying to find the best match for 40,000 strings in a Series object against a list of 4,000 strings. I could have done it using pandas.apply but it's to…
0
votes
1 answer

Customizing fuzzywuzzy string matching to edit distance <= 1

I am new to algorithms and my question may be silly, but how can I specify the edit distance in the fuzzywuzzy library? I want an edit distance <= 1 between two strings. from fuzzywuzzy import fuzz fuzz.ratio('Apple', 'Aple') I tried to look at source…
uchiha itachi
  • 195
  • 11
0
votes
1 answer

How to apply a complex lambda function in a Pandas DataFrame with a long list of elements per row

I have a pandas DataFrame with a long string in every row of one column (see variable 'dframe'). In a separate list I stored all the keywords, which I have to compare with every word of each string in the DataFrame. If a keyword is found, I have…
Typek
  • 3
  • 2
0
votes
1 answer

How to use FuzzyWuzzy with two CSV files?

I am trying to compare two CSV files that contain job titles. One file contains job titles from the U.S. Bureau of Labor Statistics and the other contains a manually generated list of job titles. There are roughly 2000 job titles in each list. I am very…
Alex
  • 1
0
votes
1 answer

How to fix incorrect fuzzy-matches with over 90 thresholds?

I have two datasets that I need to fuzzy-match on a column that contains organization names. I used the fuzzywuzzy library in Python and set the threshold to 50 (see the code below). The code successfully matched some names. When I eyeballed the…
0
votes
1 answer

Clustering a set of string sentences into an unknown number of groups

I have a set of sentences (each sentence = x number of rows where x belongs to range (1,6)). I want to group these sentences based on the similarities between them. I have tried fuzzywuzzy's token_set_ratio, but the trouble I have is that I need to…
0
votes
1 answer

Fuzzy matching inside a column

Suppose I have a list of sports like this: sports=["futball","fitbal","football","tennis","tenis","tenisse","footbal","zennis","ping-pong"] I would like to create a dataframe that matches each sport with its closest match if the fuzzy matching…
Arli94
  • 680
  • 2
  • 8
  • 19
0
votes
1 answer

Grouping labels and exporting data from Python back to SQL Server

I am trying to clean a column with fuzzywuzzy using the following code: import pyodbc from fuzzywuzzy import fuzz # Getting sql list conn = pyodbc.connect('Driver={SQL Server};' 'Server=USER-PC\SQLEXPRESS;' …
0
votes
1 answer

FuzzyWuzzy throws TypeError only in Flask app

So I am building a small Flask-based search tool deployed on Heroku to check which rankings universities can be found in. For this I am using fuzzywuzzy to go through lists of lists and return the relevant rank. @app.route('/results',…
Uralan
  • 79
  • 1
  • 9
0
votes
1 answer

More efficient string comparison in list

I'm writing a program to detect court cases cited in a large number of texts from different sources, and count how many times each is cited across texts. The problem stems from the fact that cases exist in two states within most documents: They are…
0
votes
0 answers

Multithreading error in Python 3.x

I'm trying to set up multithreaded code in Python 3.x. I have created two functions to fuzzy match some data faster than usual. I'm trying to split the client data in two parts and run each part separately. When I try to start the thread I receive an error…
Erick Batista
  • 21
  • 1
  • 3
0
votes
1 answer

Arrange words in an array in PHP

Rearrange the words in an array based on their positions in the first array. In my code there are two arrays: the first is the base array, which I compare with the second array so that the second array's positions match those of the first. Consider 2 input By…
user10655999
0
votes
1 answer

TypeError: NoneType is unsubscriptable - IF statement

I am trying to find fuzzy string matches for university names and print a certain score (10, 5, 3) to a CSV each time depending on which list the closest match came from. data = [["MIT"], ["Stanford"], ...] Data1 = ['MASSACHUSETTS INSTITUTE OF…
Uralan
  • 79
  • 1
  • 9
0
votes
1 answer

Working with a large data set: removing unwanted variants from the product titles

I keep having an issue with my code and I'm not sure what else I could do. I want to remove all variants from the product titles. Some of them are being removed and some are not. Examples of what is not being removed are oz, ml, mg and a lot of words…