Questions tagged [fuzzywuzzy]

FuzzyWuzzy is a Python package to perform fuzzy string matching.

522 questions
1
vote
1 answer

Create new column with fuzzy-score across two string columns in the same dataframe

I'm trying to calculate a fuzzy score (preferably a partial_ratio score) across two columns in the same dataframe.

| column1 | column2 |
| ----------- | ----------- |
| emmett holt | holt |
| greenwald | christopher |

It would need to look something like…
Antonius
  • 67
  • 9
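
A minimal sketch of a row-wise score for two columns like these, using only the standard library. `partial_score` is a hypothetical helper that approximates a partial ratio by sliding the shorter string across the longer one with `difflib.SequenceMatcher`. It is a simplification, not fuzzywuzzy's own implementation, so its scores may differ from `fuzz.partial_ratio`.

```python
from difflib import SequenceMatcher

def partial_score(s1, s2):
    """Best 0-100 similarity of the shorter string against any
    equal-length window of the longer string."""
    short, long_ = sorted((s1, s2), key=len)
    if not short:
        return 0  # assumption: empty input counts as no match
    best = 0.0
    for start in range(len(long_) - len(short) + 1):
        window = long_[start:start + len(short)]
        best = max(best, SequenceMatcher(None, short, window).ratio())
    return round(100 * best)

# The two example rows from the question, as (column1, column2) pairs.
rows = [("emmett holt", "holt"), ("greenwald", "christopher")]
scores = [partial_score(c1, c2) for c1, c2 in rows]
print(scores)  # the first row scores 100 because "holt" occurs verbatim
```

With pandas, the same helper could be applied per row, for example `df.apply(lambda r: partial_score(r["column1"], r["column2"]), axis=1)`, or swapped for `fuzz.partial_ratio` where fuzzywuzzy is installed.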
1
vote
1 answer

Invalid operation: Failed to compile udf

I have imported the fuzzywuzzy library on Redshift from S3. I am trying to create the below function: CREATE OR REPLACE FUNCTION fuzzy_test (string_a TEXT,string_b TEXT) RETURNS FLOAT IMMUTABLE AS $$ FROM fuzzywuzzy import fuzz RETURN fuzz.ratio…
Geetha
  • 41
  • 4
1
vote
2 answers

How to reduce the processing time in a function that compares two sentences from two different dataframes?

I am using a function to compare the sentences of two dataframes and extract the value and sentence with the highest similarity. df1 contains 40,000 sentences and df2 contains 400 sentences. Each sentence of df1 is compared against the 400…
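
One possible way to cut the cost of this kind of one-against-many comparison, sketched with the standard library rather than fuzzywuzzy: `difflib.SequenceMatcher` caches its analysis of the second sequence, and its `real_quick_ratio()` and `quick_ratio()` methods are cheap upper bounds on `ratio()`, so candidates that cannot beat the current best can be skipped. `best_match` is a hypothetical helper name, and the actual speed-up depends on the data.

```python
from difflib import SequenceMatcher

def best_match(query, candidates):
    """Return (candidate, ratio) for the most similar candidate, or (None, 0.0)."""
    matcher = SequenceMatcher()
    matcher.set_seq2(query)  # seq2 is analysed once and reused for every candidate
    best, best_score = None, 0.0
    for cand in candidates:
        matcher.set_seq1(cand)
        # Both quick ratios are upper bounds on ratio(); skip hopeless candidates.
        if matcher.real_quick_ratio() <= best_score:
            continue
        if matcher.quick_ratio() <= best_score:
            continue
        score = matcher.ratio()
        if score > best_score:
            best, best_score = cand, score
    return best, best_score

candidates = ["the cat sat on the mat", "a dog barked loudly", "the cat sat on a mat"]
match, score = best_match("the cat sat on the mat today", candidates)
print(match, round(score, 3))
```

Because the pruning only skips candidates whose upper bound cannot exceed the current best, the result should be the same as scoring every pair with `ratio()`.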
1
vote
1 answer

Pandas dataframe or SQLite fuzzy search

I'm scraping multiple sports betting websites in order to compare the odds for each match across the websites. My question is how to identify match_id from a match that already exists in the DB but has team names written in a different way. Please…
Drew
  • 113
  • 1
  • 14
1
vote
0 answers

fuzzy search in django postgresql without using Elasticsearch

I'm trying to incorporate a fuzzy search function into a Django project without using Elasticsearch. 1- I am using Postgres, so I first tried levenshtein, but it did not work for my purpose. class Levenshtein(Func): template =…
ha-neul
  • 3,058
  • 9
  • 24
1
vote
1 answer

Find similarity between two dataframes, row by row

I have two dataframes, df1 and df2 with the same columns. I would like to find similarity between these two datasets. I have been following one of these two approaches. The first one was to append one of the two dataframes to the other one and…
user12907213
1
vote
0 answers

Please suggest improvements for fuzzy matching email header string values with Python

I'm currently trying to match two values that are found in the From header of an email: specifically, the Sender Name and the Email_ID. To illustrate, here is an example of this header's content: "Surname Lastname"…
1
vote
3 answers

Fuzzy matching not accurate enough with TF-IDF and cosine similarity

I want to find similarities in a long list of strings. That is, for every string in the list, I need all similar strings in the same list. Earlier I used FuzzyWuzzy, which provided good accuracy with the results I wanted by using the…
dummydoc
  • 11
  • 1
  • 2
1
vote
0 answers

Reduce execution time of fuzzy pattern matching

Instead of passing individual parameters iteratively, I am passing the whole column, but it still takes the same amount of time: approximately 1 minute, which is very long. Here is the code: from fuzzywuzzy import fuzz import json import…
Hamza Shaikh
  • 75
  • 1
  • 8
1
vote
0 answers

Specific Approximate Matching in Python

PROBLEM I want to implement a type of specific approximate matching of two sentences in Python. Example - s_1 = "I hope you are safe from COVID-19 today" s_2 = "I hope you're safe from COVID 19 today" score = get_similarity(s_1, s_2) OR s_1 = "I…
Adhish Thite
  • 463
  • 2
  • 5
  • 20
1
vote
1 answer

Why is this fuzz.ratio giving me 25 when none of the characters match?

I'm trying to work through how fuzzywuzzy calculates this simple fuzz ratio: print(fuzz.ratio("66155347", "12026599")) 25 Why is the fuzz ratio not 0 since they are completely different characters in every position? The Levenshtein Distance = 8…
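
A small worked example of where a score of 25 can come from, assuming the ratio is defined as 2·M/T (M = matched characters, T = total length of both strings), which is how the standard library's `difflib.SequenceMatcher.ratio()` works. Whether a given fuzzywuzzy install uses difflib or python-Levenshtein under the hood varies, so this is an illustration of the formula rather than a trace of the library. The key point is that matches are found as common subsequences, not position by position.

```python
from difflib import SequenceMatcher

a, b = "66155347", "12026599"
matcher = SequenceMatcher(None, a, b)

# Matching blocks are runs of characters shared by both strings, in order.
# The final block is always a zero-length sentinel, so it is filtered out.
blocks = [blk for blk in matcher.get_matching_blocks() if blk.size > 0]
matched = sum(blk.size for blk in blocks)

score = round(100 * 2 * matched / (len(a) + len(b)))
print(blocks)   # a "6" and a "5" occur in both strings, in the same order
print(matched)  # 2
print(score)    # 25, i.e. 100 * (2 * 2) / 16
```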
1
vote
2 answers

How to get the matched text from a list passed to fuzzywuzzy's partial_ratio()?

I have a string and a list of strings. I just want to know which text in the list is 100% partially matched with the given string. from fuzzywuzzy import fuzz s1 = "Hello" s_list= ["Hai all", "Hello world", "Thank you"] fuzz.partial_ratio(s1,…
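
A minimal sketch under one stated assumption: that a partial score of 100 means the shorter string occurs verbatim inside the longer one. In that case a plain substring test identifies the same items without any scoring library. The variable names `exact_partial` and `exact_partial_ci` are hypothetical.

```python
s1 = "Hello"
s_list = ["Hai all", "Hello world", "Thank you"]

# Items that contain s1 verbatim (case-sensitive).
exact_partial = [text for text in s_list if s1 in text]

# Case-insensitive variant, in case "hello world" should also count.
exact_partial_ci = [text for text in s_list if s1.lower() in text.lower()]

print(exact_partial)  # ['Hello world']
```

For scores below 100, the individual `fuzz.partial_ratio(s1, text)` values would need to be kept next to each `text`, for example in a list of `(text, score)` pairs.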
1
vote
0 answers

Group Similar usernames together

My data is as follows. As you can see, the first entry is 'tim', which matches tim.rand and timrook. Similarly, pankit090 matches pankit001, pankit002, pankit003, pankit004 and pankit005. I want the result to be like below. What I was able to…
Gupta
  • 314
  • 4
  • 17
1
vote
1 answer

Multiprocessing the Fuzzy match in pandas

I have two data frames: DF_Address, which has 347k distinct addresses, and DF_Project, which has 24k records with Project_Id, Project_Start_Date and Project_Address. I want to check if there is a fuzzy match of my Project_Address in…
1
vote
1 answer

Fuzzywuzzy for a list of dictionaries

I have a list of dictionaries (API response) and I use the following function to search for certain nations: def nation_search(self): result = next((item for item in nations_v2 if (item["nation"]).lower() == (f"{self}").lower()), False) if…
Sam Cooper
  • 15
  • 3
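
A hedged sketch of a fuzzy lookup over a list of dictionaries, using the standard library's `difflib.get_close_matches` as a stand-in for fuzzywuzzy. The contents of `nations_v2` (including the `id` field), the `cutoff` default and the two-argument signature are assumptions made for illustration. Only the `"nation"` key and the `False` fallback come from the question.

```python
from difflib import get_close_matches

# Hypothetical sample data; the real API response is not shown in the question.
nations_v2 = [
    {"nation": "Germany", "id": 1},
    {"nation": "France", "id": 2},
    {"nation": "Japan", "id": 3},
]

def nation_search(query, nations, cutoff=0.6):
    """Return the dict whose "nation" is closest to query, or False if none is close enough."""
    by_name = {item["nation"].lower(): item for item in nations}
    hits = get_close_matches(query.lower(), list(by_name), n=1, cutoff=cutoff)
    return by_name[hits[0]] if hits else False

print(nation_search("germny", nations_v2))  # finds the Germany entry despite the typo
print(nation_search("xyz", nations_v2))     # False
```

Where fuzzywuzzy is available, `process.extractOne` over the list of nation names plays a similar role, with scores on a 0-100 scale instead of 0-1.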