Questions tagged [fuzzy-comparison]

Fuzzy comparison is the colloquial name for approximate string matching, the technique of finding strings that match a pattern approximately (rather than exactly). This problem is typically divided into two sub-problems: finding approximate substring matches inside a given string, and finding dictionary strings that match the pattern approximately.
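
The two sub-problems can be sketched with plain Levenshtein (edit) distance. This is only a minimal illustration, not how any particular library implements fuzzy matching; the function names `edit_distance`, `best_substring_distance` and `dictionary_matches` are hypothetical, and edit distance is an assumed choice of similarity measure (token-based or phonetic measures are also common).

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions cost 1 each."""
    prev = list(range(len(b) + 1))  # distance from empty prefix of a to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        prev = curr
    return prev[-1]


def best_substring_distance(pattern: str, text: str) -> int:
    """Sub-problem 1: smallest edit distance between pattern and any substring of text.

    Same recurrence as edit_distance, but the first row is all zeros (a match
    may start anywhere in text) and the result is the minimum of the last row
    (a match may end anywhere in text).
    """
    prev = [0] * (len(text) + 1)
    for i, cp in enumerate(pattern, start=1):
        curr = [i]
        for j, ct in enumerate(text, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (cp != ct)))
        prev = curr
    return min(prev)


def dictionary_matches(pattern: str, words: List[str], max_distance: int) -> List[Tuple[str, int]]:
    """Sub-problem 2: dictionary words within max_distance of pattern, closest first."""
    scored = [(word, edit_distance(pattern, word)) for word in words]
    return sorted((pair for pair in scored if pair[1] <= max_distance), key=lambda pair: pair[1])
```

For example, `dictionary_matches("kuku", ["tutu", "kuko", "dfvfd"], 1)` keeps only `"kuko"`, and `best_substring_distance("tree", "banana tree")` is 0 because the pattern occurs exactly. Comparing every pattern against every dictionary word is quadratic overall, so for large inputs an index or a dedicated library is usually a better fit than this sketch.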



361 questions
0
votes
0 answers

Fuzzy in a dictionary

I'm Ann and I'm working with fuzzy matching for the first time. I want to match my content against a dictionary. Is fuzzy matching the right method to get this result, or is there another approach I should read about first? Thanks for reading and a blessed day! The…
0
votes
2 answers

Match slightly different records in a field

I have the below table HAVE. How can I go about getting results in "WANT"? I'll appreciate ideas and I'm open to any fuzzy match algorithm out there.

HAVE
ID  Name
1   Davi
2   David
3   DAVID
4   Micheal
5   Michael
6   Oracle
7   Tepper

WANT
ID…
user2008558
  • 341
  • 5
  • 16
0
votes
0 answers

Match two columns in different dataframes and show match score in python - fast

I have two dataframes df1 and df2. df1 = pd.DataFrame({'Name': ['Zebra system','Lion healthcare'], 'Type': ['S','A']}) df2 = pd.DataFrame({'AltName': ['Zebra system llc','abra inc. 54','Lions corp health care','Zebra sys co','lions system atl'],…
Zain
  • 1
  • 1
0
votes
1 answer

Sum all counts when their fuzz.WRatio > 90 otherwise leave intact

What I want to do is group all similar strings in one column and sum their corresponding counts if they are similar; otherwise, leave them as they are. This is a little similar to this post. Unfortunately I have not been able to apply this to my…
Chen
  • 383
  • 2
  • 12
0
votes
0 answers

Approximate Comparison of one huge nested list with elements of another huge nested list

I have two nested lists, one with around 10K x 6 elements. The other nested list has 28K x 15 elements. This is the pseudo logic I'm implementing using nested loops for doing approximate comparison: if nested_list_1[iter_1][0] and nested_list_1[iter_2][3]…
Saad Saadi
  • 1,031
  • 10
  • 26
0
votes
1 answer

Fuzzy matching on keyword in a larger string - SAS

Using SAS, I have a table with sentences and I am looking to find the rows in the table where the keyword is found in the sentence making use of fuzzy matching (complev function). Is there a way in SAS to find the keyword string in the sentences? I…
0
votes
0 answers

Fastest way to fuzzy match two csv files

I have written a very simple program using a NuGet package in C# to read in 2 CSV files, fuzzy match them, and output a new CSV file with all the matches. The problem is I need the program to be able to read and compare files up to 700k and…
jbigs89
  • 11
  • 3
0
votes
1 answer

Analyzing a list of terms and their nearest neighbors in Postgres 11.5

I've got a database where we regularly need to do fuzzy/distance matching on strings. In this example, the target citext field is named analytic_scan.inv_name. But the same sort of code could be useful for any number of other text and citext fields.…
Morris de Oryx
  • 1,857
  • 10
  • 28
0
votes
1 answer

Get the list of matching token from Fuzzywuzzy

I am using fuzzywuzzy token_set_ratio to match 2 strings. I want to know the tokens that were matching. Is there a function in fuzzywuzzy to do so? String1="this is a banana tree" String2="there is banana tree next to my house" the token_set_ratio in…
Sid
  • 552
  • 6
  • 21
0
votes
1 answer

Merge two dataframes based on fuzzy-matches in two columns

I have 2 dataframes that I am trying to merge based on IDs and a secondary ID. Here is a sample of the two dataframes: First ID Second ID Company 10056526008010 0.000000e+00 Company…
temsandroses
  • 311
  • 1
  • 3
  • 11
0
votes
1 answer

Pandas replace strings with fuzzy match in the same column

I have a column in a dataframe that is like this: OWNER -------------- OTTO J MAYER OTTO MAYER DANIEL J ROSEN DANIEL ROSSY LISA CULLI LISA CULLY LISA CULLY CITY OF BELMONT CITY OF BELMONT CITY Some of the names in my data frame are…
0
votes
2 answers

String matching and store results as lists in cells

I have two very large tables df1 and df2 (multiple millions of rows each) of person-related data and each table has a column that contains the name of a person (column name: "Name"). The names of one and the same person can be written differently…
constiii
  • 638
  • 3
  • 19
0
votes
1 answer

Fuzzy scoring top N in Python 3?

I am trying to build a dataframe of word and fuzzywuzzy score, and take top 5. For example I have test word test = "kuku" My bag of words are: words = ["tutu", "pupu", "lulu", "kuko", "dfvfd", "wwwer"] I have done the following so far: import…
SteveS
  • 3,789
  • 5
  • 30
  • 64
0
votes
1 answer

How to run a function on each subset of a dataframe based on multiple conditions

The data: I have a dataframe in R with the following sort of structure: ID Type Group Text 100 A 1 Lorem ipsum dolor sit amet 103 A 1 Lorem ipsum dolor sit amet 105 A 1 consectetur adipiscing eli 106 A …
Thredolsen
  • 247
  • 1
  • 11
0
votes
1 answer

Approximate de-duplication

Suppose I have a dataset like this: that I need to examine for possible duplicates. Here, the 2nd and 3rd rows are suspected duplicates. I'm aware of string distance methods as well as approximate matches for numeric variables. But have the two…
Thomas Speidel
  • 1,369
  • 1
  • 14
  • 26