Questions tagged [fuzzy-comparison]

Fuzzy comparison is the colloquial name for approximate string matching, the technique of finding strings that match a pattern approximately rather than exactly. The problem is typically divided into two sub-problems: finding approximate substring matches inside a given string, and finding dictionary strings that approximately match the pattern.
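As a rough illustration of the two sub-problems, here is a minimal Python sketch. It assumes Levenshtein (edit) distance as the similarity measure, which is only one of several possible choices, and the function names `edit_distance`, `dictionary_matches`, and `best_substring_distance` are hypothetical names chosen for this example rather than part of any library.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        prev = curr
    return prev[-1]


def dictionary_matches(pattern: str, words: List[str], max_dist: int) -> List[str]:
    """Dictionary sub-problem: keep the words within max_dist edits of pattern."""
    return [w for w in words if edit_distance(pattern, w) <= max_dist]


def best_substring_distance(pattern: str, text: str) -> int:
    """Substring sub-problem: smallest edit distance between pattern and
    any substring of text."""
    # Same recurrence as above, but the first row is all zeros so that a
    # match may start at any position in text (Sellers' variant).
    prev = [0] * (len(text) + 1)
    for i, cp in enumerate(pattern, start=1):
        curr = [i]
        for j, ct in enumerate(text, start=1):
            curr.append(min(
                prev[j] + 1,
                curr[j - 1] + 1,
                prev[j - 1] + (cp != ct),
            ))
        prev = curr
    # The match may also end anywhere, so take the minimum of the last row.
    return min(prev)


if __name__ == "__main__":
    print(dictionary_matches("bergman", ["bergmann", "burgman", "seki"], 1))
    print(best_substring_distance("petrol", "the patrol car"))
```

Both functions run in time proportional to the product of the two string lengths, so comparing every pattern against a large dictionary this way is slow; in practice an index or a blocking step is usually needed to cut down the number of comparisons.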


361 questions
0
votes
1 answer

How to use efficient pattern matching to find row similarities in big data

I have a table of around 100,000 rows. This table is in an Excel file, and here is a snapshot of it: +------------+-----------+-----+-----+-----------------------------------------------------------+ | First Name | Last Name | Sex | Age | …
0
votes
2 answers

Fuzzy identity fingerprinting

I have a spreadsheet with values like address, name, IBAN, and e-mail, and want to identify when a customer last bought something. The problem is: some fields contain spelling mistakes, others were deliberately entered wrong. On GitHub, several…
Georg Heiler
  • 16,916
  • 36
  • 162
  • 292
0
votes
1 answer

Create a new table column with closest string match from another table

I have two lists of names for locations, with slightly different spelling, capitalisation, etc. I'm trying to match each site in the first list to the most similar one in the second list. SELECT name1, name2 FROM table1, table2 WHERE…
Jamie Bull
  • 12,889
  • 15
  • 77
  • 116
0
votes
2 answers

SAS Help: Using Index function to compare 2 columns

I want to compare the string values of A and B using the index function. I want to check whether A contains B in its column. The only way I know how to do this is with Index, but the problem is that index doesn't allow a column name in its parameters. You have to enter…
Paula
  • 3
  • 3
0
votes
1 answer

Looping through 2 vectors of different dimension in R

I have two character vectors a, b with different dimensions. I have to take each element in a and compare with all elements in b and note the element if there is a close match. For matching I'm using agrepl function. Following is the sample data a…
Naveen
  • 53
  • 2
  • 8
0
votes
2 answers

fuzzy merge using SAS proc sql

I have two files which I would like to match by name and I would like to take account of spelling errors by using the compged function. The names have been thoroughly cleaned and I have no other useful match variables that could be used to reduce…
James
  • 101
  • 2
0
votes
1 answer

Match pandas dataframe name columns to another dataframe's columns?

I'm very new to Python. How can I match one text dataframe to another? (kindly please edit this question if I ask this wrongly) For example given this input data: df1 = id Names 0 123 Simpson J. 1 456 Snoop Dogg df2…
Ralph Deint
  • 380
  • 1
  • 4
  • 15
0
votes
2 answers

Fuzzy Matching Addresses

I am busy writing a simple algorithm to fuzzy match addresses from two datasets. I am calculating the Levenshtein distance between two addresses and then adding the exact match or the shortest match to a matched array. However this is very slow as…
liamjnorman
  • 784
  • 1
  • 16
  • 30
0
votes
1 answer

Fuzzy string match + amount match in node.js

Hi, I need to order the data according to the fuzzy matching of 2 variables. Consider that I have a string "pet" and an amount 50. I have an object array like the one below: [{"des":"permanent","amount":100}, {"des":"petrol","amount":1000}] I need an array as…
Subburaj
  • 5,114
  • 10
  • 44
  • 87
0
votes
0 answers

How to fuzzy match text in a column and then replace with a consensus in R

I have a dataframe as follows FName LName Ayeko Seki Ayeko Seki Ayeko Seki Ayeko Zeki Aveko Seki Avoo Zooki Jacques Bergmann. Jacques Burgman J Bergman Jacques Bergmann Jacques Bergmann Jacques Bergmann Jacques Bergmann David …
Sebastian Zeki
  • 6,690
  • 11
  • 60
  • 125
0
votes
0 answers

Efficient algorithm to index fuzzy hash signatures

I need to find out whether a new fuzzy hash signature I obtain is similar (to an established degree of certainty) to the other signatures I have obtained. The problem is, there's no way to sort fuzzy hashes, so rather than use brute force and compare them all,…
nihil
  • 85
  • 1
  • 8
0
votes
1 answer

Why is the Levenshtein distance score so low for these two strings?

I am using a Levenshtein distance algorithm to find similar strings and I currently have my score for acceptance as 12 (because some of my strings have up to 5 words). But I was surprised to see the below two strings get a score of 11, they seem…
AbuMariam
  • 3,282
  • 13
  • 49
  • 82
0
votes
0 answers

Alter matching code to run on differently shaped data than it was originally written for

I do not recall where I found this code, so I cannot give a proper attribution to its author, but it fuzzy matches two columns of strings. I want to alter the code so that the LookWith column is in column B, not column E. I have tried for some time to do…
xyz
  • 2,253
  • 10
  • 46
  • 68
0
votes
1 answer

Using stringsim in stringdist

I'm using the package stringdist to compare some vectors of strings but I keep getting a different answer than what I think I should when I try to test out the package. I want to do this: stringsim('PANDIAN', 'PANIAN', method="lv") [1]…
grad_student
  • 317
  • 1
  • 5
  • 13
0
votes
1 answer

In python: How to find match of string in same row, compare part of (fuzzy) matched string to list?

I have a matching problem that I've tried to solve, but have not found a way to do so. I'm new to python, so there might well be simple methods for doing this. I've searched the questions, but haven't found anything that quite gets what I need.…
Savage Henry
  • 1,990
  • 3
  • 21
  • 29