Questions tagged [fuzzy-comparison]

Fuzzy comparison is the colloquial name for approximate string matching, the technique of finding strings that match a pattern approximately (rather than exactly). This problem is typically divided into two sub-problems: finding approximate substring matches inside a given string, and finding dictionary strings that match the pattern approximately.
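The two sub-problems can be illustrated with a minimal sketch in plain Python. It assumes Levenshtein (edit) distance as the similarity measure, which the tag description does not prescribe; the function names `edit_distance`, `best_substring_distance`, and `dictionary_matches` are hypothetical and chosen only for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two whole strings (assumed metric)."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def best_substring_distance(pattern: str, text: str) -> int:
    """Sub-problem 1: smallest edit distance between `pattern` and any
    substring of `text`. Same recurrence as above, but the first row is
    all zeros so that a match may start at any position in `text`."""
    prev = [0] * (len(text) + 1)
    for i, cp in enumerate(pattern, 1):
        cur = [i]
        for j, ct in enumerate(text, 1):
            cur.append(min(prev[j] + 1,
                           cur[j - 1] + 1,
                           prev[j - 1] + (cp != ct)))
        prev = cur
    return min(prev)  # a match may end at any position in `text`


def dictionary_matches(pattern: str, words: list[str], max_dist: int) -> list[str]:
    """Sub-problem 2: dictionary words within `max_dist` edits of `pattern`."""
    return [w for w in words if edit_distance(pattern, w) <= max_dist]
```

For example, `best_substring_distance("sentense", "This is an example sentence")` finds the misspelled term inside the sentence, and `dictionary_matches("smyth", ["smith", "smitty", "jones"], 1)` filters a word list. Both functions run in time proportional to the product of the input lengths, so large inputs usually call for indexing or a dedicated library rather than this brute-force sketch.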


361 questions
0
votes
1 answer

How to efficiently find all fuzzy matches between a set of terms and a list of sentences?

I have a list of sentences (e.g. "This is an example sentence") and a glossary of terms (e.g. "sentence", "example sentence") and need to find all the terms that match the sentence with a cutoff on some Levenshtein ratio. How can I do it fast…
x3al
  • 586
  • 1
  • 8
  • 24
0
votes
3 answers

Organization of fuzzy matches

I've gone through and fuzzy matched each element in a list of 20,000+ movie titles with each other element, which returns a value for each pair: from fuzzywuzzy import fuzz titles = ['Scary Movie', 'Happy Movie', 'Sappy Movie', 'Crappy…
Benjamin James
  • 941
  • 1
  • 9
  • 24
0
votes
0 answers

regex to exclude 2 consecutive variations

I am trying to do some fuzzy matching (in R) and want to make some rules about how many consecutive variations are allowed. For example, if I use the Levenshtein Distance and the distance is greater than 2, I want to exclude any matches where these…
statsNoob
  • 1,325
  • 5
  • 18
  • 36
0
votes
0 answers

Check for text matching with leniency?

I'm working on an application that reads a trivia question to the user and then asks the user to type an answer to see if they are correct. I want to be able to compare the user's answer to the actual answer to see if they match to a certain…
user2483916
  • 59
  • 4
  • 9
0
votes
1 answer

Matching 2 short descriptions and returning a confidence level

I have some data that I get from banks using Yodlee, and the corresponding transaction messages on the mobile. Both contain short descriptions. For example - string1 = "tatasky_TPSL MUMBA IND" string2 =…
Ninjinx
  • 625
  • 2
  • 7
  • 13
0
votes
0 answers

R : Fuzzy name match for variable size

I have been working on matching a source set against a master set of customer names. This can be achieved using adist in R, but I now have a source set of 2 million names and a master set of 500k, and here we can't use adist as it does not…
KRU
  • 291
  • 4
  • 18
0
votes
3 answers

Fuzzy-match List of People

I am trying to see if a movie is the same between two pages, and to do so I would like to compare the Actors as one of the criteria. However, actors are often listed differently on different pages. For example: On this page,…
David542
  • 104,438
  • 178
  • 489
  • 842
0
votes
1 answer

R: Want to do a dictionary check and remove unwanted space in between where removing space will make it a proper word

I am using R for text mining and have data that has been concatenated from different text columns. There are cases where words have been split by a space, like "functi oning". I want to detect all such cases and remove the space in between by doing…
0
votes
1 answer

Euclidean distance when similar features are slightly shifted

Let us say I want to find a similar vector for a vector a = [0 0 2 0 0 0 0 0 0] I have two candidates: b1 = [0 0 0 2 0 0 0 0 0], where the "feature" is just 1 position away b2 = [0 0 0 0 0 0 0 2 0], where the "feature" is 5 positions…
iloo
  • 926
  • 12
  • 26
0
votes
1 answer

Successively agrep names in a variable, then create a new variable with the shortest name for close matches

Assume a character vector of company names where the names come in various forms. Here is a small version of a 10,000-row data frame; it shows the desired second vector ("two.names"). structure(list(firm = structure(1:8, .Label = c("Carlson Caspers",…
lawyeR
  • 7,488
  • 5
  • 33
  • 63
0
votes
1 answer

Fuzzy logic - Computing membership function given term set

I am a student studying for a Fuzzy Logic exam, and I have been working my way through the questions about fuzzy sets. However, I have just come across an exam question that I do not understand how to do from the lecturer's notes, and was wondering…
kevinh
  • 61
  • 7
0
votes
1 answer

Fuzzy lookup in SSIS does not output the result expected

To make things simple, let's say I have a Client table with fields: ClientID PCode Region. I have a lookup Region table with fields: ID PostCode Region. The Client table has one row: 123, 3075, THOMASTOWN. The Region table has 2 rows:…
Jami
  • 579
  • 6
  • 20
0
votes
0 answers

Optimizing FuzzyMatch in Talend

I am using Talend to check data quality by comparing the names of people in two databases. One database has the correct names and the other has corrupted names. What I have to do is compare both sets of names and find the correct names…
Prakki
  • 149
  • 1
  • 3
  • 13
0
votes
1 answer

tFuzzyMatch apparently not working on Arabic text strings

I have created a job in Talend Open Studio for Data Integration v5.5.1. I am trying to find matches between two customer name columns; one is a lookup and the other contains dirty data. The job runs as expected when the customer names are in…
0
votes
1 answer

Fuzzy matching by using SimMetrics library

I need some help here. How would I create a simple SQL statement to select Names @userEnteredName with these functions? In other words, I want to get customer names from the customer table where the user typed in smyth and get back smith, smitty,…