Questions tagged [jaro-winkler]

An algorithm for measuring the similarity of two strings, often used for duplicate detection.

78 questions
0 votes, 0 answers

Manual Calculation of Jaro Distance

I am attempting to validate the R function stringdist from the stringdist library. Using example 1, stringdist('John J Smith', 'John Smith', method = 'jw', p = 0), it returns 0.9444444, where p = 0 implies that the Winkler component of Jaro-Winkler is…
Scott
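For readers who want to reproduce the figure quoted above by hand, here is a minimal Python sketch of the plain Jaro similarity (no Winkler prefix bonus, which corresponds to p = 0). The function name `jaro_similarity` is illustrative and is not part of stringdist. It assumes the textbook definition, with a match window of max(len1, len2) // 2 - 1, and treats the quoted 0.9444444 as a similarity; a distance would be 1 minus this value.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Plain Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters match only if equal and at most `window` positions apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Transpositions: matched characters that appear in a different order,
    # counted in pairs (hence the division by 2).
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    return (matches / len1
            + matches / len2
            + (matches - transpositions) / matches) / 3


print(round(jaro_similarity("John J Smith", "John Smith"), 7))  # 0.9444444
```

For this pair there are 10 matched characters and no transpositions, so the value is (10/12 + 10/10 + 10/10) / 3.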
0 votes, 1 answer

Creating New Matching Logic in Informatica (Ratcliff-Obershelp)

I am conducting a matching project in Informatica 10.2.1 wherein I need to identify matching strings within product descriptions. Ratcliff-Obershelp is the matching strategy I need to implement. I've heard Ratcliff-Obershelp yields greater results…
0 votes, 0 answers

Jaro-Winkler with several lines

I want to calculate the similarity between several lines. I found the Jaro-Winkler distance, but it only works with two strings. How can I replace these two strings with several lines (from Notepad)?
0 votes, 1 answer

How to check the similarity between two lists in two different Excel files using Python?

I have two lists containing customer names. The names can be similar or different. How can I find the similarity between these two lists using Python? After computing the similarity, I want to pull the corresponding data from one Excel file into the other. Example: List…
0 votes, 0 answers

Fuzzy match algorithm between full names from different manual inputs in TSQL?

I'm hoping to implement a fuzzy match algorithm in TSQL (without MDS) that compares full names. The names are coming from separate manual inputs with no controls over what's entered. One of the systems also tends to cut off the end of names as it…
0 votes, 0 answers

R string-based matching of business names

TL;DR I'd like to match two unequal columns where the values contain business names, and I've tried stringdist's amatch using Jaro-Winkler matching to get close, but not nearly close enough. I am wondering if stringi would be useful here - I just…
0 votes, 1 answer

JaroWinklerDistance in Lucene is returning strange results

I have a file containing some phrases. Using Lucene's Jaro-Winkler implementation, I expect to get the phrases from that file that are most similar to my input. Here is an example of my problem. We have a file containing: //phrases.txt this is goodd this is…
Remis07
0 votes, 1 answer

utl_match comparing many records

I have 2 tables: one with 1 million records and the other with 40000 records. For each record in one table, I need to check whether there is a similar string in the other table. The problem is that this procedure is very slow, and I need to optimize this procedure…
0 votes, 1 answer

Faster search query with dynamic WHERE columns on Oracle DB

I have a table (ResponseData) with columns RESPONSE_ID, RESPONSEDATA, KEY1, KEY2, KEY3, KEY4, VALUE1, VALUE2, VALUE3, VALUE4. A user can insert data in any of the categories below. 1,"my response one","name",null,null,null,"Apple",null,null,null 2, "my response…
snofty
0 votes, 1 answer

Jaro-Winkler in SQL Server

I tried to find the UDF dbo.fn_calculateJaroWinkler (for computing the Jaro-Winkler distance) for SQL Server and couldn't find it. Has anyone written it who could share it?
oder5
0 votes, 1 answer

Winkler algorithm usage for web forms

Through a web form, a client sends me many variables such as name, surname, id, address, etc. Sometimes a user sends me a name like Elviz Aaronn Presley. With the Winkler algorithm, I want to compare each submission with the DB records. Elvis will be compared to…
Ali Arda Orhan
0 votes, 1 answer

Replace word with lowest cost, Jellyfish Python

I have a list of correctly spelled words in a file called ref.txt. I also have a list of sentences, and I have managed to extract words from them using regex. I'll elaborate with an example. Suppose ref.txt contains - Mumbai , Andheri ,Jacob…
Hypothetical Ninja
0 votes, 1 answer

Fast Jaro-Winkler C++ code for numeric vectors

Is there a library, or the source of a function, that I can use for comparing numeric vectors in C++?
POD
0 votes, 1 answer

What is the third parameter to Text::JaroWinkler::strcmp95 for?

I am interested in the Jaro-Winkler module written in Perl to compute the distance (or similarity) between two strings: http://search.cpan.org/~scw/Text-JaroWinkler-0.1/JaroWinkler.pm The syntax of the function is not clear to me; I could not find…
paso
0 votes, 1 answer

NLP - Improving Running Time and Recall of Fuzzy string matching

I have written a working algorithm, but its running time is terrible. I knew from the start that it would be slow, but not this slow. For just 200000 records, the program runs for more than an hour. Basically what I am doing is: for each…