An algorithm for measuring the similarity of two strings, often used for duplicate detection.
Questions tagged [jaro-winkler]
78 questions
0
votes
0 answers
Manual Calculation of Jaro Distance
I am attempting to validate the R function stringdist from the stringdist package.
Using the example
1 - stringdist('John J Smith', 'John Smith', method = 'jw', p = 0), it returns 0.9444444.
Where p = 0 implies that the Winkler component of Jaro-Winkler is…

Scott
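For readers who want to check such a value by hand, here is a minimal sketch of the textbook Jaro similarity with an optional Winkler prefix bonus. It is not the stringdist implementation; the function names `jaro` and `jaro_winkler` and the `max_prefix` parameter are assumptions made for illustration. With `p = 0` the prefix bonus vanishes and the result is plain Jaro.

```python
def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in [0, 1] (illustrative sketch)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and at most `window` positions apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order, halved.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity plus a bonus for a shared prefix of up to max_prefix characters."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


# 10 of the characters match with no transpositions:
# (10/12 + 10/10 + 10/10) / 3
print(round(jaro_winkler("John J Smith", "John Smith", p=0), 7))  # 0.9444444
```

The printed value is a similarity, which corresponds to 1 minus the distance in the example above.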
0
votes
1 answer
Creating New Matching Logic in Informatica (Ratcliff-Obershelp)
I am conducting a matching project in Informatica 10.2.1 in which I need to identify matching strings within product descriptions. Ratcliff-Obershelp is the matching strategy I need to implement.
I've heard Ratcliff-Obershelp yields better results…
0
votes
0 answers
Jaro-Winkler with several lines
I want to calculate the similarity between several lines. I found the Jaro-Winkler distance, but only for two strings. How can I replace these two strings with several lines (from Notepad)?
0
votes
1 answer
How to check the similarity between two lists in two different Excel files using Python?
I have two lists containing customer names. The names can be similar or different. How can I find the similarity between these two lists using Python?
Once I have the similarity, I want to pull the corresponding data from one Excel file into the other.
example:
List…

Akshay Gupta
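One possible starting point for this kind of list-to-list matching, sketched with the Python standard library only. Note the swap: `difflib.get_close_matches` scores with a Ratcliff/Obershelp-style ratio, not Jaro-Winkler. The helper name `best_matches`, the sample names, and the 0.8 cutoff are assumptions made for illustration; reading the Excel files is left out.

```python
from difflib import get_close_matches


def best_matches(names_a, names_b, cutoff=0.8):
    """Map each name in names_a to its closest name in names_b, or None.

    Hypothetical helper: a candidate is accepted only if its similarity
    ratio is at least `cutoff`.
    """
    result = {}
    for name in names_a:
        candidates = get_close_matches(name, names_b, n=1, cutoff=cutoff)
        result[name] = candidates[0] if candidates else None
    return result


# Made-up sample data standing in for the two customer lists.
list_a = ["John Smith", "Akshay Gupta", "Zed"]
list_b = ["Jon Smith", "Akshay Gupta", "Someone Else"]

for name, match in best_matches(list_a, list_b).items():
    print(name, "->", match)
```

The resulting mapping could then serve as a join key for pulling rows from one sheet into the other; the cutoff would need tuning against real data.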
0
votes
0 answers
Fuzzy match algorithm between full names from different manual inputs in TSQL?
I'm hoping to implement a fuzzy match algorithm in TSQL (without MDS) that compares full names. The names are coming from separate manual inputs with no controls over what's entered. One of the systems also tends to cut off the end of names as it…

user3457834
0
votes
0 answers
R string-based matching of business names
TL;DR I'd like to match two unequal columns where the values contain business names, and I've tried stringdist's amatch using Jaro-Winkler matching to get close, but not nearly close enough. I am wondering if stringi would be useful here - I just…

Amjad Talib
0
votes
1 answer
JaroWinklerDistance in Lucene is returning strange results
I have a file containing some phrases. Using Lucene's Jaro-Winkler implementation, I expect to get the phrases from that file that are most similar to my input.
Here is an example of my problem.
We have a file containing:
//phrases.txt
this is goodd
this is…

Remis07
0
votes
1 answer
utl_match comparing many records
I have 2 tables: one with 1 million records, and the other with 40,000 records.
For each record in one table, I need to check whether there is a similar string in the other table.
The problem is that this procedure is very slow.
I need to optimize this procedure…

Anthony Vasquez
0
votes
1 answer
Faster search query with dynamic WHERE columns on Oracle DB
I have a table (ResponseData) with columns RESPONSE_ID, RESPONSEDATA, KEY1, KEY2, KEY3, KEY4, VALUE1, VALUE2, VALUE3, VALUE4.
A user can insert data in any of the categories below.
1,"my response one","name",null,null,null,"Apple",null,null,null
2, "my response…

snofty
0
votes
1 answer
Jaro-Winkler in SQL Server
I tried to find the UDF dbo.fn_calculateJaroWinkler (for computing the Jaro-Winkler distance) for SQL Server and couldn't find it. Has anyone written it and could share it?

oder5
0
votes
1 answer
Winkler algorithm usage for web-forms
From a web form, a client sends me many variables such as name, surname, id, address, etc. Sometimes a user sends me a name like:
Elviz Aaronn Presley
With the Winkler algorithm, I want to compare all records with the DB records.
Elvis will be compared to…

Ali Arda Orhan
0
votes
1 answer
Replace word with lowest cost, Jellyfish Python
I have a list of correctly spelled words in a file called ref.txt. I also have a list of sentences, and I have managed to extract words from them using regex. I'll illustrate with an example.
Suppose ref.txt contains: Mumbai, Andheri, Jacob…

Hypothetical Ninja
0
votes
1 answer
Fast Jaro-Winkler C++ code for numeric vectors
Is there a library, or the code of a function, that I can use for comparing numeric vectors in C++?

POD
0
votes
1 answer
What is the third parameter to Text::JaroWinkler::strcmp95 for?
I am interested in the Jaro-Winkler module written in Perl to compute the distance (or similarity) between two strings:
http://search.cpan.org/~scw/Text-JaroWinkler-0.1/JaroWinkler.pm
The syntax of the function is not clear to me; I could not find…

paso
0
votes
1 answer
NLP - Improving Running Time and Recall of Fuzzy string matching
I have a working algorithm, but its running time is terrible. I knew from the start that it would be slow, but not this slow: for just 200,000 records, the program runs for more than an hour.
Basically what I am doing is:
for each…

MindSeeker