An algorithm for measuring the similarity of two strings, often used for duplicate detection.
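For orientation, here is a minimal sketch of the commonly described Jaro-Winkler similarity, not a reference implementation. The function names, the prefix scale of 0.1, and the 4-character prefix cap are assumptions based on the usual textbook formulation; individual libraries may differ in details such as case handling or boost thresholds.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only within this sliding window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Matched characters that appear in a different order are
    # transpositions; each out-of-order pair is counted as half.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2

    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1,
                 max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix (assumed defaults)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1 - j)


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
```

Note that this returns a similarity (higher means more alike); some packages report a distance instead, typically one minus this value.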
Questions tagged [jaro-winkler]
78 questions
3
votes
0 answers
ElasticSearch using Jaro-Winkler & Levenshtein algorithm
I'm trying to use ElasticSearch as a data store to find some people by their name.
I've tried creating an index, adding words, and changing the mapping, but when I try to find people by name with the Jaro-Winkler & Levenshtein algorithms, it returns nothing…

Oleg Kuzminsky
- 31
- 3
3
votes
2 answers
Jaro-Winkler function: why is the same score matching very similar and very different words?
I am using Jaro-Winkler fuzzy matching to match names.
I am trying to determine a cut-off range for the similarity score. If the names are too different, I want to exclude them for manual review.
While anything below .4 seemed to be different…

akline
- 31
- 1
- 2
3
votes
1 answer
Speeding up loop calculating Jaro-Winkler distance in R
I'm new here in more than one sense. This is my first post, regarding my first script in my first attempt at acquainting myself with any programming language. In light of that you might find this project to be overly ambitious, but hey, learning by doing has always been…

Morten Nielsen
- 325
- 2
- 4
- 19
3
votes
0 answers
How to match Amazon / CJ / Linkshare Products
I need to create a database from the Amazon, Commission Junction & LinkShare APIs and data feeds, and then match identical products to build comparisons of product information.
My problem is related to the matching process.
I start by matching…

Smail
- 33
- 4
2
votes
3 answers
Jaro-Winkler string comparison function in SAS
Is there an implementation of the Jaro-Winkler string comparison in SAS?
It looks like Link King has Jaro-Winkler, but I'd prefer the flexibility of calling the function myself.
Thanks!

Richard Herron
- 9,760
- 12
- 69
- 116
2
votes
2 answers
Compare and link strings with different word orders / word counts
I am trying to use the recordLinkage package to link together two datasets where one dataset tends to give multiple last / middle names and the other just gives a single last name. Currently the string comparison function that's being used is the…

Maharero
- 238
- 1
- 10
2
votes
0 answers
Matching text with Arabic speech-to-text output
I made an Arabic speech-to-text application. The recognized text is compared with existing text in an array using the Jaro-Winkler distance string matching algorithm.
I've been counting the manual of all text input with text that is in…

Khairun Nufus
- 11
- 5
2
votes
0 answers
What is a sensible way to combine multiple Jaro-Winkler calculations?
Let's say I am comparing two individuals, each with a first name, last name, postal code, address(line1), address(line2), and phone number. These all have varying reliability and importance for determining a match.
I can generate a J-W distance for…
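One simple option is a weighted average of the per-field scores, sketched below. This is only an illustration under stated assumptions: the field names and weights are hypothetical, and each score is assumed to be a similarity in [0, 1] rather than a distance. It is not a substitute for a probabilistic record linkage model.

```python
def combine_scores(scores: dict, weights: dict) -> float:
    """Weighted average of per-field similarity scores.

    Only fields present in `scores` contribute, so a record with a
    missing field is averaged over the fields that were compared.
    """
    total_weight = sum(weights[field] for field in scores)
    if total_weight == 0:
        raise ValueError("at least one compared field needs a non-zero weight")
    weighted_sum = sum(scores[field] * weights[field] for field in scores)
    return weighted_sum / total_weight


# Hypothetical weights: more reliable fields get a larger weight.
weights = {"first_name": 1.0, "last_name": 2.0, "postal_code": 1.0}
# Hypothetical per-field Jaro-Winkler similarities for one candidate pair.
scores = {"first_name": 0.9, "last_name": 1.0, "postal_code": 0.5}

print(combine_scores(scores, weights))
```

The weights would normally be tuned against pairs that are already known to match or not match.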

Daniel Paczuski Bak
- 3,720
- 8
- 32
- 78
2
votes
2 answers
Doing order by using the Jaro-Winkler distance algorithm?
I am wondering how I could run a SQLite ORDER BY in this manner:
select * from contacts order by jarowinkler(contacts.name,'john smith');
I know Android has a bottleneck with user defined functions, do I have an alternative?
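To illustrate what a user-defined function in an ORDER BY looks like, here is a small sketch using Python's built-in `sqlite3` module rather than Android. The table contents are made up, and the registered function uses `difflib.SequenceMatcher` as a stand-in similarity measure, not an actual Jaro-Winkler implementation; it is registered under the name `jarowinkler` only so the query resembles the one in the question.

```python
import sqlite3
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in similarity in [0, 1]; this is NOT Jaro-Winkler."""
    if a is None or b is None:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


conn = sqlite3.connect(":memory:")
# Register the Python function as a two-argument SQL function.
conn.create_function("jarowinkler", 2, similarity)

conn.execute("CREATE TABLE contacts (name TEXT)")
conn.executemany(
    "INSERT INTO contacts (name) VALUES (?)",
    [("Alice Jones",), ("Jon Smyth",), ("John Smith",)],  # made-up sample rows
)

# Higher similarity first, so the closest names come out on top.
rows = conn.execute(
    "SELECT name FROM contacts ORDER BY jarowinkler(name, ?) DESC",
    ("john smith",),
).fetchall()
names = [row[0] for row in rows]
print(names)
```

The function is evaluated once per row, so on large tables it may be worth narrowing the candidate set with a cheaper filter first.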

Pentium10
- 204,586
- 122
- 423
- 502
2
votes
0 answers
Memory-efficient string comparison with blocking in R
I have a record linkage problem with very large datasets (2,000 entries in the A-file, ~70,000,000 entries in the B-file) and want to do distance-based matching with the Jaro-Winkler algorithm in R. Both files are data.tables filled with…

C Krüger
- 21
- 2
2
votes
1 answer
Fast Levenshtein Distance (and Jaro Winkler) in R for numeric vectors
Is there a package in R that contains a Levenshtein distance function that computes the distance for numeric vectors? All I have found are string-based. I am also looking for a Jaro-Winkler package that does the same, but the Levenshtein distance…

POD
- 509
- 8
- 20
1
vote
2 answers
FIRST() and LAST() for MATCH_RECOGNIZE
We are analyzing streaming Twitter data to find users who post similar (almost identical) tweets over and over. I am using MATCH_RECOGNIZE for this. It is able to find the pattern, but I am not able to get the FIRST() and the LAST() values…

Saqib Ali
- 3,953
- 10
- 55
- 100
1
vote
1 answer
poetry error "'setup.py' [...] not found" when it exists
I'm migrating my packaging tool for a Python project from pipenv to poetry.
However, when attempting to install jaro-winkler (using poetry add jaro-winkler), I get the following error:
• Installing jaro-winkler (2.0.1.linux-x86_64): Failed
…

Ian
- 3,605
- 4
- 31
- 66
1
vote
0 answers
Computing JaroWinkler Similarity for unordered and different sized dataframes
I have two dataframes extracted from two attached files.
I want to compute JaroWinkler Similarity for the tokens inside the files. I am using the code below.
from similarity.jarowinkler import JaroWinkler
jarowinkler = JaroWinkler()
df_gt['jarowinkler_sim']…

Pert8S
- 582
- 3
- 6
- 21
1
vote
1 answer
Applying Jaro-Winkler distance to dataframe
I have a dataframe with two columns. The first contains correct strings, the second corrupted ones. I want to apply the Jaro-Winkler distance and store it in a new third column.
import pandas as pd
from pyjarowinkler.distance import get_jaro_distance
df =…

Arthur
- 13
- 1
- 3