An algorithm for measuring the similarity of two strings, often used for duplicate detection.
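As a rough illustration of what the tag covers, here is a minimal pure-Python sketch of Jaro and Jaro-Winkler similarity. The function names `jaro` and `jaro_winkler`, the scaling factor `p=0.1`, and the 4-character prefix cap are illustrative choices that follow the commonly cited definition, not the API of any particular library.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means the strings are identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters in order of appearance in each string.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    # t is half the number of matched positions that disagree.
    t = sum(a != b for a, b in zip(seq1, seq2)) / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix (assumed p=0.1, cap of 4)."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

The nested loop makes this quadratic in the worst case, so for large datasets a compiled library implementation is usually the better choice.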
Questions tagged [jaro-winkler]
78 questions
1 vote, 1 answer
Jaro Winkler distance in Objective-C or Swift
I need to do fuzzy comparison of a large number of strings and am looking at Jaro-Winkler which respects differences in the order of letters. Is anyone aware of a way to do this in Objective-C or Swift either using Jaro-Winkler or some method native…

user6631314
1 vote, 1 answer
Name matching in R
I have two datasets of names: one with exact names and the other with exact and modified names.
dt_t <- data.table(Name = list("Aaron RAMSEY", "Mesut OEZIL", "Sergio AGUERO"))
dt_f <- data.table(Name = list("Özil Mesut", "Ramsey Aaron", "Kun…

P. Vauclin
1 vote, 3 answers
Match Names of the Companies approximately
I have 12 million company names in my database. I want to match them against an offline list.
I want to know the best algorithm to do so. I have tried Levenshtein distance, but it is not giving the expected results. Could you please suggest some…

shashank
1 vote, 1 answer
Text Mining using Jaro-Winkler fuzzy matching in R
I'm attempting to do some distance matching in R and am struggling to achieve a usable output.
I have a dataframe terms that contains 5 strings of text, along with a category for each string. I have a second dataframe notes that contains 10 poorly…

Cam23
1 vote, 0 answers
String similarity where order and difference in ascii code matters
Anybody aware of a string similarity method that would give the correct results for the below? I'm dealing with alphanumeric IDs where:
a change in the early part of the string matters more than in the latter part. I guess I could do ngrams?…

citynorman
1 vote, 1 answer
How do you make a string dictionary function in Lua?
Is there a way to replace a string with the one in a table if it is close to a string in that table?
Like a spellcheck function that searches through a table and, if the input is close to one in the table, fixes it, so the one in the table…
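One way to picture the lookup this question describes is the sketch below, written in Python rather than Lua. It relies on the standard library's `difflib.get_close_matches`, which ranks candidates with `SequenceMatcher` ratios and not with Jaro-Winkler; the helper name `correct`, the word list, and the `cutoff=0.6` threshold are assumptions made for illustration.

```python
import difflib


def correct(word: str, known_words: list[str], cutoff: float = 0.6) -> str:
    """Return the closest entry in known_words, or the input if none is close.

    Uses difflib's SequenceMatcher ratio (not Jaro-Winkler) as the score.
    """
    candidates = difflib.get_close_matches(word, known_words, n=1, cutoff=cutoff)
    return candidates[0] if candidates else word


words = ["apple", "banana", "cherry"]  # hypothetical lookup table
print(correct("aple", words))  # apple
print(correct("zzzz", words))  # zzzz (nothing close enough, input unchanged)
```

Raising `cutoff` makes the replacement more conservative, which matters when a wrong correction is worse than no correction.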

joshua chris
1 vote, 1 answer
An implementation of the Jaro Winkler distance algorithm in Transact SQL
I've been wondering for months about how to implement this algorithm in Transact SQL, https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance
How can it be done?

Maritim
1 vote, 1 answer
How to compute Overlap Coefficient and Jaro-Winkler using Simmetrics in Java
I have been trying to use the Simmetrics library from:
com.github.mpkorstanje:simmetrics-core:4.1.0
So far I am computing…

VSEWHGHP
1 vote, 1 answer
Jaro Similarity
To find the Jaro similarity, I found the matching characters as follows:
matching characters in string 1: AABABCAAAC
matching characters in string 2: ABAACBAAAC
What is the value of t (0.5 × transpositions)?
(source: wikipedia)
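For the two matched-character sequences quoted in this question, the count can be checked with a short sketch. It assumes the usual definition in which t is half the number of positions where the two matched sequences disagree; the helper name `transpositions` is hypothetical.

```python
def transpositions(matched1: str, matched2: str) -> int:
    """Return t: half the number of positions where the matched sequences differ.

    Both arguments are the matching characters of each string, in the order
    they appear in that string, so they must have the same length.
    """
    if len(matched1) != len(matched2):
        raise ValueError("matched sequences must have the same length")
    out_of_order = sum(a != b for a, b in zip(matched1, matched2))
    return out_of_order // 2


# Positions 1, 2, 4 and 5 (0-based) differ, so 4 characters are out of order.
print(transpositions("AABABCAAAC", "ABAACBAAAC"))  # 2
```

With m = 10 matching characters and t = 2, the third term of the Jaro formula, (m - t) / m, comes to 0.8.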

Curious
1 vote, 1 answer
Search Recommendation / Suggestion on large database
I have a table with millions of rows. When a user makes a spelling mistake while searching for a string or word from the table, I want to recommend the correct word or string from the table. I am using the Jaro-Winkler algorithm to compare the distance between strings, but…

JP711
1 vote, 0 answers
JARO_WINKLER matching dates as strings
I usually use jaro_winkler and similar functions to match strings that our customers provide, and I use the resulting percentages to find the customers in our database, since we don't really have a key as most other places have, like SSN, SIN, CPF and…

Ytipsh
1 vote, 1 answer
Interpreting the Jaro-Winkler Score in Perl -- Are there Alternatives in Stata?
Is there an industry standard for how large the Jaro-Winkler score should be to say that the two strings are likely similar?
I have a list of strings and I want to see if any of them are plausible typographical errors for the name James. I have…

paso
0 votes, 0 answers
Postgres function works in pgAdmin but not via JDBC
I have installed the PostgreSQL pg_similarity extension as described here so that I can use the jarowinkler function. The function works perfectly in pgAdmin but not via Spring JDBC. When executed via JDBC this error is…

SME
0 votes, 1 answer
Applying Jaro-Winkler distance to two dataframes
I have two dataframes of unequal length and would like to compare the similarity of strings in df2 with df1. Is it possible to apply Jaro-Winkler distance method to calculate the string similarity on two dataframes through map/lambda…

rshar
0 votes, 1 answer
I am working on Jaro-Winkler similarity and can use it between 2 columns, but how do I use it with 2 pairs of columns?
For example, I have 4 columns in my dataframe and
I want to use Jaro similarity for columns A, B vs. columns C, D, which contain strings.
Currently I am using it between 2 columns with
df.apply(lambda x: textdistance.jaro(x['A'], x['C']), axis=1)
Currently I was comparing…

Kevin D