Questions tagged [jaro-winkler]

An algorithm for measuring the similarity of two strings, often used for duplicate detection.
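
As a rough illustration of the algorithm this tag covers, here is a minimal Python sketch of Jaro and Jaro-Winkler similarity. The function names, the prefix scaling factor `p=0.1`, and the 4-character prefix cap are assumptions based on the commonly cited defaults, not taken from any particular library.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters match only if they are equal and no farther apart than this.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Matched characters, in order of appearance in each string.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    # t is half the number of matched characters that are out of order.
    t = sum(a != b for a, b in zip(seq1, seq2)) / 2

    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix (assumed defaults p=0.1, cap 4)."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


print(round(jaro("MARTHA", "MARHTA"), 4))
print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
```

The nested loop makes this quadratic in the worst case, which is fine for short names but worth keeping in mind for the large-scale matching questions listed below.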

78 questions
1
vote
1 answer

Jaro Winkler distance in Objective-C or Swift

I need to do fuzzy comparison of a large number of strings and am looking at Jaro-Winkler, which respects differences in the order of letters. Is anyone aware of a way to do this in Objective-C or Swift, either using Jaro-Winkler or some method native…
user6631314
1
vote
1 answer

Name matching in R

I have 2 datasets with names: one with exact names and the other with exact and modified names. dt_t <- data.table(Name = list("Aaron RAMSEY", "Mesut OEZIL", "Sergio AGUERO")) dt_f <- data.table(Name = list("Özil Mesut", "Ramsey Aaron", "Kun…
P. Vauclin
1
vote
3 answers

Match company names approximately

I have 12 million company names in my db. I want to match them against an offline list. I want to know the best algorithm to do so. I have tried Levenshtein distance, but it is not giving the expected results. Could you please suggest some…
1
vote
1 answer

Text Mining using Jaro-Winkler fuzzy matching in R

I'm attempting to do some distance matching in R and am struggling to achieve a usable output. I have a dataframe terms that contains 5 strings of text, along with a category for each string. I have a second dataframe notes that contains 10 poorly…
Cam23 19
1
vote
0 answers

String similarity where order and difference in ASCII code matter

Is anybody aware of a string similarity method that would give the correct results for the below? I'm dealing with alphanumeric IDs where: a change in the early part of the string matters more than in the latter part. I guess I could do ngrams?…
citynorman
1
vote
1 answer

How do you make a string dictionary function in Lua?

Is there a way, if a string is close to a string in a table, to replace it with the one in the table? Like a spellcheck function that searches through a table and, if the input is close to one in the table, fixes it, so the one in the table…
1
vote
1 answer

An implementation of the Jaro Winkler distance algorithm in Transact SQL

I've been wondering for months about how to implement this algorithm in Transact SQL: https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance How can it be done?
Maritim
1
vote
1 answer

How to compute Overlap Coefficient and Jaro Winkler using Simmetrics Java

I have been trying to use the Sim-metrics library from: com.github.mpkorstanje simmetrics-core 4.1.0 So far I am computing…
VSEWHGHP
1
vote
1 answer

Jaro Similarity

To find the Jaro similarity, I found the matching characters as follows. Matching characters in string 1: AABABCAAAC. Matching characters in string 2: ABAACBAAAC. What is the value of t (0.5 * transpositions)? (source: Wikipedia)
Curious
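
For the transposition question above, a small sketch may help. It assumes the two sequences quoted in the excerpt are already the matched characters of each string, listed in order of appearance, and it uses the Wikipedia convention that t is half the number of positions where those sequences disagree. The function name `transposition_count` is hypothetical.

```python
def transposition_count(matched1: str, matched2: str) -> float:
    """Return t: half the number of positions where the matched sequences differ."""
    if len(matched1) != len(matched2):
        raise ValueError("matched sequences must have the same length")
    out_of_order = sum(a != b for a, b in zip(matched1, matched2))
    return out_of_order / 2


# Sequences quoted in the question; they differ at 4 positions.
print(transposition_count("AABABCAAAC", "ABAACBAAAC"))  # 2.0
```

The sequences differ at the second, third, fifth and sixth positions, so under this convention t comes out as 4 / 2 = 2.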
1
vote
1 answer

Search Recommendation / Suggestion on large database

I have a table with millions of rows. When a user makes a spelling mistake while searching for a string or word in the table, I want to recommend the correct word or string from the table. I am using the Jaro-Winkler algorithm to compare the distance between strings, but…
JP711
1
vote
0 answers

JARO_WINKLER matching dates as strings

So, I'm used to using jaro_winkler and similar functions to match strings that our customers provide, and using those percentages to find the customers in our database, since we don't really have a key as most other places have, like SSN, SIN, CPF and…
1
vote
1 answer

Interpreting the Jaro-Winkler Score in Perl -- Are there Alternatives in Stata?

Is there an industry standard for how large the Jaro-Winkler score should be to say that the two strings are likely similar? I have a list of strings and I want to see if any of them are plausible typographical errors for the name James. I have…
paso
0
votes
0 answers

Postgres function works in pgAdmin but not via JDBC

I have installed the PostgreSQL pg_similarity extension as described here so that I can use the jarowinkler function. The function works perfectly in pgAdmin but not via Spring JDBC. When executed via JDBC this error is…
SME
0
votes
1 answer

Applying Jaro-Winkler distance to two dataframes

I have two dataframes of unequal length and would like to compare the similarity of strings in df2 with df1. Is it possible to apply the Jaro-Winkler distance method to calculate the string similarity on two dataframes through map/lambda…
rshar
0
votes
1 answer

I am working on Jaro-Winkler similarity and am able to use it between 2 columns, but how do I use it with 2 pairs of columns?

For example, I have 4 columns in my dataframe and want to use Jaro similarity for columns A, B vs. columns C, D, which contain strings. Currently I am using it between 2 columns with df.apply(lambda x: textdistance.jaro(x[A], x[C]), axis=1). Currently I was comparing…
Kevin D