Questions tagged [jaro-winkler]

An algorithm for measuring the similarity of two strings, often used for duplicate detection.
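As a rough illustration of what the tag covers, here is a minimal pure-Python sketch of the Jaro similarity with the Winkler prefix boost. The function names and the `prefix_scale` and `max_prefix` parameters are chosen for this example only and are not taken from any particular library. Some implementations apply the prefix boost only when the Jaro score exceeds a threshold; this sketch always applies it.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only if they are within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0

    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break

    if matches == 0:
        return 0.0

    # Matched characters that appear in a different order count as
    # half a transposition each.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    return (
        matches / len1
        + matches / len2
        + (matches - transpositions) / matches
    ) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1,
                 max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix (up to max_prefix chars)."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)


if __name__ == "__main__":
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
    print(round(jaro_winkler("john", "jon"), 4))
```

The comparison is case-sensitive, so inputs such as "One" and "one" would need to be normalized (for example with `str.casefold()`) before scoring if case should be ignored.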

78 questions
0
votes
0 answers

Optimizing Jaro-Winkler algorithm in Golang

I need to run Jaro-Winkler 1500000 times to find the similarity between a given []byte and an existing []byte. I took the code from https://rosettacode.org/wiki/Jaro-Winkler_distance#Go and modified it to work directly on []byte instead of string. func…
Vishal Jangid
0
votes
0 answers

Pandas Dataframe - Ambiguous

I'm trying to use some code that runs the Jaro-Winkler function to compare the similarity of two strings. If I just hard-code two values, john and jon, I get no problems using the logic below. However, what I want is to use a CSV file and compare…
0
votes
0 answers

Recommended String Metric Algorithm for string detection?

We're trying to choose a string metric algorithm for our string comparison program. Which would be the best string metric algorithm if we want to detect misspellings and alteration of the word like changing letters to words or symbols, adding extra…
0
votes
1 answer

EF Core with Jaro-Winkler similarity distance algorithm

I want to search for strings in my db with some similarity distance algorithm like Jaro-Winkler. However EF Core cannot translate such expressions. So you cannot use an expression like below: query.Where(x => JaroWinkler.Similarity(x.Title,…
0
votes
1 answer

How to compare similarity between two strings (other than English language) in Python

I want to find the similarity between two strings. Example: string1 = "One", string2 = "one". I expect the answer to be between 0 and 1. For the above two strings, we get 1. Right now I'm using "Jellyfish", a module in Python which has the…
0
votes
1 answer

Oracle fuzzy searching with UTL functions

I need to implement fuzzy search on database layer, but I am having some minor issues. Here is my SQL code for demonstration : SELECT * FROM (SELECT * FROM TOOLS WHERE UTL_MATCH.jaro_winkler_similarity(UPPER('sample tool'), UPPER(NAME)) >…
0
votes
1 answer

JaroWinkler Method --> Identifying Character/Numeric spots in a string

I am working on a problem to identify if a specified string has the correct format. I am attempting to use a fuzzy matching technique, JaroWinkler, to find the similarity score between a reference string and the strings of interest. The correct…
user2813606
0
votes
1 answer

Checking and Removing NoneTypes for Jaro String Similarity

I'm trying to discern the string similarity between two strings (using Jaro). Each string resides in a separate column in my dataframe. String 1 = df['name_one'] String 2 = df['name_two'] When I try to run my string similarity logic: from…
mikelowry
0
votes
1 answer

string matching - best distance algorithm to use

I have two dataframes, df1 and df2, that have information about polling stations. The dataframes are of different lengths. Both dataframes have a column called ps_name, which is the name of the polling stations, and a column called district that…
dmswjd
0
votes
1 answer

'jarowinkler is not defined' after I run the code correctly the day before

I am getting the following error message: NameError: name 'jarowinkler' is not defined This error comes from from similarity.jarowinkler import JaroWinkler for word in words: df[word] = df.Texts.apply(lambda x: jarowinkler.similarity(x,…
user12907213
0
votes
1 answer

RecordLinkage Package jarowinkler no output

I'm working with package version 0.4-12 and R version 4.0.0. My data linkage code that I have used in the past is no longer running the same as it did when I had R version…
Chris B
0
votes
1 answer

How to work with gigantic list with jaro-winkler similarity check

I have a list with more than 10 million strings that I need to iterate and get scored percentage points while using a similarity function. I do this by getting an item from another list that will be used to check similarity from the giga list as…
lobjc
0
votes
1 answer

Apply Python function to each row and append

I have the following data : I am trying to use the library - pyjarowinkler and find the distance between strings - my hello world code works #Hello World d1=distance.get_jaro_distance("Hello","hello", winkler=True, scaling=0.1); d1 When I try to…
sai
0
votes
0 answers

Distance/Fuzzy matching 2 columns with another 2 columns in R

In my simplified example I have a dataframe with four different columns. I want to be able to match main_name and main_dob together with secondary_name and secondary_dob. The actual order of the rows doesn't matter, so if there is a match in row 3…
Joey
0
votes
0 answers

Which string distance algorithm to detect tiny changes?

I'm working on a phishing email filter project and as the first step to guess if an email is phishing or not, without using external APIs, I want to compare the visible text and the underlying URL of the links. e.g.:
1Z10