Questions tagged [levenshtein-distance]

A metric for measuring the amount of difference between two sequences. The Levenshtein distance allows deletion, insertion and substitution.

In information theory and computer science, the Levenshtein distance is a metric for measuring the amount of difference between two sequences. The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other. It is named after Vladimir Levenshtein, who considered this distance in 1965.

Levenshtein distance is a specific algorithm of edit distance algorithms.

References:
Wikipedia
RosettaCode
Edit Distance (Wikipedia)
Hirschberg's algorithm (Wikipedia)

967 questions
0
votes
1 answer

MySQL query issue with combining two tables

I have two tables: `search_chat` ( `pubchatid` varchar(255) NOT NULL, `profile` varchar(255) DEFAULT NULL, `prefs` varchar(255) DEFAULT NULL, `init` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, `session`…
Matt
  • 1,081
  • 15
  • 27
0
votes
1 answer

Calculation of Levenshtein distance

I'm not sure whether this question is repeated or not.But,I know like to know more about the optimized Levenshtein Distance Algorithm Implementation in R or Java or Python.I have a Text File which contains numerous strings line by line(close to 2000…
maddy
  • 113
  • 2
  • 8
0
votes
0 answers

Edit distance or metaphone?

I am working on online review data, that is full of "Internet lingo". I want to do some lexicon analysis on the words. Long story short I want a spell checker that can take into account the language used in internet. After some research I, I found…
0
votes
1 answer

Mysql Optimized Levenshtein

I'm trying to create an optimized levenshtein function in mysql. I can't find my error, my console returns me this: "#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax…
0
votes
1 answer

Levenshtein cost settings

I've been asked to guess the user intention when part of expected data is missing. For example if I'm looking to get very well or not very well but I get only not instead, then I should flag it as not very well. The Levenshtein distance for not and…
Mahdi
  • 9,247
  • 9
  • 53
  • 74
0
votes
2 answers

fuzzy matching word on OCR page

I have a static phrase the I am searching an OCR'd image for. string KeywordToFind = "Account Number" string OcrPageText = " GEORGIA POWER A SOUTHERN COMPANY AecountNumber 122- 493 Pagel of2 Please Pay By Jan 29,2014 Total Due 39.11 " How…
Milne
  • 850
  • 1
  • 11
  • 28
0
votes
2 answers

Find near-duplicates of comma-separated lists using Levenshtein distance

This question based on the answer of my question yesterday. To solve my problem, Jean-François Corbett suggested a Levenshtein distance approach. Then I found this code somewhere to get Levenshtein distance percentage. Public Function…
Alfa Bachtiar
  • 73
  • 1
  • 8
0
votes
1 answer

Difference between 2 strings

I want to compare some strings like this Previous -> Present Something like path 1 : 100 -> 112 --> 333 --> 500 path 2 : 100 -> 333 --> 500 path 3 : 100 -> 333 --> 500 --> 500 path 4 : 100 -> 112 --> 500 I need to compare path 1 with path 2, get…
user3066913
0
votes
1 answer

Returning multiple matches with levenshtein

How can I return multiple similar matches with levenshtein in PHP? For example if I have this code: $input = "Hello what is your name. My namer is Jack."; $output = "nam"; echo levenshtein($input, $output); It must not only output name, but name,…
user3130221
0
votes
0 answers

string-comparison algorithm for user defined dictionary

In my project the user input is translated to a string in the form of [x,x,x,x,x,x,x,x,x,x] where x is a number between 1 and 8 and stored to a library of this type of strings. Later I have to compare the user new input with every string in that…
0
votes
1 answer

How to use libraries like pylevenshtein after download?

How should I go about using a python library that can be downloaded (eg. in the link below)? Haven't used any specific direct written libraries in python before. http://code.google.com/p/pylevenshtein/
Anuj
  • 6,987
  • 3
  • 22
  • 25
0
votes
2 answers

How do I count modified lines of code?

I have a program which counts lines of code (excluding comments, braces, whitespace, etc.) of two programs then compares them. It puts all the lines from one program in one List and the lines from the other program in another List. It then removes…
JDCAce
  • 159
  • 12
0
votes
2 answers

Conditional for loop in R not recognizing conditional statement?

Assume that I have the following similar data structure, where doc_id is the document identifier, text_id is the unique text/version identifier and text is a character string: df <- cbind(doc_id=as.numeric(c(1, 1, 2, 2, 3, 4, 4, 4, 5, 6)), …
0
votes
2 answers

Damerau-Levenshtein distance code throwing errors?

For some reason, when I try and implement the following code (I'm using Sublime Text 2) it gives me the error "Invalid Syntax" on line 18. I'm not sure why this is, I found the code here and it apparently should work, so I have no idea why it…
0
votes
1 answer

Lucene fuzzy phrase search approach with scoring

My requirement is to generate match score on fuzzy phrase search. Example 1) Input Data - Hello Sam, how are you doing? Thanks, Smith. Indexed Document - Sam Smith (documents are always person/organization names and input data would be free-text…
Rushik
  • 1,121
  • 1
  • 11
  • 34