
I want to find a string inside a bigger string, allowing for some Levenshtein distance. I have written code that computes the distance between two strings, but I'd like an efficient way to find a substring that is within a fixed Levenshtein distance of a given pattern.

module Levenshtein

  def self.distance(a, b)
    a, b = a.downcase, b.downcase
    costs = Array(0..b.length) # i == 0
    (1..a.length).each do |i|
      costs[0], nw = i, i - 1  # j == 0; nw is lev(i-1, j)
      (1..b.length).each do |j|
        costs[j], nw = [costs[j] + 1, costs[j-1] + 1, a[i-1] == b[j-1] ? nw : nw + 1].min, costs[j]
      end
    end
    costs[b.length]
  end

  def self.test
    %w{kitten sitting saturday sunday rosettacode raisethysword}.each_slice(2) do |a, b|
      puts "distance(#{a}, #{b}) = #{distance(a, b)}"
    end
  end

end
LearningBasics

1 Answer

Have a look at the TRE library, which does exactly this (in C), and quite efficiently. Then look carefully at its matching function, which is basically 500 lines of unreadable (but necessary) code.

Unless you intend to read the rather difficult papers on the subject (search for "approximate string matching") and have a few free months to study it, I'd say you'd be much better off writing a small wrapper around the library itself instead of rolling your own version. A Ruby implementation would be inefficient anyway compared with what can be achieved in C.
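That said, if a pure-Ruby version is good enough for small inputs, the textbook starting point from the approximate string matching literature is a small variation of the usual dynamic-programming table (often attributed to Sellers): the cost of starting a match is zero at every position of the text, so the pattern may begin anywhere, and every text position where the last cell is within the threshold is the end of an approximate match. The following is only a minimal sketch under those assumptions. The module name `ApproxSearch` and the method `match_ends` are made up for illustration, this is not how TRE works internally, and it runs in O(pattern length × text length) time.

```ruby
module ApproxSearch
  # Hypothetical helper (not part of TRE or any library).
  # Returns [end_index, distance] pairs: for each index in `text` where
  # some substring ending there is within `max_dist` edits of `pattern`.
  def self.match_ends(pattern, text, max_dist)
    pattern, text = pattern.downcase, text.downcase
    m = pattern.length
    # col[i] = best cost of matching pattern[0, i] against a substring
    # of the text ending at the current position (initially: empty text).
    col = Array(0..m)
    hits = []

    text.each_char.with_index do |ch, j|
      diag = col[0]  # value of col[i-1] from the previous text position
      col[0] = 0     # a match may start at any text position for free
      (1..m).each do |i|
        old = col[i]
        substitution = diag + (pattern[i - 1] == ch ? 0 : 1)
        col[i] = [old + 1, col[i - 1] + 1, substitution].min
        diag = old
      end
      hits << [j, col[m]] if col[m] <= max_dist
    end

    hits
  end
end
```

For example, `ApproxSearch.match_ends("kitten", "xxkittenxx", 0)` returns `[[7, 0]]`, the end index of the exact occurrence. The sketch reports end positions only; recovering where each match starts would require keeping more of the table, or running the same recurrence backwards from each end position.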

michaelmeyer