I've built a trie data structure that looks like this:

struct Trie<Element : Hashable> : Equatable {
    private var children: [Element: Trie<Element>]
    private var endHere: Bool
}

to perform autocorrection operations on input from a UITextField. I gave the trie a variety of functions such as insert:

/**
 Private insert function. Inserts elements into a trie using a sequence's generator.

 - parameter g: `GeneratorType`.
 */
private mutating func insert<G: GeneratorType where G.Element == Element>(g: G) {
    var gen = g
    if let head = gen.next() {
        if case nil = children[head]?.insert(gen) {
            children[head] = Trie(g: gen)
        }
    } else {
        endHere = true
    }
}

/**
 Insert elements into the trie.

 - parameter seq: Sequence of elements.
 */
mutating func insert<S: SequenceType where S.Generator.Element == Element>(seq: S) {
    insert(seq.generate())
}

the necessary initializers:

/**
 Create an empty trie.
 */
init() {
    children = [:]
    endHere  = false
}

/**
 Initialize a trie with a generator.

 - parameter g: `GeneratorType`.
 */
private init<G: GeneratorType where G.Element == Element>(g: G) {
    var gen = g
    if let head = gen.next() {
        (children, endHere) = ([head:Trie(g: gen)], false)
    } else {
        (children, endHere) = ([:], true)
    }
}

/**
 Construct from an arbitrary sequence of sequences with elements of type `Element`.

 - parameter s: Sequence of sequences.
 */
init<S: SequenceType, Inner: SequenceType where S.Generator.Element == Inner, Inner.Generator.Element == Element>(_ s: S) {
    self.init()
    s.forEach { insert($0) }
}

/**
 Construct a trie from a sequence of elements.

 - parameter s: Sequence.
 */
init <S: SequenceType where S.Generator.Element == Element>(_ s: S) {
    self.init(g: s.generate())
}

and conformed Trie to SequenceType so that I can iterate through the elements.

Now I want to implement a Levenshtein distance search, where the search function would look like:

func search<S: SequenceType where S.Generator.Element == Element>(s: S, maxDistance: Int = 0) -> [(S, Int)] {

}

where the return value is a list of the matched sequences found, each paired with its distance from the original query sequence (at most maxDistance). This is where my knowledge is lacking: I'm not sure how to perform the search on my trie and build up the list of matched sequences while calculating the insertion, deletion, and replacement costs.

barndog
  • Take a look here (links below are better): https://gist.github.com/bgreenlee/52d93a1d8fa1b8c1f38b – sschale Jun 02 '16 at 06:13
  • What about applying that search while recursing down branches of the trie? That's mostly what I'm stuck on. – barndog Jun 02 '16 at 07:06

1 Answer

The solution to this is nontrivial, but take a look at the paper "Fast String Correction with Levenshtein-Automata". You would treat your trie as the dictionary automaton and intersect it with a Levenshtein automaton built from the query term. The search then follows only the paths in that intersection that lead to terms whose Levenshtein distance from the query term is no greater than the specified threshold.

As a reference, liblevenshtein has an implementation in Java. For the logic pertaining to searching the trie, look in src/main/java/com/github/liblevenshtein/transducer.
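If building the automaton is more than you need, a simpler technique is to recurse down the trie while carrying one row of the usual Levenshtein dynamic-programming table per node. This is not the automaton construction from the paper, only a rough sketch of the row-per-node approach. It is written in current Swift syntax (Sequence rather than SequenceType), and `SimpleTrie`, `insertRemaining`, and `collect` are hypothetical names standing in for the question's `Trie`. It returns `[Element]` rather than `S`, since an arbitrary `S` cannot be rebuilt from its elements.

```swift
// Hypothetical minimal trie; a stand-in for the question's `Trie`, not a drop-in replacement.
struct SimpleTrie<Element: Hashable> {
    var children: [Element: SimpleTrie<Element>] = [:]
    var endHere = false

    mutating func insert<S: Sequence>(_ seq: S) where S.Element == Element {
        var iterator = seq.makeIterator()
        insertRemaining(&iterator)
    }

    private mutating func insertRemaining<I: IteratorProtocol>(_ iterator: inout I)
        where I.Element == Element {
        if let head = iterator.next() {
            // Creates the child on first use, then keeps inserting below it.
            children[head, default: SimpleTrie()].insertRemaining(&iterator)
        } else {
            endHere = true
        }
    }

    /// Returns every stored sequence within `maxDistance` edits of `query`,
    /// paired with its distance. The order of results is unspecified.
    func search<S: Sequence>(_ query: S, maxDistance: Int) -> [([Element], Int)]
        where S.Element == Element {
        let target = Array(query)
        var results: [([Element], Int)] = []
        var prefix: [Element] = []
        // Row for the empty prefix: matching the first j query elements costs j edits.
        let firstRow = Array(0...target.count)
        collect(target, firstRow, maxDistance, &prefix, &results)
        return results
    }

    private func collect(_ target: [Element], _ previousRow: [Int], _ maxDistance: Int,
                         _ prefix: inout [Element], _ results: inout [([Element], Int)]) {
        // The last cell is the distance between this node's prefix and the whole query.
        if endHere, let distance = previousRow.last, distance <= maxDistance {
            results.append((prefix, distance))
        }
        for (element, child) in children {
            // row[j] = distance between (prefix + element) and the first j query elements.
            var row = [previousRow[0] + 1]
            for j in 1..<previousRow.count {
                let skipQueryElement = row[j - 1] + 1
                let skipTrieElement = previousRow[j] + 1
                let substitute = previousRow[j - 1] + (target[j - 1] == element ? 0 : 1)
                row.append(Swift.min(skipQueryElement, skipTrieElement, substitute))
            }
            // Prune: if every cell exceeds the threshold, no descendant can match.
            if let best = row.min(), best <= maxDistance {
                prefix.append(element)
                child.collect(target, row, maxDistance, &prefix, &results)
                prefix.removeLast()
            }
        }
    }
}

// Usage: sort the results, because dictionary iteration order is not stable.
var trie = SimpleTrie<Character>()
for word in ["cat", "cart", "car", "dog"] { trie.insert(word) }
let matches = trie.search("cat", maxDistance: 1)
    .map { (String($0.0), $0.1) }
    .sorted { $0.0 < $1.0 }
print(matches)
```

With these four words and a threshold of 1, the search finds "cat" at distance 0 and "car" and "cart" at distance 1. The "dog" branch is abandoned after two elements, once every cell in its row exceeds the threshold. That pruning is what makes walking the trie cheaper than scoring each stored word separately.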

Dylon