How are BSTs deployed as dictionaries for string search?

Question

link:https://www8.cs.umu.se/kurser/TDBAfl/VT06/algorithms/BOOK/BOOK/NODE39.HTM

In the following problem, the author discusses using a BST as a dictionary for length-k substrings. Can anyone explain how he does so in the context of the problem?

score 0 · Accepted Answer · answered Jun 19 '19 at 15:53

Having read this over, it looks like they were simply dumping all the length-k substrings into a standard BST. The items are ordered lexicographically: compare the two strings one character at a time until a mismatch is found, then decide the outcome of the comparison based on which string's character compares lower at that position.
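To make that concrete, here is a minimal sketch of such a dictionary in Python. It is my own illustration, not code from the book: the names `Node`, `bst_insert`, `bst_contains`, and `build_dictionary` are made up, the tree is a plain unbalanced BST (so the O(log n) depth only holds if the tree happens to stay balanced), and it relies on Python's built-in `<` on strings, which is already lexicographic.

```python
class Node:
    """One BST node holding a length-k substring as its key."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def bst_insert(root, key):
    """Insert key into the BST rooted at root and return the root.

    Duplicates are ignored. String comparison with < is lexicographic,
    so each comparison costs O(k) for length-k keys.
    """
    if root is None:
        return Node(key)
    node = root
    while True:
        if key == node.key:
            return root  # already present
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right


def bst_contains(root, key):
    """Return True if key is stored in the BST rooted at root."""
    node = root
    while node is not None:
        if key == node.key:
            return True
        node = node.left if key < node.key else node.right
    return False


def build_dictionary(substrings):
    """Build a BST dictionary from an iterable of length-k substrings."""
    root = None
    for s in substrings:
        root = bst_insert(root, s)
    return root
```

For example, `build_dictionary(["ACG", "CGT", "GTA"])` gives a tree in which `bst_contains(root, "CGT")` is true and `bst_contains(root, "TTT")` is false.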

To check whether a new length-2k string was valid, they ran a sliding window of length k over it, checking whether each length-k substring was in the BST. If any was missing, they could reject the length-2k string and move on to the next candidate. This takes time O(k² log n) per candidate, where n is the total number of length-k substrings: O(k) substrings are looked up, and each BST lookup takes time O(k log n), since it requires O(log n) comparisons at a cost of O(k) each.
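The sliding-window check can be sketched roughly as follows. This is only an illustration under my own assumptions: `make_lookup` and `is_consistent` are hypothetical names, and a sorted list searched with `bisect_left` stands in for the BST, since it gives the same O(log n) comparisons of O(k) each per lookup.

```python
from bisect import bisect_left


def make_lookup(substrings):
    """Return a membership test over the given length-k substrings.

    A sorted list plus binary search stands in for a balanced BST here:
    each lookup performs O(log n) lexicographic string comparisons.
    """
    keys = sorted(set(substrings))

    def contains(s):
        i = bisect_left(keys, s)
        return i < len(keys) and keys[i] == s

    return contains


def is_consistent(candidate, k, contains):
    """Slide a window of length k over candidate and check every window.

    For a length-2k candidate there are k + 1 windows, so the total cost
    is O(k) lookups at O(k log n) each. all() stops at the first window
    that is missing, which is the early rejection described above.
    """
    return all(
        contains(candidate[i:i + k])
        for i in range(len(candidate) - k + 1)
    )
```

With the dictionary `["ACG", "CGT", "GTA", "TAC"]` and k = 3, the candidate `"ACGTAC"` passes because its windows ACG, CGT, GTA, and TAC are all present, while `"ACGACG"` is rejected at its second window, CGA.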

The faster solution they described at the end used suffix trees augmented with suffix links to speed the searches up by using the fact that each search was formed by dropping the first character of the last search and appending some new character.

Makes sense now. But wouldn't this mean that they'd have to generate all possible substrings (concatenations) initially, at O(n^2) cost? — rajat008, Jun 19 '19 at 17:20
Yep - based on what the article says I think that’s exactly what they did. — templatetypedef, Jun 19 '19 at 17:32
