Questions tagged [suffix-tree]

A suffix tree is a compressed trie that stores all suffixes of a string. It is the basis for many fast string algorithms, such as substring search and finding repeated substrings.
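
To make the description concrete, here is a minimal sketch of an uncompressed suffix trie, the simpler relative of a suffix tree. It is a naive O(n^2) construction built from nested dictionaries, not Ukkonen's or McCreight's linear-time algorithm, and the names `build_suffix_trie` and `contains` are hypothetical, chosen only for illustration.

```python
def build_suffix_trie(text):
    """Naive suffix trie: nested dicts, one root-to-leaf path per suffix.

    Assumption: "$" does not occur in text and serves as the end marker.
    Time and space are O(n^2); a real suffix tree compresses the paths.
    """
    text += "$"
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            # Reuse the child for ch if present, otherwise create it.
            node = node.setdefault(ch, {})
    return root


def contains(trie, pattern):
    """Return True if pattern is a substring of the indexed text."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


trie = build_suffix_trie("banana")
print(contains(trie, "nan"))  # every substring is a prefix of some suffix
```

Because every substring is a prefix of some suffix, a substring query only has to walk down from the root, which takes time proportional to the pattern length.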

228 questions
8
votes
1 answer

substring finding from a string

Input: string S = AAGATATGATAGGAT. Output: Maximal repeats such as GATA (at positions 3 and 8), GAT (at positions 3, 8 and 13) and so on... A maximal repeat is a substring t that occurs k>1 times in S, and if t is extended to the left or right, it will…
rock
  • 153
  • 8
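
A brute-force sketch of the maximal-repeat problem from the excerpt above, assuming the usual definition: t occurs at least twice, and extending t by one character on either side loses at least one occurrence. The function name `maximal_repeats` is hypothetical, and this is not the suffix-tree solution; it stores every substring and is only meant for short inputs.

```python
from collections import defaultdict


def maximal_repeats(s):
    """Map each maximal repeat of s to its 1-based start positions."""
    n = len(s)
    # Record the 0-based start positions of every substring.
    positions = defaultdict(list)
    for i in range(n):
        for j in range(i + 1, n + 1):
            positions[s[i:j]].append(i)

    result = {}
    for t, starts in positions.items():
        if len(starts) < 2:
            continue
        # Characters just left and right of each occurrence (None at the ends).
        left = {s[i - 1] if i > 0 else None for i in starts}
        right = {s[i + len(t)] if i + len(t) < n else None for i in starts}
        # If all occurrences share the same neighbour, t can be extended
        # without losing occurrences, so it is not maximal.
        if len(left) > 1 and len(right) > 1:
            result[t] = [i + 1 for i in starts]
    return result


repeats = maximal_repeats("AAGATATGATAGGAT")
print(repeats["GATA"], repeats["GAT"])
```

On the input from the question this reports GATA at positions 3 and 8 and GAT at positions 3, 8 and 13, matching the examples given there.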
8
votes
4 answers

Suffix trees in JavaScript?

Is there a nice implementation of suffix trees in JavaScript? Something that will take a string (and a separator) and make the appropriate suffix tree?
silverasm
  • 501
  • 5
  • 10
8
votes
1 answer

Matches overlapping lookahead on LZ77/LZSS with suffix trees

Background: I have an implementation of a generic LZSS backend in C++ (available here). The matching algorithm I use in this version is exceedingly simple, because it was originally meant to compress relatively small files (at most 64kB) for…
8
votes
3 answers

Short, Java implementation of a suffix tree and usage?

I'm looking for a short, simple suffix tree building/usage algorithm in Java. The best I've found so far lies within the Semantic Discovery Toolkit, but the implementation is several thousand lines long and spans several classes. Ideally, the…
Stefan Kendall
  • 66,414
  • 68
  • 253
  • 406
8
votes
9 answers

Efficient String/Pattern Matching in C++ (suffixarray, trie, suffixtree?)

I'm looking for an efficient data structure to do String/Pattern Matching on a really huge set of strings. I've found out about tries, suffix-trees and suffix-arrays. But I couldn't find a ready-to-use implementation in C/C++ so far (and…
Constantin
  • 8,721
  • 13
  • 75
  • 126
7
votes
4 answers

Optimizing construction of a trie over all substrings

I am solving a trie-related problem. There is a set of strings S. I have to create a trie over all substrings for each string in S. I am using the following routine: String strings[] = { ... }; // array containing all strings for(int i = 0; i <…
Bhoot
  • 2,614
  • 1
  • 19
  • 36
7
votes
2 answers

How to remove substring from suffix tree?

I reviewed a lot of literature, but I didn't find any information about deleting substrings from, or inserting substrings into, a suffix tree. There are only Ukkonen's and McCreight's algorithms for building the tree. The poorest way is to rebuild the tree after deleting or…
6
votes
0 answers

Stream variant of the Longest palindromic substring

Suppose I have a character stream as my input. What is the most efficient way to find the longest palindromic substring after each new character is added, without reprocessing the whole string all over again? After each new character comes in, I…
user78706
6
votes
0 answers

Haskell Data Type With References

I'm implementing Ukkonen's algorithm, which requires that all leaves of a tree contain a reference to the same integer, and I'm doing it in Haskell to learn more about the language. However, I'm having a hard time writing out a data type that does…
Craig
  • 255
  • 1
  • 6
6
votes
1 answer

Find longest common substring of multiple strings using factor oracle enhanced with LRS array

Can we use a factor oracle with suffix links (paper here) to compute the longest common substring of multiple strings? Here, substring means any contiguous part of the original string. For example, "abc" is a substring of "ffabcgg", while "abg" is not. I've…
Ray
  • 1,647
  • 13
  • 16
6
votes
1 answer

Can I generate all substrings in complexity < O(n^2)

Currently I am using two nested for loops to generate all the substrings of a string. I heard about suffix trees, but AFAIK a suffix tree generates suffixes, not substrings. Following is the code which I am currently using: String s =…
ravi
  • 6,140
  • 18
  • 77
  • 154
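
A note on the question above: a string of n distinct characters has n(n+1)/2 distinct substrings, so listing them all cannot beat quadratic output size. Counting them is a different matter. The sketch below, with the hypothetical name `count_distinct_substrings`, uses the identity that the number of distinct substrings equals n(n+1)/2 minus the sum of longest common prefixes of adjacent sorted suffixes. The suffix sorting here is deliberately naive; a real suffix array or suffix tree construction would be needed for large inputs.

```python
def count_distinct_substrings(s):
    """Count distinct non-empty substrings via sorted suffixes and LCPs."""
    n = len(s)
    # Naive suffix sorting: simple, but slow and memory-hungry for large n.
    suffixes = sorted(s[i:] for i in range(n))
    total = n * (n + 1) // 2
    for a, b in zip(suffixes, suffixes[1:]):
        # Length of the longest common prefix of two adjacent suffixes.
        lcp = 0
        while lcp < min(len(a), len(b)) and a[lcp] == b[lcp]:
            lcp += 1
        # Those lcp prefixes were already counted for the previous suffix.
        total -= lcp
    return total


print(count_distinct_substrings("banana"))
```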
6
votes
4 answers

Working with suffix trees in Python

I'm relatively new to Python and am starting to work with suffix trees. I can build them, but I'm running into a memory issue when the string gets large. I know that they can be used to work with DNA strings of size 4^10 or 4^12, but whenever I…
doggysaywhat
  • 177
  • 1
  • 2
  • 6
5
votes
1 answer

Finding all common, non-overlapping substrings

Given two strings, I would like to identify all common sub-strings from longest to shortest. I want to remove any "sub-"sub-strings. As an example, any substrings of '1234' would not be included in the match between '12345' and '51234'. string1 =…
mrmagicfluffyman
  • 365
  • 1
  • 2
  • 7
5
votes
1 answer

How is worst case time complexity of constructing suffix tree linear?

I have trouble understanding how the worst-case time complexity of constructing a suffix tree is linear, particularly when we need to build a suffix tree for a string that may be composed of a single repeated character, such as "aaaaa". Even if I…
5
votes
4 answers

How to speed up calculation of length of longest common substring?

I have two very large strings and I am trying to find their Longest Common Substring. One way is using suffix trees (supposed to have very good complexity, though a complex implementation), and the other is the dynamic programming method…
Lazer
  • 90,700
  • 113
  • 281
  • 364
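
For reference, a minimal sketch of the dynamic programming approach mentioned in the question above. It runs in O(len(a) * len(b)) time and keeps only two rows of the table; the function name `longest_common_substring_length` is hypothetical, and this is the textbook method rather than the suffix-tree one.

```python
def longest_common_substring_length(a, b):
    """Length of the longest contiguous substring shared by a and b."""
    # prev[j] = length of the longest common suffix of the previous
    # prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    best = 0
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the common suffix that ended one character earlier.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


print(longest_common_substring_length("12345", "51234"))
```

For two very large strings the quadratic running time is the bottleneck, which is why the suffix-tree approach comes up as the alternative.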