Questions tagged [suffix-tree]

A suffix tree is a data structure that stores all suffixes of a string. It is the basis for many fast algorithms on strings.
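
To make the description concrete, here is a minimal sketch of the underlying idea, assuming a naive suffix trie (one node per character) rather than a true compressed suffix tree. The class name `SuffixTrie` and its method `contains` are hypothetical names chosen for illustration. Construction here takes quadratic time and space; linear-time constructions such as Ukkonen's algorithm are considerably more involved.

```python
class SuffixTrie:
    """Naive suffix trie: one node per character, O(n^2) time and space."""

    def __init__(self, text):
        self.root = {}
        # Insert every suffix text[i:] character by character.
        for i in range(len(text)):
            node = self.root
            for ch in text[i:]:
                node = node.setdefault(ch, {})

    def contains(self, pattern):
        """Return True if pattern occurs anywhere in the text."""
        node = self.root
        for ch in pattern:
            if ch not in node:
                return False
            node = node[ch]
        return True


trie = SuffixTrie("banana")
print(trie.contains("nan"))  # True
print(trie.contains("nab"))  # False
```

Every substring of the text is a prefix of some suffix, which is why a single walk from the root answers a substring query in time proportional to the pattern length.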

228 questions
1
vote
1 answer

Generating suffixes from a Suffix Tree

I've built a suffix tree in Java based on the site here http://marknelson.us/1996/08/01/suffix-trees/ but I've run into a problem. I can build a suffix tree fine, but I am now trying to build a set of all suffixes from the tree. I basically find all the…
Justin
0
votes
1 answer

Concurrent insertions in a Suffix Tree

Some time ago I posted a question about saving/retrieving a Suffix Tree from disk. That's finally working fine, but now the construction is extremely slow, and I don't want to mess with Ukkonen's algorithm (linear construction) right now. So, I…
juliomalegria
0
votes
1 answer

What should I read to understand suffix trees?

I've come to understand that suffix trees are excellent and useful structures for a multitude of string-related tasks, and I would like to learn more about them. Can anyone suggest a good starting point for UNDERSTANDING these things? That is, I…
Svein Bringsli
0
votes
1 answer

Practical implementation of suffix array

Looking for a practical implementation of suffix arrays, I came across this paper. It outlines an O(n (log n * log n)) approach, where n is the length of the string. While there are faster algorithms available, IMO, none is suitable in a programming…
Abhijit Sarkar
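
One well-known way to reach an O(n log n * log n) bound is prefix doubling with a comparison sort in each round. Whether this matches the paper mentioned in the question is an assumption; the function name `suffix_array` is hypothetical and the sketch is only meant to show the general technique.

```python
def suffix_array(s):
    """Build the suffix array of s by prefix doubling with comparison sorts."""
    n = len(s)
    sa = list(range(n))
    rank = [ord(c) for c in s]  # initial rank: the first character
    k = 1
    while n > 1:
        # Sort key: rank of the first k characters, then rank of the
        # following k characters (-1 if that part runs past the end).
        def key(i):
            return (rank[i], rank[i + k] if i + k < n else -1)

        sa.sort(key=key)
        # Re-rank: suffixes with equal keys share a rank.
        new_rank = [0] * n
        for j in range(1, n):
            new_rank[sa[j]] = new_rank[sa[j - 1]] + (key(sa[j]) != key(sa[j - 1]))
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: order is final
            break
        k *= 2
    return sa


print(suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```

Each round doubles the compared prefix length, so there are at most about log n rounds, and each round is dominated by one O(n log n) sort.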
0
votes
0 answers

Finding FIRST occurrence of a substring using suffix trees

A suffix tree is an efficient data structure containing all suffixes of a given string. Suffix trees support operations such as checking if a given substring exists in the string and returning all occurrences of a substring. I was wondering if it is…
Shaharg
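
One possible approach, sketched here under simplifying assumptions, is to store in each node the smallest start index of any suffix that passes through it. The sketch uses a naive suffix trie (quadratic size) instead of a real suffix tree, and the names `Node`, `build`, and `first_occurrence` are hypothetical. In a compressed suffix tree the same idea would amount to annotating each internal node with the minimum leaf index in its subtree.

```python
class Node:
    __slots__ = ("children", "first")

    def __init__(self, first):
        self.children = {}
        # Smallest start index of any suffix passing through this node.
        self.first = first


def build(text):
    """Build a naive suffix trie annotated with first-occurrence indices."""
    root = Node(0)
    # Suffixes are inserted in increasing start order, so the suffix that
    # creates a node is the one with the smallest start index through it.
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            child = node.children.get(ch)
            if child is None:
                child = Node(i)
                node.children[ch] = child
            node = child
    return root


def first_occurrence(root, pattern):
    """Return the index of the first occurrence of pattern, or -1."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return -1
    return node.first


root = build("banana")
print(first_occurrence(root, "ana"))  # 1
print(first_occurrence(root, "xyz"))  # -1
```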
0
votes
0 answers

How to solve longest already present substring in O(n)?

Given a string a, I need to find, for every position i in a, the length of the longest substring b such that it starts at position i and was already present in a, which means that there exists i'
quicker
0
votes
0 answers

Data structure for determining all strings that contain a given substring

Let's suppose I have a dynamic list of strings, and I have a substring s. What data structure would be best for determining all possible strings in my list that contain the substring s? I was thinking of using a suffix tree/array but those don't…
0
votes
0 answers

Serialize SuffixTree python

I am using the suffix-tree library: https://pypi.org/project/suffix-tree/

    tree = Tree()
    for item_id, item in tqdm.tqdm(enumerate(items)):
        tree.add(item_id, item.lower())

I want to save a tree into a file

    pickle.dump(tree, open('test.pkl',…
Not Found
0
votes
1 answer

Debugging a pattern-matching algorithm

The user provides a text file to be searched and a pattern to search for. The program builds a suffix tree and uses it to find all occurrences of the pattern in the text, then prints their indexes.

    class Node:
        def __init__(self, start, substr):
            …
Tsidia
0
votes
0 answers

Suffix Tree - All common substrings

The problem is as follows: given two strings X and Y, I want to find all (longest) common substrings, that is, all substrings that appear in both X and Y and are maximal. For instance, if X = gttcatwg and Y = twgacgtt, return gtt and twg, not…
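
As a baseline for checking a suffix-tree solution, the example above can be reproduced with a simple dynamic-programming sketch. This is not a suffix-tree method: it runs in O(|X| * |Y|) time, and it assumes the question is asking for every common substring of the greatest length. The function name `longest_common_substrings` is hypothetical.

```python
def longest_common_substrings(x, y):
    """Return the set of all longest substrings shared by x and y."""
    best_len = 0
    best = set()
    # prev[j] holds the length of the longest common suffix of
    # x[:i-1] and y[:j] from the previous row.
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        curr = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    # Strictly longer match: restart the result set.
                    best_len = curr[j]
                    best = {x[i - curr[j]:i]}
                elif curr[j] == best_len:
                    best.add(x[i - curr[j]:i])
        prev = curr
    return best


print(sorted(longest_common_substrings("gttcatwg", "twgacgtt")))  # ['gtt', 'twg']
```

A generalized suffix tree over both strings can find the same set in linear time by looking at the deepest internal nodes whose subtrees contain leaves from both X and Y.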
0
votes
0 answers

How to Create Suffix Tree From String?

I want to create a suffix tree from a given string. This is what I have come up with so far. Although some of the nodes are correctly added, some are missing. I suspect that my add_node_list is not working correctly but I can't find the reason…
Steven
0
votes
1 answer

Heftiest repeated substring

I am looking for naming/literature/implementations for a variation on the longest repeated substring problem. In the cited problem you find the longest (consecutive) substring with at least 2 (non-overlapping) repetitions: max len(s) | rep(s) > 1 In…
o17t H1H' S'k
0
votes
2 answers

Algorithm to find all duplicate sequences of tokens in a long string

Let's say I have a really long string consisting of 10^6 tokens (for simplicity, a token is a space-separated word, so this string is split into a list of tokens). Now I need to find all possible duplicated sequences and the start of the duplication…
Izik
0
votes
2 answers

Java Suffix Trie exceeding heap space

I am implementing a suffix trie (this is different from a suffix tree) that stores the character suffixes of strings as nodes in a tree structure, where a string is made up by traversing the tree until you hit a '$' or you hit the end of…
Jonno_FTW
0
votes
1 answer

Linear-time algorithm for finding the most frequent m-letter substring in a string

Suppose we have an n-letter string and we are searching for the most repeated m-letter substring (1=
sonia