Questions tagged [suffix-tree]

A suffix tree is a data structure that stores all suffixes of a string. It is the basis for many fast algorithms on strings.
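
As a rough illustration of the idea, the Python sketch below stores every suffix of a string in an uncompressed trie and uses it for substring lookup. This is a suffix trie rather than a true compressed suffix tree, it takes quadratic time and space to build, and the names `build_suffix_trie` and `contains` are hypothetical.

```python
def build_suffix_trie(text):
    """Insert every suffix of `text` into a nested-dict trie (O(n^2) sketch)."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            # Follow the existing child for `ch`, or create it if missing.
            node = node.setdefault(ch, {})
    return root


def contains(trie, pattern):
    """Return True if `pattern` is a substring of the indexed text."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


trie = build_suffix_trie("banana")
print(contains(trie, "nan"))  # True
print(contains(trie, "nab"))  # False
```

Every substring of the text is a prefix of some suffix, which is why walking the pattern from the root is enough to answer the lookup.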

228 questions
1
vote
3 answers

Circular dependent structs in C/C++

Here's a simple question: I'm implementing a suffix array, but I'm stuck here: #define SIZE 150 struct node{ transition *next[SIZE]; //error here }; struct transition{ int left, right; node *suffix_link; }; This code won't compile,…
maurizzzio
1
vote
0 answers

finding all indexes of a keyword in a suffix tree

This is a visual graph of a suffix tree for the input text "mississippi". In this example, the keyword I'm searching for is "si". I think I understand how to get the first index of "si": start at root node #1; the first edge is "s", so we travel…
user85116
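
One possible way to picture the "all indexes" part of this question is the sketch below. It assumes an uncompressed suffix trie in which each node records the start index of every suffix that passes through it; the names `build_indexed_trie` and `find_all` are hypothetical. In a real compressed suffix tree, the same answer would come from collecting the leaf indices below the node where the keyword ends.

```python
def build_indexed_trie(text):
    """Suffix trie whose nodes list the start index of each suffix passing through."""
    root = {"children": {}, "starts": []}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node["children"].setdefault(ch, {"children": {}, "starts": []})
            node["starts"].append(start)
    return root


def find_all(trie, keyword):
    """Return every start index of `keyword` in the indexed text, in ascending order."""
    node = trie
    for ch in keyword:
        node = node["children"].get(ch)
        if node is None:
            return []
    return list(node["starts"])


trie = build_indexed_trie("mississippi")
print(find_all(trie, "si"))  # [3, 6]
```

Indexes here are 0-based, so "si" is reported at positions 3 and 6 of "mississippi".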
1
vote
2 answers

Implementations for Pattern/String mining using Suffix Arrays/Trees

I am trying to solve a pattern mining problem for strings, and I think that suffix trees or suffix arrays might be a good fit. I will quickly outline the problem: I have a set of strings of different lengths (quotation marks are just to mark…
Pearson
1
vote
1 answer

Find the longest common substring from two sentences

My question is how to find the longest common substring from two sentences. For example: sentence 1 = "there were a dozen eggs in the basket" sentence 2 = "mike ate a dozen eggs for breakfast" The longest common substring from sentence 1 and…
mk6man
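
For reference, here is a minimal sketch of the longest-common-substring computation for the two sentences in this question. It uses plain dynamic programming in O(len(a) * len(b)) time rather than a suffix tree, so it is a swapped-in technique and not the generalized suffix tree approach the tag suggests; the function name is hypothetical.

```python
def longest_common_substring(a, b):
    """Return one longest substring shared by `a` and `b` (DP sketch)."""
    best_len, best_end = 0, 0
    # prev[j] = length of the common suffix of a[:i-1] and b[:j]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


s1 = "there were a dozen eggs in the basket"
s2 = "mike ate a dozen eggs for breakfast"
print(repr(longest_common_substring(s1, s2)))
```

Note that the result is character-based, so it can include surrounding spaces or a stray letter from a neighboring word; a word-level variant would need to compare token lists instead.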
1
vote
1 answer

Algorithm to find the words that contain a sequence of words

I have a text file containing more than 100k words, each separated by a newline in the file. I want to implement a function that would return the list of words containing a given substring. For example: If the substring is "coat", then it would…
1
vote
1 answer

Find longest substring in two words with suffix tree

I need to solve a problem: find the longest substring in two words with a suffix tree. I built a suffix tree for the first and second word, but how can I find the longest substring in the two words? Could you suggest a possible algorithm for solving this problem?
QuickDzen
1
vote
1 answer

Why is there no suffix link between these two nodes in this string's suffix tree?

I am learning Ukkonen's algorithm for generating a suffix tree from a given string. I tried the string "dedododeodo" on the visualization website http://brenden.github.io/ukkonen-animation/, and one thing I do not fully understand is: why is not…
Dachuan Huang
1
vote
1 answer

Suffix tree: check whether pattern P occurs before position k

I need to design an algorithm that, given a string T of length n and after O(n) preprocessing, checks in O(m) time, for every string P of length m and every value k between 1 and n, whether P appears in T before position k, using only a suffix tree. Unfortunately…
1
vote
0 answers

Implement Suffix Tree

Good afternoon. I am trying to rewrite some code from C++ to Python, but I get a KeyError: 0 in the last lines: for c in range(256): link[0][c] = 1; I've looked at a sample for adding a value to a SortedDict: sd['c'] = 3, however I can't figure…
mraklbrw
1
vote
1 answer

How to construct Suffix tree from LCP array and Suffix array

The title pretty much. I created a suffix array in O(n) time using the DC3 algorithm. I then created an LCP array using Kasai's algorithm in O(n) time. Now I need to create a suffix tree from the two arrays I have. How does one do that? I looked at…
1
vote
1 answer

Can someone explain when and how to extend a suffix tree?

I'm working on a PHP script which has to find the longest repeated substring. I found this suffix tree thing. I'm trying to implement Ukkonen's algorithm, but I don't understand when and how to extend the tree. It's okay if I have a new character which is…
Damien
1
vote
1 answer

Construct a suffix tree of a concatenation of a million words and query it with a test set to find the closest match and classify

The problem I'm trying to solve: I have a million words (in multiple languages) and the classes they are classified into as my training corpus. Given a test corpus of words (which is bound to grow over time), I want to get the…
1
vote
0 answers

How to get both key and values in suffixtree.substringdict

I am using suffixtree to retrieve matching substrings. The readme file contains an example: >>> import SuffixTree.SubstringDict >>> d = SubstringDict.SubstringDict() >>> d['foobar'] = 1 >>> d['barfoo'] = 2 >>> d['forget'] = 3 >>> d['oo'] [1,…
Ujjal Kumar Das
1
vote
3 answers

Suffix Tree: Longest repeating substring implementation

I have implemented a suffix tree, which is not compressed. I wanted to know how to solve the problem of finding the longest repeating substring in a string. I know that we have to find the deepest internal node with two children, but how can be…
TimeToCodeTheRoad
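
The "deepest internal node with two children" idea from this question can be sketched as follows, assuming an uncompressed suffix trie built over the text plus a terminator character that does not occur in it. The function name and the `terminator` parameter are hypothetical, and the quadratic-size trie is only meant for small inputs.

```python
def longest_repeated_substring(text, terminator="$"):
    """Return a longest substring occurring at least twice (overlaps allowed)."""
    if terminator in text:
        raise ValueError("terminator must not occur in the text")
    s = text + terminator

    # Build the uncompressed suffix trie as nested dicts.
    root = {}
    for start in range(len(s)):
        node = root
        for ch in s[start:]:
            node = node.setdefault(ch, {})

    # Iterative DFS: a node with two or more children spells a substring
    # that is followed by at least two different characters, so it repeats.
    best = ""
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        if len(node) >= 2 and len(prefix) > len(best):
            best = prefix
        for ch, child in node.items():
            stack.append((child, prefix + ch))
    return best


print(longest_repeated_substring("banana"))  # ana
```

The terminator matters: without it, a repeat that ends at the end of the text (such as the second "ana" in "banana") would not create a branching node.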
1
vote
2 answers

Find all repeating non-overlapping substrings and cycles

I have a complex problem of string manipulation at hand. I have a string in which I will have cycles, as well as recurrences which I need to identify and list down. 'abcabcabcabcabcdkkabclilabcoabcdieabcdowabcdppabzabx' Following are the possible…
Nishutosh Sharma