Suffix-tree detection of repeated strings in sequences

Asked May 07 '21 at 09:10

Active May 12 '21 at 12:01

Viewed 118 times

I have a collection of k (k > 2) sequences. I need an algorithm that detects a repeat (i.e., the same string appearing twice) in every sequence, where the repeated string is the same in all sequences. I am also wondering whether there are any drawbacks if we limit the gap between the two occurrences of the string in each sequence. I am looking only for suffix-tree solutions.

Any tip will be much appreciated.

EDIT:

Example:

S1 = AATTAATTCGCG
S2 = GGAATTAATTCC
S3 = GAAATTAATTGA
Result = AATT
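
The gap limitation asked about above needs occurrence positions, not just occurrence counts: in a suffix-tree approach, each node would have to carry (or be able to enumerate) the start positions of its leaves per sequence, and a string that repeats in every sequence can still fail a gap test. Below is a brute-force Python checker, not suffix-tree based, that could be used to validate candidates on small inputs. It assumes "gap" means the number of characters between the end of one occurrence and the start of the next, and it rejects overlapping occurrences; both choices are hypothetical, as are the names `occurrences` and `has_repeat_within_gap`.

```python
from typing import List


def occurrences(s: str, pattern: str) -> List[int]:
    """Start positions of every occurrence of pattern in s, overlapping ones included."""
    return [i for i in range(len(s) - len(pattern) + 1) if s.startswith(pattern, i)]


def has_repeat_within_gap(s: str, pattern: str, max_gap: int) -> bool:
    """True if pattern occurs twice in s without overlapping and with at most
    max_gap characters between the end of the first occurrence and the start
    of the second (hypothetical definition of 'gap')."""
    pos = occurrences(s, pattern)
    for a in range(len(pos)):
        for b in range(a + 1, len(pos)):
            gap = pos[b] - (pos[a] + len(pattern))
            if 0 <= gap <= max_gap:  # negative gap would mean overlap
                return True
    return False


if __name__ == "__main__":
    seqs = ["AATTAATTCGCG", "GGAATTAATTCC", "GAAATTAATTGA"]
    # AATT occurs back to back (gap 0) in all three example sequences.
    print(all(has_repeat_within_gap(s, "AATT", 0) for s in seqs))  # True
```

With a gap limit of 0, `"AATTCAATT"` fails for `AATT` (its two occurrences are separated by one character), while a limit of 1 accepts it.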

edited May 07 '21 at 13:07

George Verouchis

Why 'Only suffix-tree solutions'? Is this homework? (I would try to use a HashSet) – MrSmith42 May 07 '21 at 09:22
@MrSmith42 It is an exercise from Dan Gusfield's bioinformatics book. I am working on a bioinformatics project and I need the practice. Unfortunately, Gusfield's book does not come with a solutions manual. – George Verouchis May 07 '21 at 09:24

I don't see `AATT` twice in `S3`? I'm not sure I understand the question, but probably you can solve it by constructing the suffix tree for `S1#S2$S3` and then computing a synthesized attribute bottom-up. – David Eisenstat May 07 '21 at 13:04
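
A minimal Python sketch of the bottom-up idea from the comment above, under several assumptions. The tree is built naively in O(n²) time by inserting every suffix of every sequence; a linear-time construction such as Ukkonen's algorithm would replace this in practice. Instead of concatenating `S1#S2$S3`, each sequence gets its own terminator (a private-use Unicode character, assumed not to occur in the input), and occurrences are allowed to overlap. The names `Node`, `insert`, `build_generalized_suffix_tree` and `longest_common_repeat` are hypothetical. The synthesized attribute is, for each node, the number of leaves from each sequence in its subtree, which equals the number of occurrences of the node's path label in that sequence. The answer is the deepest internal node whose count is at least 2 for every sequence: if a qualifying string ends inside an edge, extending it to the node below keeps the same leaves, so the longest one always ends at a node.

```python
from typing import Dict, List, Optional, Tuple


class Node:
    """Node of a compacted generalized suffix tree."""

    def __init__(self) -> None:
        # First character of an edge label -> (edge label, child node).
        self.children: Dict[str, Tuple[str, "Node"]] = {}
        # Sequence index of a leaf's suffix; None for internal nodes.
        self.seq: Optional[int] = None


def insert(root: Node, suffix: str, seq: int) -> None:
    """Insert one terminated suffix, splitting an edge where it diverges."""
    node = root
    while True:
        entry = node.children.get(suffix[0])
        if entry is None:
            leaf = Node()
            leaf.seq = seq
            node.children[suffix[0]] = (suffix, leaf)
            return
        label, child = entry
        k = 0  # length of the common prefix of the edge label and the suffix
        while k < len(label) and k < len(suffix) and label[k] == suffix[k]:
            k += 1
        if k == len(label):
            # Whole edge matched: continue below it with the rest of the suffix.
            node, suffix = child, suffix[k:]
            continue
        # Divergence inside the edge: split it at position k.
        mid = Node()
        mid.children[label[k]] = (label[k:], child)
        leaf = Node()
        leaf.seq = seq
        mid.children[suffix[k]] = (suffix[k:], leaf)
        node.children[suffix[0]] = (label[:k], mid)
        return


def build_generalized_suffix_tree(seqs: List[str]) -> Node:
    """Naive O(n^2) construction: insert every suffix of every sequence.

    Sequence j gets its own terminator chr(0xE000 + j), assumed absent from
    the input, so no suffix is a prefix of another and each ends at a leaf.
    """
    root = Node()
    for j, s in enumerate(seqs):
        t = s + chr(0xE000 + j)
        for i in range(len(t)):
            insert(root, t[i:], j)
    return root


def longest_common_repeat(seqs: List[str]) -> str:
    """Longest string occurring at least twice (overlaps allowed) in every sequence."""
    k = len(seqs)
    root = build_generalized_suffix_tree(seqs)
    best = ""

    def visit(node: Node, path: str) -> List[int]:
        # Synthesized attribute: per-sequence leaf counts below this node,
        # i.e. how often the node's path label occurs in each sequence.
        nonlocal best
        counts = [0] * k
        if node.seq is not None:
            counts[node.seq] = 1
            return counts
        for label, child in node.children.values():
            for j, c in enumerate(visit(child, path + label)):
                counts[j] += c
        if node is not root and min(counts) >= 2:
            # Prefer longer labels; break ties lexicographically.
            if len(path) > len(best) or (len(path) == len(best) and path < best):
                best = path
        return counts

    visit(root, "")
    return best


if __name__ == "__main__":
    print(longest_common_repeat(["AATTAATTCGCG", "GGAATTAATTCC", "GAAATTAATTGA"]))  # AATT
```

The function returns an empty string when no string repeats in every sequence. For long sequences, the recursive traversal would need to be made iterative, since Python's default recursion limit is 1000.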

0 Answers