
I have a collection of k (k > 2) sequences. I need an algorithm that detects a repeat (i.e. the same string appearing twice) in every sequence, where the repeated string is the same in all sequences. I am also wondering whether there are any shortcomings if we limit the distance between the two appearances of the string in each sequence. I am only interested in suffix-tree solutions.

Any tip will be much appreciated.

EDIT:

Example:

S1 = AATTAATTCGCG
S2 = GGAATTAATTCC
S3 = GAAATTAATTGA
Result = AATT
  • Why 'Only suffix-tree solutions'? Is this homework? (I would try to use a HashSet) – MrSmith42 May 07 '21 at 09:22
  • @MrSmith42 It is an exercise from Dan Gusfield's bioinformatics book. I work on a bioinformatics project and I need the practice. Unfortunately, Gusfield's book does not come with a solution manual. – George Verouchis May 07 '21 at 09:24
  • I don't see `AATT` twice in `S3`? I'm not sure I understand the question, but probably you can solve it by constructing the suffix tree for `S1#S2$S3` and then computing a synthesized attribute bottom-up. – David Eisenstat May 07 '21 at 13:04
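
The idea from the comments (one suffix structure over all sequences, plus a per-node attribute saying in which sequences the node's string repeats) can be sketched roughly as below. This is only a sketch under several assumptions, not a definitive implementation. It uses a naive uncompressed suffix trie (quadratic time and space) in place of a real linear-time suffix tree. It inserts each sequence's suffixes tagged with the sequence index instead of joining the sequences with `#` and `$`. It records the attribute while inserting rather than in a separate bottom-up pass. The names `Node`, `build_suffix_trie`, `repeats_within` and `longest_common_repeat`, as well as the `max_gap` parameter, are made up for illustration. Overlapping occurrences are assumed to count as a repeat, and the gap is assumed to be the distance between the start positions of two occurrences.

```python
from typing import Dict, List, Optional


class Node:
    """One trie node; the path from the root spells a substring."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        # starts[i] = sorted start positions of this substring in sequence i
        self.starts: Dict[int, List[int]] = {}


def build_suffix_trie(strings: List[str]) -> Node:
    """Insert every suffix of every sequence (naive, O(n^2) per sequence)."""
    root = Node()
    for idx, s in enumerate(strings):
        for start in range(len(s)):
            node = root
            for ch in s[start:]:
                node = node.children.setdefault(ch, Node())
                # start grows monotonically, so each list stays sorted
                node.starts.setdefault(idx, []).append(start)
    return root


def repeats_within(starts: List[int], max_gap: Optional[int]) -> bool:
    """True if there are two occurrences, optionally at most max_gap apart."""
    if len(starts) < 2:
        return False
    if max_gap is None:
        return True
    # The closest pair in a sorted list is always a consecutive pair.
    return any(b - a <= max_gap for a, b in zip(starts, starts[1:]))


def longest_common_repeat(strings: List[str],
                          max_gap: Optional[int] = None) -> str:
    """Longest string occurring at least twice in every sequence ('' if none)."""
    root = build_suffix_trie(strings)
    k = len(strings)
    best = ""
    stack = [(root, "")]
    while stack:
        node, label = stack.pop()
        for ch, child in node.children.items():
            ok = all(repeats_within(child.starts.get(i, []), max_gap)
                     for i in range(k))
            # If a string fails, every extension fails too, so prune here.
            if ok:
                candidate = label + ch
                if len(candidate) > len(best):
                    best = candidate
                stack.append((child, candidate))
    return best


if __name__ == "__main__":
    seqs = ["AATTAATTCGCG", "GGAATTAATTCC", "GAAATTAATTGA"]
    print(longest_common_repeat(seqs))  # AATT
```

On the example from the question this prints `AATT`. Passing `max_gap` adds the distance restriction: the occurrences in the example start 4 positions apart, so `max_gap=4` still yields `AATT`, while a smaller value leaves only shorter repeats. With a real suffix tree the same per-sequence information would be attached to internal nodes, and storing full position lists at every node is one likely cost of supporting the gap restriction.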
