
When I run the algorithm on the string 'AEKEAAEKEAAEKEA$', looking for the longest substring with at least 3 occurrences, every node in the suffix tree has at most 2 branches. How can that be?

The correct result should be the substring 'AEKEA'.

You can easily inspect the tree in the online suffix tree builder.

I just followed the Wikipedia description:

"The problem of finding the longest substring with at least k occurrences can be found by first preprocessing the tree to count the number of leaf descendants for each internal node, and then finding the deepest node with at least k descendants"

What am I missing here?

Thank you.

Levon
kukit

1 Answer


I don't think that website is correct. When I run 'AEKEAAEKEAAEKEA' through my suffix tree, I get the following tree.

└── (0)
    ├── (27) $
    ├── (6) A
    │   ├── (26) $
    │   ├── (16) AEKEA
    │   │   ├── (17) $
    │   │   └── (7) AEKEA$
    │   └── (18) EKEA
    │       ├── (19) $
    │       └── (8) AEKEA
    │           ├── (9) $
    │           └── (1) AEKEA$
    ├── (4) E
    │   ├── (24) A
    │   │   ├── (25) $
    │   │   └── (14) AEKEA
    │   │       ├── (15) $
    │   │       └── (5) AEKEA$
    │   └── (20) KEA
    │       ├── (21) $
    │       └── (10) AEKEA
    │           ├── (11) $
    │           └── (2) AEKEA$
    └── (22) KEA
        ├── (23) $
        └── (12) AEKEA
            ├── (13) $
            └── (3) AEKEA$

As you can see from this branch, you've found your longest substring with 3 occurrences.

└── (0)
    ├── (27) $
    ├── (6) A
    │   ├── (26) $
    │   ├── (16) AEKEA
    │   │   ├── (17) $
    │   │   └── (7) AEKEA$
    │   └── (18) EKEA
    │       ├── (19) $
    │       └── (8) AEKEA
    │           ├── (9) $
    │           └── (1) AEKEA$
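
The leaf-descendant counting from the Wikipedia description quoted in the question can be sketched roughly as follows. This is only a minimal illustration, not the code that produced the tree above: it assumes a naive, uncompressed suffix trie (one node per character) rather than a real compressed suffix tree, and the names `build_suffix_trie` and `longest_repeated` are made up for this example. The idea is the same, though: count all leaves below each node, not just its direct branches, and keep the deepest node whose count reaches k.

```python
def build_suffix_trie(text):
    """Naive O(n^2) suffix trie: each node is a dict keyed by character."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root


def longest_repeated(text, k, terminator="$"):
    """Longest substring of `text` with at least k occurrences (overlaps allowed).

    Assumes `terminator` does not occur in `text`.
    """
    root = build_suffix_trie(text + terminator)
    best = ""

    def count_leaves(node, path):
        nonlocal best
        if not node:
            # No children: this is a leaf, i.e. exactly one suffix ends here.
            return 1
        # Leaf descendants = all leaves anywhere below, not only direct children.
        total = sum(count_leaves(child, path + ch) for ch, child in node.items())
        # Only internal nodes are candidates, so `path` never contains the terminator.
        if total >= k and len(path) > len(best):
            best = path
        return total

    count_leaves(root, "")
    return best


print(longest_repeated("AEKEAAEKEAAEKEA", 3))  # AEKEA
```

The recursion is as deep as the input is long, so for long strings an iterative traversal over a proper suffix tree would be the better choice.
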
Justin
  • Thank you, but I'm a little confused: in your tree, too, the maximum number of branches per node is 2. – kukit Jun 08 '12 at 13:23
  • Node 0 has 4 branches and node 6 has 3 branches. – Justin Jun 08 '12 at 13:26
  • Yes, sorry, you are right, but node 0 means "" and node 6 means "A", and there's no node that means "AEKEA" and has more than 2 branches. – kukit Jun 08 '12 at 13:30
  • I think I got it! It says "leaf descendants" (not only the direct ones), and node 18 has 3 leaf descendants. You can verify this with the 'A' substring (6 occurrences): node 6 has leaf descendants 26, 17, 7, 19, 9, 1. Thanks for the help! – kukit Jun 08 '12 at 13:56