I've built a suffix tree in Java based on the site here http://marknelson.us/1996/08/01/suffix-trees/ but I've run into a problem. I can build a suffix tree fine but I can trying to build a set of all suffixes from the tree. I basically find all the 'end nodes' and return the string which is represented by that 'end node'
That algorithm works for a word like "bookkeeper"
├── (1) bookkeeper
├── (9) e
│ ├── (8) eper
│ ├── (10) per
│ └── (12) r
├── (6) k
│ ├── (7) eeper
│ └── (5) keeper
├── (3) o
│ ├── (4) kkeeper
│ └── (2) okkeeper
├── (11) per
└── (13) r
Suffixes:
[bookkeeper, ookkeeper, okkeeper, kkeeper, keeper, eeper, eper, er, per, r]
But it fails when I use something like "ATATATATATA"
├── (1) ATATATATATA
└── (2) TATATATATA
Suffixes:
[ATATATATATA, TATATATATA]
But the proper suffixes should be:
[A, ATA, ATATA, ATATATA, ATATATATA, ATATATATATA, TA, TATA, TATATA, TATATATA, TATATATATA]
I can find the answer by finding all suffixes of each 'end node' string but that doesn't seem like that right approach. Any other suggestions?
EDIT: Thanks izomorphius! Adding an END_CHAR to the original string helped a bunch.
├── (21) #
├── (19) A
│ ├── (20) #
│ └── (15) TA
│ ├── (16) #
│ └── (11) TA
│ ├── (12) #
│ └── (7) TA
│ ├── (8) #
│ └── (3) TA
│ ├── (4) #
│ └── (1) TA#
└── (17) TA
├── (18) #
└── (13) TA
├── (14) #
└── (9) TA
├── (10) #
└── (5) TA
├── (6) #
└── (2) TA#
Suffixes:
[A, ATA, ATATA, ATATATA, ATATATATA, ATATATATATA, TA, TATA, TATATA, TATATATA, TATATATATA]