
I was wondering how to write a proof that the number of branches (root edges) in a suffix tree is equal to the size of the alphabet of the string S. Say we have S = {aaabaac}, alphabet = {a, b, c}, size of alphabet = 3; then the root edges (the branches starting from the root) are going to be exactly 3, i.e. a, b and c. Or can this be proven by definition? I am not sure!

perfecto
  • This depends on the exact definition you are using. Usually, you would assume the negation of the claim. Then you would show that such a tree cannot exist, hence the negation is false and the original claim is true. So in your case, assume that the tree does *not* have exactly three root edges, and show that this cannot be true. – Polygnome May 15 '16 at 12:38
  • Thanks for the initial direction. However, how exactly do I argue that in a proof? This is not homework; I just want to understand how it can be done. That is, how do I prove that if the tree has, for instance, 4 root edges in this case, then it is not a valid suffix tree? – perfecto May 16 '16 at 22:40

1 Answer


This actually isn't necessarily true. There are two factors you need to consider:

  1. Suffix trees include an extra end-of-string marker (often denoted $) that's used to ensure that all suffixes correspond to leaves in the tree. This means that you may have more children of the root than characters in the alphabet.

  2. The root will have one child for each distinct character that appears in the string, so it's entirely possible that the root will have fewer children than the size of the alphabet. For example, if your alphabet is {A, T, C, G}, then the suffix tree for AAAAAA$ will only have two children - one for $ and one for A.
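Both points can be checked with a small experiment. The following is only a minimal sketch: it builds a naive suffix trie out of nested dicts rather than a real compressed suffix tree, on the assumption that merging single-child paths into edges does not change how many children the root has. The names `build_suffix_trie` and `root_children` are made up for this illustration.

```python
def build_suffix_trie(text, terminator="$"):
    """Build a naive (uncompressed) suffix trie as nested dicts.

    Illustrative helper only: a real suffix tree compresses chains of
    single-child nodes into edges, which leaves the root's children unchanged.
    """
    s = text + terminator
    root = {}
    for i in range(len(s)):
        node = root
        # Insert the suffix s[i:] one character at a time.
        for ch in s[i:]:
            node = node.setdefault(ch, {})
    return root


def root_children(text, terminator="$"):
    """Return the sorted first characters of the edges leaving the root."""
    return sorted(build_suffix_trie(text, terminator))


# Alphabet {A, T, C, G}, but only A (plus the marker) appears in the string.
print(root_children("AAAAAA"))
# The string from the question, with and without the end marker.
print(root_children("aaabaac"))
print(root_children("aaabaac", terminator=""))
```

Every suffix starts at the root and sibling edges must start with different characters, so the root ends up with one child per distinct character that actually occurs in the string (including the marker, if one is appended), regardless of how large the alphabet is.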

templatetypedef
  • If you add $ as a character, then the size of the alphabet also increases by 1, i.e. in your example AAAAAA$ gives an alphabet of size 2, which then corresponds to 2 root edges: one starting with A and the other starting with $. – perfecto May 21 '16 at 16:57
  • The alphabet of a string is the set of *possible* characters rather than the set of *used* characters. – templatetypedef May 21 '16 at 17:45
  • OK, great. Let's say _by definition_ the tree is constructed without the $. Is it then possible to prove the statement that the tree will always have a number of root edges exactly equal to the size of the alphabet? – perfecto May 21 '16 at 19:46
  • Or possibly, what would be the proof that there will always be exactly _n+1_ root edges, where _n_ is the size of the alphabet, since we have now established that? – perfecto May 21 '16 at 21:15