
I just want to double-check the worst-case space complexity of a trie. I thought it would be O(N*K), where N is the total number of nodes and K is the size of the alphabet (the number of child pointers per node), but people keep telling me it's O(K^L), where K is the size of the alphabet and L is the average word length. Do those null pointers use up memory in Java? For example, if a node has only 3 branches out of a possible K, does it use K space or just 3? The following is a Trie implementation in Java:

class Trie {
    private Trie[] tries;

    public Trie() {
        // A size-256 array of Trie references, all initially null
        this.tries = new Trie[256]; // K = 256
    }
}
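The following sketch makes the "K or just 3?" situation concrete. It mirrors the node above (K = 256), attaches only three children, and compares the array length with the number of non-null entries. The class `NodeSlots` and the methods `addChild`, `allocatedSlots` and `nonNullChildren` are hypothetical helpers added for illustration.

```java
public class NodeSlots {
    // Hypothetical node mirroring the question's Trie (K = 256).
    static class Trie {
        // One reference per possible character; unused entries stay null.
        private final Trie[] tries = new Trie[256];

        // Assumes c < 256, matching the array size.
        void addChild(char c) {
            tries[c] = new Trie();
        }

        // Slots allocated for this node: always K, whether used or not.
        int allocatedSlots() {
            return tries.length;
        }

        // Slots that actually point at a child node.
        int nonNullChildren() {
            int count = 0;
            for (Trie child : tries) {
                if (child != null) {
                    count++;
                }
            }
            return count;
        }
    }

    public static void main(String[] args) {
        Trie node = new Trie();
        node.addChild('a');
        node.addChild('b');
        node.addChild('c');
        System.out.println("allocated slots: " + node.allocatedSlots());    // allocated slots: 256
        System.out.println("non-null children: " + node.nonNullChildren()); // non-null children: 3
    }
}
```

Only 3 entries are in use, but the node still owns all 256 reference slots.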
peter
  • I have never heard of a trie using O(N^K) space nor do I see any reason for it to do so, regardless of null pointers taking up space. – Shashank Feb 27 '15 at 21:31
  • Possibly you mean a^m (where a is the size of the alphabet and m is the average word length) - see the comments under [this answer](http://stackoverflow.com/a/2719123/699224) – DNA Feb 27 '15 at 21:34
  • @Shashank sorry, I meant K^N – peter Feb 27 '15 at 21:35
  • Well it could be K^L where K is alphabet size and L is the average word length as DNA spoke of. But not K^N because N is the number of total nodes and far exceeds the average word length. – Shashank Feb 27 '15 at 21:39
  • @Shashank you are right. Let me correct this. – peter Feb 27 '15 at 21:41

2 Answers


If the memory footprint of a single node is K references and the trie has N nodes, then its space complexity is obviously O(N*K). This already accounts for the fact that null pointers occupy space: whether an array entry is null or holds any other value makes no difference to memory consumption.
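A minimal sketch of the N*K count, assuming the question's layout of one K = 256 array per node. It inserts a few words, counts the nodes created, and multiplies by K. The class `TrieSpace` and its methods are hypothetical, and the end-of-word marker a real trie needs is omitted because it doesn't change the count.

```java
public class TrieSpace {
    static final int K = 256; // alphabet size, as in the question

    // One node: K child references, allocated up front.
    static class Node {
        final Node[] children = new Node[K];
    }

    private final Node root = new Node();
    private int nodeCount = 1; // the root counts as a node

    // Assumes every character is below K (e.g. ASCII/Latin-1).
    void insert(String word) {
        Node cur = root;
        for (char c : word.toCharArray()) {
            if (cur.children[c] == null) {
                cur.children[c] = new Node();
                nodeCount++;
            }
            cur = cur.children[c];
        }
    }

    int nodes() {
        return nodeCount;
    }

    // Every node owns K slots whether they are null or not, so the total is N * K.
    long referenceSlots() {
        return (long) nodeCount * K;
    }

    public static void main(String[] args) {
        TrieSpace trie = new TrieSpace();
        for (String w : new String[] {"car", "cat", "dog"}) {
            trie.insert(w);
        }
        System.out.println("N = " + trie.nodes());              // N = 8
        System.out.println("N * K = " + trie.referenceSlots()); // N * K = 2048
    }
}
```

"car" and "cat" share the path c-a, so N = 8 (root included) and the trie holds 8 * 256 = 2048 reference slots, almost all of them null.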

O(K^L) is a completely different measure because it uses different parameters. Basically, K^L is an estimate of the number of nodes in a densely populated trie, whereas in O(N*K) the number of nodes is given explicitly.
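To compare the two measures, the sketch below counts, level by level, the nodes of a densely populated trie, taken here to mean one that holds every string of length L over a K-letter alphabet. `DenseTrie`, `nodesInFullTrie` and `pow` are hypothetical names chosen for illustration.

```java
public class DenseTrie {
    // Nodes in a trie holding every string of length len over a k-letter alphabet:
    // level i (0 = root) has k^i nodes, for i = 0 .. len.
    static long nodesInFullTrie(int k, int len) {
        long total = 0;
        long levelSize = 1; // k^0 = 1: the root
        for (int i = 0; i <= len; i++) {
            total += levelSize;
            levelSize *= k;
        }
        return total;
    }

    // k^len computed with integer arithmetic (no overflow for small inputs).
    static long pow(int k, int len) {
        long result = 1;
        for (int i = 0; i < len; i++) {
            result *= k;
        }
        return result;
    }

    public static void main(String[] args) {
        int k = 26, len = 3;
        System.out.println("K^L = " + pow(k, len));                   // K^L = 17576
        System.out.println("total nodes = " + nodesInFullTrie(k, len)); // total nodes = 18279
    }
}
```

For K = 26 and L = 3, the whole trie has 1 + 26 + 676 + 17576 = 18279 nodes, so the last level (K^L = 17576) already accounts for almost all of them.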

Marko Topolnik

I'd like to give a little more detail on Marko's answer.

The memory consumed by each slot of a node's child array is the same whether it is null or not. An array stores only references (pointers), and it occupies its full size from the moment it is initialized. Each node also has its own object overhead, but that is an implementation detail, and since we are talking about asymptotic analysis we don't consider the memory occupied by a node's implementation.

O(N*K) is the space of a trie in which each of the N nodes holds K child references. That is correct, but it is expressed in terms of the number of nodes, which you don't know upfront. If you do know it, you can add up the memory used by each node (an implementation detail) and calculate the exact amount of memory used by your trie; Big-O notation may not even be needed in that case.
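If N is known, the per-node memory can be added up directly. The sketch below does that for the question's Trie class, but the header, reference and alignment sizes are assumptions (typical values for a 64-bit HotSpot JVM with compressed references), not guarantees; a tool such as OpenJDK's JOL (Java Object Layout) can report the real layout on a given JVM. `TrieBytes` and `estimateBytes` are hypothetical names.

```java
public class TrieBytes {
    // ASSUMED sizes (64-bit HotSpot, compressed references); other JVMs differ.
    static final int OBJECT_HEADER = 12; // bytes per plain object header
    static final int ARRAY_HEADER = 16;  // bytes per array header, including length
    static final int REF = 4;            // bytes per reference slot

    // Round up to the assumed 8-byte object alignment.
    static long align8(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // Rough bytes for n nodes of the question's Trie: one object plus one Trie[k] array each.
    static long estimateBytes(long n, int k) {
        long node = align8(OBJECT_HEADER + REF);            // Trie object with its 'tries' field
        long array = align8(ARRAY_HEADER + (long) k * REF); // k slots, null or not
        return n * (node + array);
    }

    public static void main(String[] args) {
        System.out.println(estimateBytes(1, 256));    // 1056
        System.out.println(estimateBytes(1000, 256)); // 1056000
    }
}
```

Under these assumed sizes each node costs 16 bytes for the object plus 1040 bytes for its 256-slot array, so the array dominates.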

What you do know is L (the average key length) and K (the alphabet size), so you use these to analyze the complexity. If you do the math, you will find that K^L accounts only for the last level of the trie: with K=2 and L=3 you get a complete binary tree with 4 levels, 2^3 = 8 nodes on the last level and 15 nodes in total. The last level doesn't give the total number of nodes in the trie, but in asymptotic analysis only the dominant term matters, and the last level dominates the sum. So you have O(K^L).
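The geometric sum behind this argument, for a trie containing every key of length L over an alphabet of size K (K ≥ 2), can be written out as follows; with K = 2 and L = 3 it gives 15 ≤ 16.

```latex
\sum_{i=0}^{L} K^i \;=\; \frac{K^{L+1}-1}{K-1} \;<\; \frac{K}{K-1}\,K^L \;\le\; 2K^L \qquad (K \ge 2)
```

Since the total is bounded by a constant times K^L, the node count is O(K^L).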

jpenna