
I'd like to store some data associated with words composed of simple ASCII letters (a-Z). The goal is to retrieve the data associated with a word very quickly during later parsing.

I thought about the following structure:

struct Foo {
  struct Foo *letter[26];
  void *data;
};

Thus, it is possible to go down through the "letter tree" while parsing a word in a string and get the associated data.

"foo" => ['f' node] -> ['o' node] -> ['o' node]

The problem is the size of the entire tree if I have many words.

Is there a way to reduce the size of the tree without losing performance?
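
For reference, the letter-by-letter descent described above could look roughly like the sketch below. The helper names `letter_index`, `foo_insert` and `foo_find` are made up for illustration, and folding upper and lower case into the same 26 slots is an assumption (the structure only has 26 pointers). It also assumes `calloc`'s all-zero memory reads back as null pointers, which holds on mainstream platforms.

```c
#include <stddef.h>
#include <stdlib.h>

struct Foo {
  struct Foo *letter[26];
  void *data;
};

/* Hypothetical helper: map an ASCII letter to 0..25, or -1 if not a letter.
   Assumption: 'A' and 'a' share a slot. */
static int letter_index(char c) {
  if (c >= 'a' && c <= 'z') return c - 'a';
  if (c >= 'A' && c <= 'Z') return c - 'A';
  return -1;
}

/* Store data under word, creating nodes as needed.
   Returns 0 on success, -1 on a non-letter or allocation failure. */
static int foo_insert(struct Foo *root, const char *word, void *data) {
  struct Foo *node = root;
  for (; *word; ++word) {
    int i = letter_index(*word);
    if (i < 0) return -1;
    if (!node->letter[i]) {
      node->letter[i] = calloc(1, sizeof *node->letter[i]);
      if (!node->letter[i]) return -1;
    }
    node = node->letter[i];
  }
  node->data = data;
  return 0;
}

/* Follow one pointer per letter; NULL means the word is not stored. */
static void *foo_find(const struct Foo *root, const char *word) {
  const struct Foo *node = root;
  for (; *word && node; ++word) {
    int i = letter_index(*word);
    if (i < 0) return NULL;
    node = node->letter[i];
  }
  return node ? node->data : NULL;
}
```

Each node costs 27 pointers (216 bytes with 8-byte pointers) even when only one child is used, which is where the size problem comes from.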

Guid
  • -1 from me; it's impossible to talk about performance improvement without seeing the actual data sets used. – Šimon Tóth Dec 07 '12 at 15:21
  • Is your goal similar to the general uses of a [Radix Tree](http://en.wikipedia.org/wiki/Radix_tree)? – WhozCraig Dec 07 '12 at 15:22
  • Notice the final parts of the trees for "foo" and "boo" are the same. By reusing the common endings you can greatly reduce the size of the trees :) – pmg Dec 07 '12 at 15:23
  • Take a look at [Adding word to Trie structure](http://stackoverflow.com/questions/13674617/adding-word-to-trie-structure-dictionary/). The memory allocation there is a little more complex (it allocates 27 pointers instead of using 26 pointers in an array in the structure) and it stores the word in lieu of your `data` field, but otherwise, it seems very similar. – Jonathan Leffler Dec 07 '12 at 15:29
  • Yes, a Radix Tree is great. Thanks to all of you. – Guid Dec 07 '12 at 15:55

2 Answers


What you're describing is called a trie. A radix tree, which merges each chain of single-child nodes into one edge, is more compact.
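
To make the difference concrete, here is a minimal sketch of a radix tree, not a tuned implementation. The names `RNode`, `rnode_new`, `rnode_insert` and `rnode_find` are hypothetical. As a further assumption, children are kept in a sibling list instead of a 26-slot array, which saves memory but means each step scans up to 26 siblings.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical radix-tree node: the edge into a node carries a substring. */
struct RNode {
  char *label;          /* characters on the edge leading to this node */
  void *data;           /* payload, NULL if no word ends here */
  struct RNode *child;  /* first child */
  struct RNode *next;   /* next sibling; siblings differ in first letter */
};

static struct RNode *rnode_new(const char *label, size_t n) {
  struct RNode *r = calloc(1, sizeof *r);
  if (!r) return NULL;
  r->label = malloc(n + 1);
  if (!r->label) { free(r); return NULL; }
  memcpy(r->label, label, n);
  r->label[n] = '\0';
  return r;
}

/* Returns 0 on success, -1 on allocation failure. */
static int rnode_insert(struct RNode *root, const char *word, void *data) {
  struct RNode *node = root;
  while (*word) {
    struct RNode *c = node->child;
    while (c && c->label[0] != *word) c = c->next;
    if (!c) {                      /* no edge starts with this letter */
      c = rnode_new(word, strlen(word));
      if (!c) return -1;
      c->data = data;
      c->next = node->child;
      node->child = c;
      return 0;
    }
    size_t k = 0;                  /* length of the common prefix */
    while (c->label[k] && c->label[k] == word[k]) k++;
    if (c->label[k]) {             /* word diverges mid-edge: split at k */
      struct RNode *tail = rnode_new(c->label + k, strlen(c->label + k));
      if (!tail) return -1;
      tail->data = c->data;
      tail->child = c->child;
      c->label[k] = '\0';
      c->data = NULL;
      c->child = tail;
    }
    word += k;
    node = c;
  }
  node->data = data;
  return 0;
}

static void *rnode_find(const struct RNode *root, const char *word) {
  const struct RNode *node = root;
  while (*word) {
    const struct RNode *c = node->child;
    while (c && c->label[0] != *word) c = c->next;
    if (!c) return NULL;
    size_t n = strlen(c->label);
    if (strncmp(c->label, word, n) != 0) return NULL;
    word += n;
    node = c;
  }
  return node->data;
}
```

With "foo", "foobar" and "fob" stored, this keeps four nodes below the root ("fo", "o", "bar", "b") where the plain trie needs seven.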

Jonathan Leffler
Henrik

Is there a reason you're using a tree instead of storing the words in a hash table? A hash table will tend to use a bit more memory, but will give you near-constant-time lookups.

upcrob
  • A hash table needs the entire word to get the associated data. With a radix tree, I can parse my word (in a string) letter by letter and stop as soon as I don't find an appropriate node. – Guid Dec 07 '12 at 15:57
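
The early-stop behaviour from that last comment can be sketched against the question's `struct Foo` as follows. The function name `foo_match_prefix` is hypothetical, and the sketch assumes lower-case ASCII input only. It reads a string one letter at a time, stops at the first character with no matching node, and reports the longest stored word seen on the way.

```c
#include <stddef.h>

struct Foo {
  struct Foo *letter[26];
  void *data;
};

/* Returns the data of the longest stored word that starts text, or NULL.
   *len receives that word's length (0 if nothing matched).
   Assumption: only 'a'..'z' are treated as letters. */
static void *foo_match_prefix(const struct Foo *root, const char *text,
                              size_t *len) {
  const struct Foo *node = root;
  void *best = NULL;
  size_t i = 0;
  *len = 0;
  while (text[i] >= 'a' && text[i] <= 'z') {
    node = node->letter[text[i] - 'a'];
    if (!node) break;              /* no such branch: stop immediately */
    i++;
    if (node->data) {              /* a stored word ends here */
      best = node->data;
      *len = i;
    }
  }
  return best;
}
```

A hash table cannot do this directly, because the end of the word has to be known before the key can be hashed.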