Consider the following tree structure in Clojure:

(def tree [7 9 [7 5 3 [4 6 9] 9 3] 1 [2 7 9 9]])

The paths to, for instance, all even numbers in the tree would be:

[[2 3 0] [2 3 1] [4 0]]

This is a list of lists. Each 'inner' list represents an absolute path from the root of the tree to the leaves of interest.
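
For reference, such paths could be computed along the following lines. This is only a minimal sketch: `paths-where` is a hypothetical helper name (not a core function), and it assumes the tree consists solely of nested vectors whose leaves are not vectors.

```clojure
(def tree [7 9 [7 5 3 [4 6 9] 9 3] 1 [2 7 9 9]])

(defn paths-where
  "Returns a vector of index paths to every leaf of `tree` that satisfies `pred`.
  Assumes inner nodes are vectors and leaves are non-vector values."
  [pred tree]
  (letfn [(walk [node path]
            (if (vector? node)
              ;; Pair every child with its index and descend into it.
              (mapcat (fn [i child] (walk child (conj path i)))
                      (range) node)
              ;; Leaf: keep the accumulated path only if the predicate holds.
              (when (pred node) [path])))]
    (vec (walk tree []))))

(paths-where even? tree)
;; => [[2 3 0] [2 3 1] [4 0]]
```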

I'm now looking for a data structure that represents such a result without redundancy. As you can see, the fragment [2 3], for instance, is repeated in two entries. I came up with a nested hash-map, but maybe there's something simpler:

{2 {3 {0 true 1 true}}
 4 {0 true}}
Anton Harald

2 Answers

I believe that a DAWG is overkill for your problem. A DAWG pays off by merging shared suffixes, and the suffixes of your paths are unlikely to be shared. A trie should be enough (this is actually your nested hash-map approach), and it's pretty easy to generate in Clojure.
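
A minimal sketch of generating it, assuming the paths are given as a vector of index vectors and that no path is a prefix of another (which holds for paths to leaves). The names `paths->trie` and `trie->paths` are hypothetical helpers, not library functions:

```clojure
(def paths [[2 3 0] [2 3 1] [4 0]])

(defn paths->trie
  "Builds a nested map from a collection of paths, marking each path's end with `true`.
  Assumes no path is a prefix of another; otherwise assoc-in would hit a `true` leaf."
  [paths]
  (reduce (fn [trie path] (assoc-in trie path true)) {} paths))

(defn trie->paths
  "Expands a nested map produced by paths->trie back into a sequence of path vectors."
  [trie]
  (if (map? trie)
    (mapcat (fn [[k sub]]
              ;; Prefix every path found below this key with the key itself.
              (map #(into [k] %) (trie->paths sub)))
            trie)
    ;; A `true` leaf terminates exactly one (so far empty) path.
    [[]]))

(paths->trie paths)
;; => {2 {3 {0 true, 1 true}}, 4 {0 true}}
```

The shared prefix [2 3] is stored only once, and `trie->paths` recovers the original list when needed, although the order of the recovered paths is not guaranteed for larger maps.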

OlegTheCat
I think you could use a "deterministic acyclic finite state automaton (DAFSA)", also called a "directed acyclic word graph (DAWG)".

In your data, all the paths form a set of strings (or words). Each path to a leaf would represent a path to an even number.

Piotrek Bzdyl
  • Thanks for the link. The article is very general though. Could you maybe provide an example in Clojure? Maybe highlighting the advantage over the nested hash-map approach. The article calls the paths "strings" - this is interesting. It reminded me of regular expressions: The above could be expressed as "(23[01]|40)", however I'm not sure if that would be a practical implementation. – Anton Harald Apr 24 '16 at 20:00
  • Unfortunately, I don't have an example; I just thought this solution was worth mentioning. I guess simpler solutions (like the one provided by OlegTheCat) will be enough for you. DAWGs are used to represent a whole dictionary of strings (words over some alphabet; in your case, the ints used as indices in your paths) in a very compact way. If you want to represent a very big set of such paths, a DAWG might be a good choice. – Piotrek Bzdyl Apr 24 '16 at 20:07