nested sequences to branching/tree data structure

Question

I'm not sure if this is an easy problem to solve and I am just missing something obvious, but I have been banging my head against it for sometime. I am trying to express tree divergence using lists. This is so I can easily specify my dataset easily inline using simple primitives, not worry about order, and build the tree from a disperate set of lists later.

So I have some lists like this:

 a = ["foo", "bar", "qux"]
 b = ["foo", "bar", "baz"]
 c = ["qux", "bar", "qux"]

I would like to have a function, that would take a sequence of these lists and express a tree like so:

myfunc :: [[a]] -> MyTree a

(root) -> foo -> bar -> [baz, qux]
       -> qux -> bar -> qux

An ideal solution would be able to take sequences of varying lengths, i.e:

a = ["foo"; "bar"; "qux"]
b = ["foo"; "bar"; "baz"; "quux"]
== 
(root) -> foo -> bar -> [qux, baz -> quux]

Are there any textbook examples or algorithms that can help me with this? seems like it can be solved elegantly, but all my stabs at it look absolutely horrible!

Please feel free to post a solution in any functional language, I will translate it as appropriate.

Thanks!

Note - the data structure you are wanting to build is a _Trie_ - see http://en.wikipedia.org/wiki/Trie — stephen tetley, Jul 27 '13 at 06:22

score 5 · Accepted Answer · answered Jul 26 '13 at 22:48

The way I approached this problem was to use a Forest to represent your type and then make a Forest a Monoid, where mappending two Forests together joins their common ancestors. The rest is just coming up with a suitable Show instance:

import Data.List (sort, groupBy)
import Data.Ord (comparing)
import Data.Foldable (foldMap)
import Data.Function (on)
import Data.Monoid

data Tree a = Node
    { value :: a
    , children :: Forest a
    } deriving (Eq, Ord)

instance (Show a) => Show (Tree a) where
    show (Node a f@(Forest ts0)) = case ts0 of
        []  -> show a
        [t] -> show a ++ " -> " ++ show t
        _   -> show a ++ " -> " ++ show f

data Forest a = Forest [Tree a] deriving (Eq, Ord)

instance (Show a) => Show (Forest a) where
    show (Forest ts0) = case ts0 of
        []  -> "[]"
        [t] -> show t
        ts  -> show ts

instance (Ord a) => Monoid (Forest a) where
    mempty = Forest []
    mappend (Forest tsL) (Forest tsR) =
          Forest
        . map (\ts -> Node (value $ head ts) (foldMap children ts))
        . groupBy ((==) `on` value)
        . sort
        $ tsL ++ tsR

fromList :: [a] -> Forest a
fromList = foldr cons nil
  where
    cons a as = Forest [Node a as]
    nil = Forest []

Here's some example usage:

>>> let a = fromList ["foo", "bar", "qux"]
>>> let b = fromList ["foo", "bar", "baz", "quux"]
>>> a
"foo" -> "bar" -> "qux"
>>> b
"foo" -> "bar" -> "baz" -> "quux"
>>> a <> b
"foo" -> "bar" -> ["baz" -> "quux","qux"]
>>> a <> a
"foo" -> "bar" -> "qux"

So your myFunc would become:

myFunc :: [[a]] -> Forest a
myFunc = foldMap fromList

Thank you @gabriel-gonzalez an excellent answer, very elegant. — thedajaw, Jul 26 '13 at 22:58

score 3 · Answer 2 · answered Jul 26 '13 at 23:21

I came up with a solution that is very similar to Gabriel's, but my data representation uses a Map so that I can load off most of the work to Data.Map.unionWith.

import Data.Map (Map, empty, singleton, unionWith, assocs)
import Data.Monoid

type Path a = [a]
data Tree a = Tree {leaf :: Bool, childs :: Map a (Tree a)} deriving Show

The boolean flag in the trees marks whether this node can be the end of a path. The avalues are hidden inside the childs Map. To warm up, let's define how to convert a single path into a tree.

root :: Tree a
root = Tree True empty

cons :: a -> Tree a -> Tree a
cons node tree = Tree False (singleton node tree)

follow :: Path a -> Tree a
follow = foldr cons root

The follow function is called fromList in Gabriel's code. We can also enumerate all paths that are contained in a tree.

paths :: Tree a -> [Path a]
paths (Tree leaf childs) =
  (if leaf then [[]] else []) ++
  [ node : path | (node, tree) <- assocs childs, path <- paths tree ]

The questions essentially asks for an inverse of this paths function. Using unionWith, we can define the monoid structure of trees easily.

instance Ord a => Monoid (Tree a) where
  mempty = Tree False empty
  mappend (Tree leaf1 childs1) (Tree leaf2 childs2) = Tree leaf childs where
    leaf = leaf1 || leaf2
    childs = unionWith mappend childs1 childs2

Now to convert a list of paths to a tree, we just use mconcat and follow.

unpaths :: Ord a => [Path a] -> Tree a
unpaths = mconcat . map follow

Here is a test case using the paths from the question.

a, b, c, d :: Path String

a = ["foo", "bar", "qux"]
b = ["foo", "bar", "baz"]
c = ["qux", "bar", "qux"]
d = ["foo", "bar", "baz", "quux"]

-- test is True
test = (paths . unpaths) [a, b, c, d] == [b, d, a, c]

We get the same paths back that we stored in the tree, but as an ordered list.

I have to say this solution is pretty awesome as well. Thanks! — thedajaw, Jul 27 '13 at 01:15
If leaf then it should not have any child and this can be represented using sum type : `data Tree a = Leaf | Children Map a (Tree a) deriving Show` — Ankur, Jul 27 '13 at 06:39
@Ankur That wouldn't work if some of the paths end early. For example, in `unpaths [[1], [1, 2]]` the node labeled with `1` is both a leaf (for the first path) and it has children (for the second path). So this datatype is not really a tree, more like an infinite state automaton. — Toxaris, Jul 27 '13 at 09:45

score 1 · Answer 3 · answered Jul 26 '13 at 23:37

type TreeNode<'T> = 
  | Node of 'T * Tree<'T>
and Tree<'T> = TreeNode<'T> list

module Tree =
  let rec ofList = function
    | [] -> []
    | x::xs -> [Node(x, ofList xs)]

  let rec merge xs tree =
    match (tree, xs) with
    | _, [] -> tree
    | [], _ -> ofList xs
    | nodes, x::xs ->
      let matching, nonMatching = nodes |> List.partition (fun (Node(y, _)) -> y = x)
      match matching with
      | [Node(_, subtree)] -> Node(x, merge xs subtree) :: nonMatching
      | _ -> Node(x, ofList xs)::nodes

Tree.ofList ["foo"; "bar"; "qux"]
|> Tree.merge ["foo"; "bar"; "baz"]
|> Tree.merge ["qux"; "bar"; "qux"]

> val it : TreeNode<string> list =
  [Node ("qux",[Node ("bar",[Node ("qux",[])])]);
   Node ("foo",[Node ("bar",[Node ("baz",[]); Node ("qux",[])])])]

score 0 · Answer 4 · answered Jul 27 '13 at 04:54

A clojure version, using hashmaps:

(defn merge-to-tree
  [& vecs]
  (let [layer (group-by first vecs)]
    (into {} (map (fn [[k v]]
                    (when k
                      [k (apply merge-to-tree (map rest v))]))
                  layer))))

Here I am using group-by to see when multiple vector elements should be represented by a single item in the output structure. (into {} (map (fn [[k v]] ...) m)) is a standard idiom for destructuring hash entries, performing some operations, then reconstructing a hash from the result. The recursive call on the values (apply merge-to-tree (map rest v)) constructs the various branches below that layer of the tree structure (map rest, because the full input is preserved by group-by, and the first element has already been used as the lookup key).

I welcome other suggestions / improvements. Example usage:

user> (merge-to-tree ["foo" "bar" "qux"])
{"foo" {"bar" {"qux" {}}}}

user> (merge-to-tree ["foo" "bar" "qux"] ["foo" "bar" "baz"] ["qux" "bar" "qux"])
{"foo" {"bar" {"qux" {}, "baz" {}}}, "qux" {"bar" {"qux" {}}}}

user> (merge-to-tree ["foo" "bar" "qux"] ["foo" "bar" "baz" "quux"])
{"foo" {"bar" {"qux" {}, "baz" {"quux" {}}}}}

nested sequences to branching/tree data structure

4 Answers4