Self-referencing data structures

Question

I am trying to build a data structure for a graph.

The following code seems to work as intended:

```haskell
data NodeRef = NodeRef String Int -- NodeRef name targetIndex

data Node = Node String Node -- Node name targetNode

ref0 = NodeRef "Zero" 1
ref1 = NodeRef "One" 2
ref2 = NodeRef "Two" 0
refs = [ref0, ref1, ref2]

deref :: [NodeRef] -> Node
deref refs = head allNodes
    where
        deref' (NodeRef name targetIndex) = Node name (allNodes !! targetIndex)
        allNodes = map deref' refs

showme :: Node -> Int -> [String]
showme _ 0                      = []
showme (Node name target) count = name : showme target (count - 1)

main :: IO ()
main = print $ showme (deref refs) 100
```

However, I have several questions:

How many "instances" of Node are created?

When I am running "showme", am I creating more nodes with each step?

What is the right way of building a data structure with circular references?

Thank you very much for your answer.

Generally you want to represent graph edges referentially. So rather than having a vertex actually *contain* its edges, you have the vertex be a simple string or something and the edge set be `[(String, String)]` or something equally basic. [`Data.Graph`](https://hackage.haskell.org/package/containers-0.6.7/docs/Data-Graph.html) uses an adjacency list representation, defining a graph as `Array Vertex [Vertex]` (that is, an array indexed by vertices whose elements are lists of vertices). — Silvio Mayolo, Aug 31 '23 at 23:48
Thank you very much for the comment. Please take my question as an academic one. I am trying to learn the basics of Haskell, and the underlying question is how I can create a collection of "objects" (not sure if this is the right name) which may contain one another. — Chirmol Studio, Aug 31 '23 at 23:54
Related reading: [How do you represent a graph in Haskell?](https://stackoverflow.com/q/9732084/791604), [Is equality testing possible between two infinite data structures in Haskell?](https://stackoverflow.com/q/28243314/791604) — Daniel Wagner, Sep 01 '23 at 03:57
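For comparison, here is a minimal sketch of the question's three-node cycle in the adjacency-list form suggested in the first comment, built with `buildG` from `Data.Graph`. The `names` list and the `walk` function are hypothetical helpers for illustration. Vertices are plain `Int`s and following an edge is just an array lookup, so nothing in the structure refers to itself.

```haskell
import Data.Array ((!))
import Data.Graph (Graph, Vertex, buildG, edges)

-- Hypothetical name table: vertex i is called names !! i.
names :: [String]
names = ["Zero", "One", "Two"]

-- The question's cycle Zero -> One -> Two -> Zero as an adjacency list.
cycleGraph :: Graph
cycleGraph = buildG (0, 2) [(0, 1), (1, 2), (2, 0)]

-- Hypothetical traversal: follow the first outgoing edge 'count' times,
-- collecting vertex names. Each step is a plain array lookup (g ! v).
walk :: Graph -> Vertex -> Int -> [String]
walk g v count
  | count <= 0 = []
  | otherwise = names !! v : case g ! v of
      (next : _) -> walk g next (count - 1)
      []         -> []

main :: IO ()
main = do
  print (edges cycleGraph)
  print (walk cycleGraph 0 5)
```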

score 5 · Accepted Answer · answered Sep 01 '23 at 00:44

How many "instances" of Node are created?

Generally, when you write a data constructor such as NodeRef or Node, it represents a computation that will allocate such a value (cf. new Node(…) in a typical imperative/OOP language). When you write a variable binding with let or where, mentioning that name again will refer to the same value.
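To make the distinction concrete, here is a small sketch contrasting a knot tied through a let binding with a function that rebuilds the node on every step. The names `loopShared`, `loopUnrolled`, and `names` are hypothetical. This describes GHC's usual operational behaviour; the optimizer is free to transform code, so treat it as a mental model rather than a guarantee.

```haskell
data Node = Node String Node

-- One constructor application under a let: the target field points back at
-- the very same heap object, so a single Node is allocated.
loopShared :: String -> Node
loopShared name = let n = Node name n in n

-- The target is a fresh call to loopUnrolled, so every time a target field
-- is forced, a new Node is allocated.
loopUnrolled :: String -> Node
loopUnrolled name = Node name (loopUnrolled name)

-- Check the count before inspecting the node, so no extra node is forced.
names :: Int -> Node -> [String]
names count node
  | count <= 0 = []
  | otherwise  = case node of
      Node name target -> name : names (count - 1) target

main :: IO ()
main = do
  print (names 3 (loopShared "A"))   -- same output as below...
  print (names 3 (loopUnrolled "A")) -- ...but a new Node per step
```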

A top-level binding, or a where binding that doesn’t depend on any of the parameters, is similarly shared, and also known as a “constant applicative form” (CAF). So ref0, ref1, ref2, and refs are all CAFs and any mention of e.g. ref0 will refer to the same NodeRef value.
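One way to observe CAF sharing, assuming a program compiled with GHC, is to wrap a top-level binding in `Debug.Trace.trace`: the message is written to stderr only when the value is first evaluated, not on every mention. `refName` is a hypothetical helper, and `trace` is a debugging aid, so the exact timing of its output is not something to rely on.

```haskell
import Debug.Trace (trace)

data NodeRef = NodeRef String Int

-- A top-level binding (a CAF). The trace message is emitted the first time
-- ref0 is evaluated; later mentions reuse the already-evaluated value.
ref0 :: NodeRef
ref0 = trace "evaluating ref0" (NodeRef "Zero" 1)

refName :: NodeRef -> String
refName (NodeRef name _) = name

main :: IO ()
main = do
  putStrLn (refName ref0)
  putStrLn (refName ref0) -- no second "evaluating ref0" message expected
```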

Each invocation of deref allocates a list allNodes which contains as many Node values as there are NodeRef values in the input list refs (three in this case). The expression allNodes !! targetIndex refers to elements of that same shared allNodes list.
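If you want the same knot-tying without the O(n) cost of `(!!)`, a lazy boxed `Data.Array` works the same way, because its elements are not evaluated until they are looked up. This is a sketch of a variant, not the asker's code; `derefArray` is a hypothetical name and, like the original, it assumes a non-empty input whose target indices are all in range.

```haskell
import Data.Array (Array, listArray, (!))

data NodeRef = NodeRef String Int -- NodeRef name targetIndex
data Node = Node String Node      -- Node name targetNode

-- Every element refers into the same 'nodes' array, so each Node is built
-- at most once, and following an edge is an O(1) array lookup.
derefArray :: [NodeRef] -> Node
derefArray refs = nodes ! 0
  where
    nodes :: Array Int Node
    nodes = listArray (0, length refs - 1)
      [Node name (nodes ! target) | NodeRef name target <- refs]

showme :: Node -> Int -> [String]
showme (Node name target) count
  | count <= 0 = []
  | otherwise  = name : showme target (count - 1)

main :: IO ()
main = print (showme (derefArray [NodeRef "Zero" 1, NodeRef "One" 2, NodeRef "Two" 0]) 5)
```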

Finally, since deref returns head allNodes, only those nodes that are actually connected to the head node will remain reachable, and the others will be garbage-collected.

When I am running "showme", am I creating more nodes with each step?

showme allocates a number of list “cons” cells (:) equal to the count argument. If count is negative, however, it keeps decrementing until it wraps around the range of Int and eventually comes back to 0, which in practice means it never finishes. You can use a guard with <= instead, or use pred instead of - 1 for a checked decrement. It does not copy the nodes themselves; it only follows references to them (and their fields).
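A minimal sketch of the guarded version, assuming the question's Node type; the two-node `loop` is a hypothetical test value.

```haskell
data Node = Node String Node

-- Hypothetical two-node cycle to test with: Zero -> One -> Zero -> ...
loop :: Node
loop = Node "Zero" (Node "One" loop)

-- Guarded showme: any count <= 0 stops immediately, so a negative argument
-- can no longer send the recursion around the whole Int range.
-- Like the original, it copies no nodes; it only follows target references.
showme :: Node -> Int -> [String]
showme _ count | count <= 0 = []
showme (Node name target) count = name : showme target (count - 1)

main :: IO ()
main = do
  print (showme loop 3)
  print (showme loop (-1))
```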

What is the right way of building a data structure with circular references?

Your approach is correct here. If you want to use these circular data structures, you will need to be careful not to use naïve unbounded recursion when traversing them, so as to avoid nontermination. You also won’t be able to easily change this data structure (that is, construct an amended version of it) without losing the sharing. Therefore it’s common to just use the ID-based representation you have in NodeRef (an example of “observable sharing”), stored in a parent structure such as an IntMap. You might move to Node if you want to “freeze” the representation.
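As a sketch of that ID-based approach, assuming the question's NodeRef and Node types and the containers package: the graph lives in an IntMap, traversals carry an explicit visited set so cycles terminate, amending is an ordinary map update, and `freeze` ties the knot only when you want the circular Node view. `Graph`, `exampleGraph`, `reachable`, `freeze`, and `nodeName` are hypothetical names.

```haskell
import Data.IntMap.Lazy (IntMap)
import qualified Data.IntMap.Lazy as IntMap
import qualified Data.IntSet as IntSet

data NodeRef = NodeRef String Int -- name, target ID
data Node = Node String Node      -- frozen, knot-tied form

-- An ordinary finite map from IDs to NodeRefs: amending it is cheap.
type Graph = IntMap NodeRef

exampleGraph :: Graph
exampleGraph = IntMap.fromList
  [(0, NodeRef "Zero" 1), (1, NodeRef "One" 2), (2, NodeRef "Two" 0)]

-- Traversal with an explicit visited set, so following a cycle terminates.
reachable :: Graph -> Int -> [String]
reachable g = go IntSet.empty
  where
    go seen i
      | i `IntSet.member` seen = []
      | otherwise = case IntMap.lookup i g of
          Nothing -> []
          Just (NodeRef name target) -> name : go (IntSet.insert i seen) target

-- "Freeze" into the circular Node form by tying the knot through a lazy
-- IntMap. Assumes every target ID is present (IntMap.! errors otherwise).
freeze :: Graph -> Int -> Node
freeze g start = nodes IntMap.! start
  where
    nodes = IntMap.map (\(NodeRef name target) -> Node name (nodes IntMap.! target)) g

nodeName :: Node -> String
nodeName (Node name _) = name

main :: IO ()
main = do
  print (reachable exampleGraph 0)
  -- Amend: point "One" back at "Zero", which makes "Two" unreachable from 0.
  let amended = IntMap.insert 1 (NodeRef "One" 0) exampleGraph
  print (reachable amended 0)
  let Node _ next = freeze amended 0
  putStrLn (nodeName next)
```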

Knot-tying is more commonly used as a convenience, to avoid the need for the explicit mutation and manual sequencing used in imperative languages. As long as the dataflow isn’t circular, you can just write the definition of a whole data structure and let the ordinary process of lazy evaluation fill it in.
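A classic illustration (swapped in here, not taken from the answer) is "repmin": replace every element of a list with the list's minimum in a single pass. The output refers to the minimum before it has been computed, which works because computing the minimum never needs any output element. `repMin` is a hypothetical name.

```haskell
-- 'go' returns the minimum of its suffix together with the output list.
-- The output is built from 'm', the overall minimum, which is only known
-- after the whole traversal; laziness lets us mention it early because
-- no output element is needed to compute 'm' itself.
repMin :: [Int] -> [Int]
repMin xs = result
  where
    (m, result) = go xs
    go [] = (maxBound, [])
    go (y : ys) =
      let (m', rest) = go ys
      in (min y m', m : rest)

main :: IO ()
main = print (repMin [3, 1, 4, 1, 5])
```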

Thank you so very much for your detailed answer. Unfortunately I don't have enough reputation to upvote. "You also won’t be able to easily change this data structure". Yes, the goal is immutability. I will accept as soon as the timer runs out. — Chirmol Studio, Sep 01 '23 at 02:08
