I want to reconstruct the incidence structure of a graph in Haskell, which is given by the output of a breadth first traversal of it. Explicitly, the output consists of a root vertex and a list of neighborhoods (a neighborhood is a list of vertices marked as new or old (= already visited)), where each neighborhood corresponds to the least vertex which has not been assigned to a neighborhood, yet.
In any imperative language, I would solve the problem by using a queue:
Input: root vertex r, list of neighborhoods L
(1) Put r into the empty queue Q
(2) if Q is empty then STOP
(3) extract the first vertex v of Q
(4) extract the first neighborhood N of L
(5) append the unvisited vertices of N to Q
(6) remove the markings (new/old) of the nodes of N and assign v to N
(7) goto (2)
I tried to implement this naive algorithm in Haskell (by using a list or by using Data.Sequence as queue), but ghci always runs out of memory. This should not happen, because although the input consists of 300MB data, 16GB RAM should clearly suffice.
Therefore the naive implementation seems to cause a memory leak. How would you implement this algorithm in Haskell?
Edit: Here are the (slightly simplified) data types, I use:
data Output = Out !Vertex ![[BFSNode]]
data Vertex = Vertex Integer SomeMoreComplexData
data BFSNode = New Vertex | Old Integer
data Graph = ![Vertex] ![(Integer,[Integer])]
The data type "Output" contains the already parsed BFS output consisting of the root vertex and the lists of neighborhoods. BFSNode corresponds to a node in the BFS tree which belongs to either a new vertex which is visited for the first time, or to an old vertex which already has been visited and which is therefore referred by its unique number. Note that the parsing process works fine and consumes very few memory.
My aim is to convert "Output" into the data type "Graph" which consists of the lists of vertices and of an incidence list.
Here is a simplified version of my implementation:
readTree :: [[BFSNode]] -> Seq Integer -> Graph
readTree [] _ = Graph [] []
readTree (nb:nbs) qs =
let (i :< qs') = viewl qs
newVs = fromList $! map nodeNr . filter isNew $ nb
(Graph vs adj) = readTree nbs $ qs' >< newVs
in Graph (map unNew (filter isNew nb) ++ vs) ((i,nub $ map nodeNr nb):adj)
"nbs" is the list of neighborhoods, "qs" is the queue. The function "nodeNr" extracts the unique identification number from a vertex, "isNew" tests whether a vertex is new, and "unNew" unpacks a new vertex from the data type "BFSNode".
Edit2: I think I localized the problem now. Maybe it has nothing to do with my implementation of the conversion process. My failure was to use the build in function "read" to read the data type "Output" from a file. I realized now that Haskell has problems with reading big files. Even if it were just about reading a list of integers, e.g.
main = do
txt <- readFile "test"
writeFile "test2" . show $ (read txt :: [Integer]) }
the program will run out of memory if the file "test" is big enough. I understand now, that it is no good idea to parse data in this way, since "read" will load all data into the memory before showing any output, but I still do not understand why it fills 16GB of RAM although the file amounts not even 500MB. Do you have any idea what is wrong with "read"? Does Haskell show the same behavior on your machines?
Edit3: Now I implemented a stream based parsing function "readOutput" which takes a String and returns the data type "Output". This function is lazy, so I immediately get an output when I call it. But when I compose it with my conversion function "readTree" (which is clearly tail-recursive) I get no output at all and the memory usage increases as usual. What am I doing wrong?
Edit4: The problem in Edit3 came from some strictifications which I removed now.