I am trying to implement a modified parallel depth-first search algorithm in Erlang (let's call it *dfs_mod*).
All I want to get is all the 'dead-end paths', which are basically the paths returned when *dfs_mod* visits a vertex without neighbours or a vertex whose neighbours have all been visited already. I save each path to `ets_table1` if my custom function `fun1(Path)` returns `true`, and to `ets_table2` if `fun1(Path)` returns `false` (I need to filter the resulting 'dead-end' paths with some custom filter).
I have implemented a sequential version of this algorithm and for some strange reason it performs better than the parallel one.
The idea behind the parallel implementation is simple:
- visit a `Vertex` from `[Vertex|Other_vertices] = Unvisited_neighbours`;
- add this `Vertex` to the current path;
- send `{self(), wait}` to the 'collector' process;
- run *dfs_mod* for the `Unvisited_neighbours` of the current `Vertex` in a new process;
- continue running *dfs_mod* with the rest of the provided vertices (`Other_vertices`);
- when there are no more vertices to visit, send `{self(), done}` to the collector process and terminate.
So, basically each time I visit a vertex with unvisited neighbours I spawn a new depth-first search process and then continue with the other vertices.
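In pseudo-real Erlang, the spawning step looks roughly like the following. This is a simplified sketch, not my actual code: visited-vertex bookkeeping and the ETS writes are left out, and the `dfs_mod/4` signature with a `Collector` argument is just one way to structure it.

```erlang
%% Simplified sketch of the spawning step (visited-set tracking and
%% the ETS writes are omitted).
dfs_mod(_Graph, [], _Path, Collector) ->
    %% no more vertices to visit: notify the collector and terminate
    Collector ! {self(), done};
dfs_mod(Graph, [Vertex | Other_vertices], Path, Collector) ->
    New_path = [Vertex | Path],      % add Vertex to the current path
    Collector ! {self(), wait},      % ask the collector to keep waiting
    Unvisited_neighbours = digraph:out_neighbours(Graph, Vertex),
    %% explore the neighbours of Vertex in a new process...
    spawn(fun() ->
              dfs_mod(Graph, Unvisited_neighbours, New_path, Collector)
          end),
    %% ...while this process continues with the remaining vertices
    dfs_mod(Graph, Other_vertices, Path, Collector).
```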
Right after spawning the first *dfs_mod* process I start collecting all `{Pid, wait}` and `{Pid, done}` messages (the `wait` messages keep the collector waiting for all the `done` messages). Once no messages have arrived for N milliseconds, the collector function returns `ok`.
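The collector is essentially a receive loop with an `after` timeout; a minimal sketch (the real function is structured similarly, names here are illustrative):

```erlang
%% Sketch of the collector: loop on wait/done messages and return ok
%% once nothing has arrived for Timeout milliseconds.
collect(Timeout) ->
    receive
        {_Pid, wait} -> collect(Timeout);
        {_Pid, done} -> collect(Timeout)
    after Timeout ->
        ok
    end.
```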
For some reason, this parallel implementation runs from 8 to 160 seconds while the sequential version runs just 4 seconds (the testing was done on a fully-connected digraph with 5 vertices on a machine with Intel i5 processor).
Here are my thoughts on such poor performance:
- I pass the digraph `Graph` to each new process which runs *dfs_mod*. Maybe calling `digraph:out_neighbours(Graph, Vertex)` against one digraph from many processes causes this slowness?
- I accumulate the current path in a list and pass it to each newly spawned *dfs_mod* process; maybe passing so many lists is the problem?
- I use an ETS table to save a path each time I visit a new vertex and add it to the path. The ETS options are `[bag, public, {write_concurrency, true}]`, but maybe I am doing something wrong?
- Each time I visit a new vertex and add it to the path, I check the path with a custom function `fun1()` (it basically checks if the path has vertices labeled with letter "n" occurring before vertices with "m" and returns `true`/`false` depending on the result). Maybe this `fun1()` slows things down?
- I have tried to run *dfs_mod* without collecting the `done` and `wait` messages, but `htop` shows a lot of Erlang activity for quite a long time after *dfs_mod* returns `ok` in the shell, so I do not think that the active message passing slows things down.
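Regarding the `fun1()` suspicion: it is a single linear pass over the path, roughly like this simplified version (the exact label format is elided here; string labels such as `"n1"`/`"m2"` are an assumption):

```erlang
%% Simplified fun1/1: true if an "n"-labeled vertex occurs before an
%% "m"-labeled one (labels assumed to be strings such as "n1", "m2").
fun1(Path) -> fun1(Path, false).

fun1([], _SeenN) ->
    false;
fun1(["n" ++ _ | Rest], _SeenN) ->
    fun1(Rest, true);               % remember we have seen an "n" vertex
fun1(["m" ++ _ | _Rest], true) ->
    true;                           % "m" after "n": match
fun1([_Other | Rest], SeenN) ->
    fun1(Rest, SeenN).
```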
How can I make my parallel dfs_mod run faster than its sequential counterpart?
Edit: when I run the parallel *dfs_mod*, `pman` shows no processes at all, although `htop` shows that all 4 CPU threads are busy.