I have a file with the following format:
Y1DP480P T FDVII005 ID=000
Y1DPMS7M T Y1DP480P ID=000
Y1DPMS7M T Y1DP4860 ID=000
Y1DPMS7M T Y1ENDCYP ID=000
Y1DPMS6M T Y1DPMS7M ID=000
Y1DPMS5M T VPY1CM28 ID=000
Y1DPMS5M T Y1DPMS6M ID=000
Y1DPAS21 T Y1DPMS5M ID=000
Y1DPMS4M T FDRBC004 ID=000
Y1DPMS4M T FDYBL004 ID=000
etc. etc.
Only the data in columns 1-8 and 12-19 is used, and each line can be thought of as a directed edge:
node1 -> node2
node1 -> node3
node3 -> node5
node2 -> node4
node4 -> node5
node5 -> node7
I need an efficient way to map the path from a given start node to a given end node.
For example, if I want the path from node1 to node7, the function would return node1->node3, node3->node5, node5->node7.
Current approach:
I read the file into an array, taking the first 19 characters of each line as both the key and the value, e.g.
$data['Y1DP480P T FDVII005'] = 'Y1DP480P T FDVII005';
(I use the value as the key because the input file may contain duplicates and this filters them out. I don't think PHP has a 'set' data structure.)
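As an illustration of the same de-duplication idea, here is a minimal sketch that stores the edges as an adjacency list rather than as whole lines, so the successors of a node can be looked up directly instead of scanning every line. The function name `buildAdjacency` and the sample lines are assumptions made for this example; the column offsets follow the format described above (columns 1-8 and 12-19).

```php
<?php
// Hypothetical helper (not part of the original script): build an
// adjacency list keyed by the node found in columns 1-8.
function buildAdjacency(array $lines): array
{
    $adjacency = [];
    foreach ($lines as $line) {
        if (strlen($line) < 19) {
            continue; // skip blank or truncated lines
        }
        $from = trim(substr($line, 0, 8));   // columns 1-8
        $to   = trim(substr($line, 11, 8));  // columns 12-19
        // Using $to as an array key makes duplicate edges collapse,
        // which gives set-like behaviour.
        $adjacency[$from][$to] = true;
    }
    return $adjacency;
}

$lines = [
    'Y1DP480P T FDVII005 ID=000',
    'Y1DPMS7M T Y1DP480P ID=000',
    'Y1DPMS7M T Y1DP4860 ID=000',
    'Y1DPMS7M T Y1DP480P ID=000', // duplicate edge, stored only once
];
$adjacency = buildAdjacency($lines);
print_r(array_keys($adjacency['Y1DPMS7M']));
```

For a real file, the lines could be read with `file($filename, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES)`, where `$filename` is a placeholder for the input path.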
I have a recursive subroutine that finds the next 'n' dependants from a given node as follows:
(on entry, $path[] is an empty array, node data is in $data, the node to start the search from is $job and the depth of dependants is $depth)
function createPathFrom($data, $job, $depth) {
    global $path, $maxDepth, $timeStart;
    $job = trim($job);
    // echo "Looking for $job\n";
    if ( $depth > $maxDepth ) {return;} // Search depth exceeded
    // if ( (microtime(true) - $timeStart) > 70 ) {return;} //Might not be needed as we have the same further down
    // $depth += 1;
    // Get the initial list of predecessors for this job.
    // echo __FUNCTION__."New iteration at depth $depth for $job\n";
    $dependents = array_filter($data, function($dataLine) use($job){
        // preg_match('/'.JOB_SPLIT_MASK.'/', $dataLine, $result);
        // $dependent = trim($result[1]);
        $dependent = explode(" ", $dataLine)[0];
        return ( $dependent == $job );
        // return ( preg_match('/'.$job.'/', $dependent) );
    });
    if (count($dependents) == 0) {
        return;
    } else {
        // print_r($predecessors);
        $elapsedTime = microtime(true) - $timeStart;
        // print $elapsedTime." : Searching ".count($dependents)." at depth ".$depth.NL;
        $path = array_merge($path, $dependents);
        foreach($dependents as $dependency) {
            // preg_match('/'.JOB_SPLIT_MASK.'/', $dependency, $result);
            // $dependent = trim($result[3]);
            $dependent = explode(" ", $dependency)[2];
            if ( (microtime(true) - $timeStart) > 85 ) {return;} // Let's get out if running out of time... (90s in HTTPD/conf)
            createPathFrom($data, $dependent, $depth+1);
        }
    }
}
I have an almost identical function, createPathTo, that establishes the predecessors of my end node.
The time limits (70s and 85s, and yes, one is definitely redundant) and the depth limit are there to stop my CGI script from timing out.
If I call both routines with enough 'depth', I can see if they connect, but there are a lot of dead-ends.
I think I'm doing a breadth-first search whereas I think I should be doing a depth-first search and throwing away the searches that don't reach my target node.
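To make the kind of search under discussion concrete, here is a minimal sketch of a breadth-first search that remembers each node's predecessor, so dead ends are never added to the result and only the connecting path is rebuilt at the end. It assumes the edges are held in an adjacency list of the form `node => [successor => true]`. The name `findPath` and the toy graph (taken from the node1..node7 example above) are assumptions for illustration, not part of my existing code.

```php
<?php
// Hypothetical function: returns the list of nodes from $start to $end,
// or null when no path exists. Each node is visited at most once.
function findPath(array $adjacency, string $start, string $end): ?array
{
    $parent = [$start => null];   // doubles as the "visited" set
    $queue = new SplQueue();
    $queue->enqueue($start);

    while (!$queue->isEmpty()) {
        $node = $queue->dequeue();
        if ($node === $end) {
            // Walk the parent links back to the start to recover the path.
            $path = [];
            for ($n = $end; $n !== null; $n = $parent[$n]) {
                array_unshift($path, $n);
            }
            return $path;
        }
        foreach (array_keys($adjacency[$node] ?? []) as $next) {
            $next = (string) $next; // PHP turns numeric string keys into ints
            if (!array_key_exists($next, $parent)) {
                $parent[$next] = $node;
                $queue->enqueue($next);
            }
        }
    }
    return null; // no path found
}

// Toy graph from the example in the question.
$adjacency = [
    'node1' => ['node2' => true, 'node3' => true],
    'node3' => ['node5' => true],
    'node2' => ['node4' => true],
    'node4' => ['node5' => true],
    'node5' => ['node7' => true],
];

$path = findPath($adjacency, 'node1', 'node7');
echo $path === null ? "no path\n" : implode(' -> ', $path) . "\n";
// prints "node1 -> node3 -> node5 -> node7"
```

Because the graph is unweighted, the first time the end node is dequeued the recorded path uses the fewest possible edges, and the visited check bounds the work by the number of nodes plus edges, so no depth or time limit is needed in this sketch.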
Question:
Given a start node and an end node, is there an efficient search algorithm that will return the bare minimum of nodes needed to make the connection, or some value indicating that no path was found?
This question follows on from "Recursive function in PHP to find path between arbitrary nodes". I have the nodes leading to (and now from) my target node, but now I want to trim that down to just the path between two nodes.
Edit: I'm sure the answer is already here on SO, but I'm pretty new to PHP and these sorts of algorithms, so haven't been able to find one.