I get an array of paths (combined from default and user settings) and need to perform a recursive search for some data files, which can be hidden among tens of thousands of files in any of these paths.
I do the recursive search with a RecursiveDirectoryIterator, but it is quite slow, and the suggested alternative exec("find") is even slower. To save time, I/O and processing power, I'd like to do some preprocessing beforehand: reduce the given paths to the smallest set of paths that still covers everything (a kind of smallest common denominator), so that no directory tree is searched more than once. I would appreciate any advice on how to do this.
The catch is that the given paths might not only be ancestors of one another or symlinked into each other, but might also be given either as real paths or as paths to a symlink. At least one may assume that there won't be any circular symlinks (although a check wouldn't hurt).
I need to implement this in PHP and sketched out the following code, which doesn't cover all cases yet.
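To illustrate the kind of check meant here, a minimal sketch follows. The function name links_to_ancestor() is made up for this example and is not part of my code. It only catches the simplest loop, a symlink that points back at the directory it lives in or at one of that directory's ancestors. It will not catch cycles that run through two or more links.

```php
<?php
// Hypothetical helper, not part of the original code.
// Returns true if $link is a symlink whose target is the directory
// containing the link, or one of that directory's ancestors.
function links_to_ancestor(string $link): bool
{
    if (!is_link($link)) {
        return false;
    }
    $target = realpath($link);            // where the link ends up
    $parent = realpath(dirname($link));   // where the link lives
    if ($target === false || $parent === false) {
        return false;                     // broken or unresolvable link
    }
    // Trailing separators keep /a/b from matching /a/bc.
    $sep = DIRECTORY_SEPARATOR;
    return strpos($parent . $sep, rtrim($target, $sep) . $sep) === 0;
}
```

Such a check would have to run for every symlink met during the traversal, so it only reduces the risk and costs extra realpath() calls.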
// make all given paths absolute and resolve symlinks
$search_paths = array_map( function( $path ) {
    return realpath( $path ) ?: $path;
}, $search_paths );
// remove all duplicate entries
$search_paths = array_unique( $search_paths );
// sort by length of path, shortest first (usort also reindexes the array)
usort( $search_paths, function( $a, $b ) {
    return strlen( $a ) - strlen( $b );
} );
// keep only paths that are not inside an already kept, shorter path
$reduced = array();
foreach ( $search_paths as $path ) {
    foreach ( $reduced as $parent ) {
        // compare with a trailing slash so that /a/b does not match /a/bc
        if ( strpos( $path . '/', rtrim( $parent, '/' ) . '/' ) === 0 ) {
            // longer path starts with shorter one, thus it's a child. Nuke it!
            continue 2;
        }
    }
    $reduced[] = $path;
}
$search_paths = $reduced;
Where this code falls short:
Imagine these paths in $search_paths
/e/f
/a/b/c/d
/e/f/g/d
with /e/f/g/d being a symlink to /a/b/c/d.
The code above would leave these two:
/e/f
/a/b/c/d
but searching /e/f would actually be sufficient, as it covers /a/b/c/d via the symlink /e/f/g/d. This might sound like an edge case but is actually quite likely in my situation.
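One direction that might handle this case is to stop reducing the path list up front and instead remember every real directory that has already been scanned. The sketch below assumes that approach. collect_dirs() is a made-up name, and the actual file matching would still have to happen for each returned directory. It follows symlinked directories, so /a/b/c/d is reached through /e/f/g/d and then skipped when it shows up again as its own search path. As a side effect, circular symlinks cannot cause an endless loop.

```php
<?php
// Hypothetical helper: collect_dirs() is not from the original code.
// Walks every root, follows symlinked directories, and visits each
// real directory only once by remembering its realpath().
function collect_dirs(array $roots): array
{
    $seen  = [];       // realpath => true, used as a set
    $stack = $roots;   // directories still to be scanned

    while ($stack) {
        $real = realpath(array_pop($stack));
        if ($real === false || !is_dir($real) || isset($seen[$real])) {
            continue;  // missing, not a directory, or already scanned
        }
        $seen[$real] = true;

        foreach (scandir($real) ?: [] as $entry) {
            if ($entry === '.' || $entry === '..') {
                continue;
            }
            $child = $real . DIRECTORY_SEPARATOR . $entry;
            if (is_dir($child)) {   // is_dir() follows symlinks
                $stack[] = $child;
            }
        }
    }
    return array_keys($seen);
}
```

With the three example paths this would return /e/f, /e/f/g and /a/b/c/d exactly once each, whatever order the paths are given in. The price is one realpath() call per directory, and I have not measured whether that is acceptable on trees with tens of thousands of files.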
Tricky, eh?
I'm pretty sure I'm not the only one with this problem, but I couldn't find a solution using Google. Maybe I just haven't found the right wording for the problem.
Thanks for reading this far! :)