While trying to traverse a directory tree efficiently, I tried an Rx solution described here. It works for shallow trees, but it is unusable for deep ones: the default scheduler creates too many threads, which slows down the traversal.
Here's the code I use:
public static void TestTreeTraversal()
{
    Func<DirectoryInfo, IObservable<DirectoryInfo>> recurse = null;
    recurse = i => Observable.Return(i)
        .Concat(i.GetDirInfos().ToObservable().SelectMany(d => recurse(d)))
        .ObserveOn(Scheduler.Default);
    var obs = recurse(new DirectoryInfo(@"C:\"));
    var result = obs.ToEnumerable().ToList();
}

public static IEnumerable<DirectoryInfo> GetDirInfos(this DirectoryInfo dir)
{
    IEnumerable<DirectoryInfo> dirs = null;
    try
    {
        dirs = dir.EnumerateDirectories("*", SearchOption.TopDirectoryOnly);
    }
    catch (Exception)
    {
        yield break;
    }
    foreach (DirectoryInfo d in dirs)
        yield return d;
}
If you remove ObserveOn(Scheduler.Default), the function runs at the same speed as a single-threaded recursive function. With ObserveOn, it seems a thread is created each time SelectMany is called, which slows the process down dramatically.
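For comparison, the single-threaded recursive function I measure against could look roughly like the sketch below. The class name SingleThreadedBaseline and the methods Traverse and Visit are hypothetical names chosen for illustration only, and the sketch uses GetDirectories() instead of my GetDirInfos helper so that it stands alone.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical baseline: a plain recursive, single-threaded traversal.
public static class SingleThreadedBaseline
{
    // Returns the root and all of its reachable subdirectories.
    public static List<DirectoryInfo> Traverse(DirectoryInfo root)
    {
        var result = new List<DirectoryInfo>();
        Visit(root, result);
        return result;
    }

    private static void Visit(DirectoryInfo dir, List<DirectoryInfo> result)
    {
        result.Add(dir);

        DirectoryInfo[] children;
        try
        {
            children = dir.GetDirectories();
        }
        catch (Exception)
        {
            // Skip directories that cannot be read (e.g. access denied).
            return;
        }

        foreach (DirectoryInfo child in children)
            Visit(child, result);
    }
}
```

Calling SingleThreadedBaseline.Traverse(new DirectoryInfo(@"C:\")) collects the same kind of list as the Rx version, but without any scheduler involved.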
Is there a way to control/limit the maximum number of threads the Scheduler can use at the same time?
Is there another way to write such a parallel tree traversal with Rx without falling into this parallelism pitfall?