In a test WPF project I am trying to use TPL Dataflow to enumerate all subdirectories of a given parent directory and build a list of files with a particular extension, e.g. ".xlsx". I use two blocks: the first, dirToFilesBlock, and the last, fileActionBlock.

To create the recursive effect of going through all subdirectories the first block has a link back to itself with the link predicate testing to see if the output item is a directory. This is an approach I found in a book on Asynchronous Programming. The second link is to the fileActionBlock which then adds the file to a list, based on the link predicate testing to see the file has the correct extension.

The problem is that after kicking things off with btnStart_Click, it never finishes. That is, execution never gets past the await in the event handler to show the “Completed” message. I understand that I probably need to call dirToFilesBlock.Complete(), but I don’t know where in the code this should go, or under what conditions. I can't call it right after the initial post, as that would stop the back link from feeding subdirectories into the block. I’ve tried using the InputCount and OutputCount properties but didn’t get very far. If possible, I would like to keep the structure of the dataflow as it stands, because the link back also lets me update the UI with each new directory being explored, which gives the user some feedback on progress.

I’m very new to TPL dataflow and any help is gratefully received.

Here is the code from the code behind file:

public partial class MainWindow : Window
{
    TransformManyBlock<string, string> dirToFilesBlock;
    ActionBlock<string> fileActionBlock;
    ObservableCollection<string> files;
    CancellationTokenSource cts;
    CancellationToken ct;
    public MainWindow()
    {
        InitializeComponent();

        files = new ObservableCollection<string>();

        lst.DataContext = files;

        cts = new CancellationTokenSource();
        ct = cts.Token;
    }

    private Task Start(string path)
    {
        var uiScheduler = TaskScheduler.FromCurrentSynchronizationContext();

        dirToFilesBlock = new TransformManyBlock<string, string>((Func<string, IEnumerable<string>>)(GetFileSystemItems), new ExecutionDataflowBlockOptions() { CancellationToken = ct });
        fileActionBlock = new ActionBlock<string>((Action<string>)ProcessFile, new ExecutionDataflowBlockOptions() {CancellationToken = ct, TaskScheduler = uiScheduler});

        // Order of LinkTo's important here!
        dirToFilesBlock.LinkTo(dirToFilesBlock, new DataflowLinkOptions() { PropagateCompletion = true }, IsDirectory);
        dirToFilesBlock.LinkTo(fileActionBlock, new DataflowLinkOptions() { PropagateCompletion = true }, IsRequiredDocType);

        // Kick off the recursion.
        dirToFilesBlock.Post(path);

        return Task.WhenAll(dirToFilesBlock.Completion, fileActionBlock.Completion);
    }

    private bool IsDirectory(string path)
    {

        return Directory.Exists(path);
    }


    private bool IsRequiredDocType(string fileName)
    {
        return System.IO.Path.GetExtension(fileName) == ".xlsx";
    }

    private IEnumerable<string> GetFilesInDirectory(string path)
    {
        // Check for cancellation with each new dir.
        ct.ThrowIfCancellationRequested();

        // Check in case of Dir access problems
        try
        {
            return Directory.EnumerateFileSystemEntries(path);
        }
        catch (Exception)
        {
            return Enumerable.Empty<string>();
        }
    }

    private IEnumerable<string> GetFileSystemItems(string dir)
    {
        return GetFilesInDirectory(dir);
    }

    private void ProcessFile(string fileName)
    {
        ct.ThrowIfCancellationRequested();

       files.Add(fileName);
    }

    private async void btnStart_Click(object sender, RoutedEventArgs e)
    {
        try
        {
            await Start(@"C:\");
            // Never gets here!!!
            MessageBox.Show("Completed");

        }
        catch (OperationCanceledException)
        {
            MessageBox.Show("Cancelled");

        }
        catch (Exception)
        {
            MessageBox.Show("Unknown err");
        }
        finally
        {
        }
    }

    private void btnCancel_Click(object sender, RoutedEventArgs e)
    {
        cts.Cancel();
    }
}


Cleve
  • Related: [How to mark a TPL dataflow cycle to complete?](https://stackoverflow.com/questions/26130168/how-to-mark-a-tpl-dataflow-cycle-to-complete) – Theodor Zoulias Jun 22 '20 at 11:11

1 Answer

Even though this is an old question, handling completion in a dataflow loop can still be an issue.

In your case you can have the TransformManyBlock keep a count of the items still in flight, which tells you whether the block is busy processing. You then call Complete() only when the block is not busy and both of its buffers are empty. You can find more information on handling completion in a post I wrote, "Finding Completion in a Complex Flow: Feedback Loops".

public partial class MainWindow : Window {

        TransformManyBlock<string, string> dirToFilesBlock;
        ActionBlock<string> fileActionBlock;
        ObservableCollection<string> files;
        CancellationTokenSource cts;
        CancellationToken ct;
        public MainWindow() {
            InitializeComponent();

            files = new ObservableCollection<string>();

            lst.DataContext = files;

            cts = new CancellationTokenSource();
            ct = cts.Token;
        }

        private async Task Start(string path) {
            var uiScheduler = TaskScheduler.FromCurrentSynchronizationContext();

            dirToFilesBlock = new TransformManyBlock<string, string>((Func<string, IEnumerable<string>>)(GetFileSystemItems), new ExecutionDataflowBlockOptions() { CancellationToken = ct });
            fileActionBlock = new ActionBlock<string>((Action<string>)ProcessFile, new ExecutionDataflowBlockOptions() { CancellationToken = ct, TaskScheduler = uiScheduler });

            // Order of LinkTo's important here!
            dirToFilesBlock.LinkTo(dirToFilesBlock, new DataflowLinkOptions() { PropagateCompletion = true }, IsDirectory);
            dirToFilesBlock.LinkTo(fileActionBlock, new DataflowLinkOptions() { PropagateCompletion = true }, IsRequiredDocType);
            // Discard everything else: an item that no link accepts stays at the head
            // of the output buffer and stalls every item queued behind it.
            dirToFilesBlock.LinkTo(DataflowBlock.NullTarget<string>());

            // Kick off the recursion.
            dirToFilesBlock.Post(path);

            await ProcessingIsComplete();
            dirToFilesBlock.Complete();
            await Task.WhenAll(dirToFilesBlock.Completion, fileActionBlock.Completion);
        }

        private async Task ProcessingIsComplete() {
            // Keep polling while the block still has work; stop once it is idle or cancelled.
            while (!ct.IsCancellationRequested && !DirectoryToFilesBlockIsIdle()) {
                await Task.Delay(500);
            }
        }

        private bool DirectoryToFilesBlockIsIdle() {
            return dirToFilesBlock.InputCount == 0 &&
                dirToFilesBlock.OutputCount == 0 &&
                directoriesBeingProcessed <= 0;
        }

        private bool IsDirectory(string path) {
            return Directory.Exists(path);
        }


        private bool IsRequiredDocType(string fileName) {
            return System.IO.Path.GetExtension(fileName) == ".xlsx";
        }

        private int directoriesBeingProcessed = 0;

        private IEnumerable<string> GetFilesInDirectory(string path) {
            Interlocked.Increment(ref directoriesBeingProcessed);
            // Check for cancellation with each new dir.
            ct.ThrowIfCancellationRequested();

            // Check in case of Dir access problems
            try {
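                // Materialize the entries here so that enumeration errors are caught
                // and the in-flight counter covers the actual directory read.
                return Directory.EnumerateFileSystemEntries(path).ToList();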
            } catch (Exception) {
                return Enumerable.Empty<string>();
            } finally {
                Interlocked.Decrement(ref directoriesBeingProcessed);
            }
        }

        private IEnumerable<string> GetFileSystemItems(string dir) {
            return GetFilesInDirectory(dir);
        }

        private void ProcessFile(string fileName) {
            ct.ThrowIfCancellationRequested();

            files.Add(fileName);
        }

        private async void btnStart_Click(object sender, RoutedEventArgs e) {
            try {
                await Start(@"C:\");
                // Reached once both blocks have completed.
                MessageBox.Show("Completed");

            } catch (OperationCanceledException) {
                MessageBox.Show("Cancelled");

            } catch (Exception) {
                MessageBox.Show("Unknown err");
            } finally {
            }
        }

        private void btnCancel_Click(object sender, RoutedEventArgs e) {
            cts.Cancel();
        }
    }
JSteward
  • Isn't there a race condition in this technique? It considers a TransformBlock complete if: `handlingMessages == 0 && HandleMessageBlock.InputCount == 0 && HandleMessageBlock.OutputCount == 0` where the transform Func looks like: `Interlocked.Increment(ref handlingMessages); ... Interlocked.Decrement(ref handlingMessages);` I think it could prematurely say it's complete if the above runs right after a message is taken from the input queue, but before the transform delegate is invoked. In that case, all counts would be 0. – Hans Olav Norheim Apr 12 '20 at 01:45
  • @HansOlavNorheim Indeed, I can confirm: I tried this solution and was seeing inconsistent results. Once I introduced a downstream BroadcastBlock as in https://stackoverflow.com/a/32913752/1969177, the Interlocked call can be put into the initial ISourceBlock (which is the ultimate source of the outstanding recursion count) to call Complete on the BroadcastBlock, and worked empirically. – user1969177 Jun 21 '21 at 15:58
  • @HansOlavNorheim Great catch, I'm glad you guys figured something out, I put this code together long ago and have since moved on to the much better `Channels` library that makes these things much easier. But thanks again for pointing out the deficiencies in this old solution so that others will know what to expect – JSteward Jun 21 '21 at 17:22
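
One way to avoid the race described in the comments is to count directories from the moment they are discovered rather than from the moment the delegate starts, and to call Complete() when that count reaches zero. The following is only a minimal sketch under stated assumptions: `PendingCountCrawler`, `FindAsync`, and the in-memory `tree` dictionary are hypothetical names standing in for the real file system so the example can run on its own, and it is not the approach from the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class PendingCountCrawler
{
    // Hypothetical stand-in for the file system: directory path -> child entries.
    // An entry that is itself a key is a directory; anything else is a file.
    public static async Task<List<string>> FindAsync(
        IReadOnlyDictionary<string, string[]> tree, string root, string extension)
    {
        var found = new List<string>();
        int pending = 1; // the root directory is outstanding before it is posted

        TransformManyBlock<string, string> dirBlock = null;
        dirBlock = new TransformManyBlock<string, string>(dir =>
        {
            string[] entries = tree.TryGetValue(dir, out var children)
                ? children
                : Array.Empty<string>();

            // Register every subdirectory BEFORE marking this directory as done,
            // so the counter cannot reach zero while directories are still queued.
            int subDirs = entries.Count(e => tree.ContainsKey(e));
            Interlocked.Add(ref pending, subDirs);

            if (Interlocked.Decrement(ref pending) == 0)
            {
                // No directory is queued or in flight any more. The entries
                // returned below are still emitted before the block completes.
                dirBlock.Complete();
            }
            return entries;
        });

        // Default MaxDegreeOfParallelism is 1, so List<T>.Add is safe here.
        var fileBlock = new ActionBlock<string>(file => found.Add(file));

        // Link order matters: directories loop back, matching files move on,
        // and everything else is discarded so it cannot clog the output buffer.
        dirBlock.LinkTo(dirBlock, item => tree.ContainsKey(item));
        dirBlock.LinkTo(fileBlock,
            new DataflowLinkOptions { PropagateCompletion = true },
            item => item.EndsWith(extension, StringComparison.OrdinalIgnoreCase));
        dirBlock.LinkTo(DataflowBlock.NullTarget<string>());

        dirBlock.Post(root);
        await fileBlock.Completion;
        return found;
    }
}
```

Because each child directory is added to the counter before its parent is subtracted, the counter only reaches zero after the last directory has been read, so no polling with Task.Delay is needed. With the real file system, the link predicate and the count inside the delegate would have to classify entries identically (for example both via Directory.Exists), otherwise the counter and the loop could disagree.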