
I am opening n concurrent threads in my function:

List<string> _files = new List<string>();

public void Start()
{
    CancellationTokenSource _tokenSource = new CancellationTokenSource();
    var token = _tokenSource.Token;

    Task.Factory.StartNew(() =>
    {
        try
        {
            Parallel.ForEach(_files,
                new ParallelOptions
                {
                    MaxDegreeOfParallelism = 5 //limit number of parallel threads 
                },
                file =>
                {
                    if (token.IsCancellationRequested)
                        return;
                    //do work...
                });
        }
        catch (Exception)
        { }

    }, _tokenSource.Token).ContinueWith(
        t =>
        {
            //finish...
        }
    , TaskScheduler.FromCurrentSynchronizationContext() //to ContinueWith (update UI) from UI thread
    );
}

After the threads start, I have noticed that files are picked from my list in a seemingly random order. Is it possible to always process the first n elements of my list first?

– user1860934

3 Answers


To get the behavior you want, you need to write a custom partitioner. The reason it looks "random" is that right now the default partitioner hands out the file list in blocks. So if your source list was

List<string> files = new List<string> { "a", "b", "c", "d", "e", "f", "g", "h", "i" };

when it partitions the list, it may split it evenly like so (if MaxDegreeOfParallelism was 3):

  • Thread1's work list: "a", "b", "c"
  • Thread2's work list: "d", "e", "f"
  • Thread3's work list: "g", "h", "i"

So if you watched the files being processed, the order might look like

"a", "d", "g", "e", "b", "h", "c", "f", "i"
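The chunked hand-out described above is easy to observe directly; a minimal, self-contained sketch (the `Thread.Sleep` is only there to simulate work so the interleaving becomes visible):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

class PartitionDemo
{
    static void Main()
    {
        var files = new List<string> { "a", "b", "c", "d", "e", "f", "g", "h", "i" };
        var observedOrder = new ConcurrentQueue<string>();

        // With the default partitioner each worker is handed a contiguous chunk
        // of the list, so the observed start order interleaves the chunks rather
        // than following the list from the front.
        Parallel.ForEach(files,
            new ParallelOptions { MaxDegreeOfParallelism = 3 },
            file =>
            {
                observedOrder.Enqueue(file);
                Thread.Sleep(50); // simulate work so the interleaving is visible
            });

        Console.WriteLine(string.Join(", ", observedOrder));
    }
}
```

The exact order printed will vary from run to run, which is precisely the "random" behavior the question describes.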

If you write a custom partitioner, you can have it hand out one item at a time instead of a batch at a time, making each work list look like

  • Thread1's work list: "a", GetTheNextUnprocessedString()
  • Thread2's work list: "b", GetTheNextUnprocessedString()
  • Thread3's work list: "c", GetTheNextUnprocessedString()

If you are using .NET 4.5, you can use the built-in `Partitioner.Create` factory like so:

Parallel.ForEach(Partitioner.Create(_files, EnumerablePartitionerOptions.NoBuffering),
                new ParallelOptions
                {
                    MaxDegreeOfParallelism = 5 //limit number of parallel threads 
                },
                (file, loopstate, index) =>
                {
                    if (token.IsCancellationRequested)
                        return;
                    //do work...
                });

If you are not using .NET 4.5, writing one is not a trivial task, so I am not going to write it here for you. Read the MSDN article on custom partitioners and you will be able to figure it out eventually.

What I would do is ask yourself: "Do I really need the files to be processed in order?" If you don't, let the partitioner use its own ordering; the only thing you are likely to achieve by enforcing an order is slowing the process down.

– Scott Chamberlain
  • The only reason I want these files to be processed in order is that my DataGridView contains many files (200-400), so to see the files being processed I need to scroll the list. Maybe I can sort my DataGridView instead and move the files being processed to the top of the DataGridView? – user1860934 Oct 31 '13 at 14:56
  • @user1860934 I think that is a better plan. It is very easy to set up sorting in a DataGridView. I would go a step further: have this code store the processed files in some form of `ObservableCollection` and have the DGV bind to that collection, decoupling the loading of the data from its display. – Scott Chamberlain Oct 31 '13 at 15:15
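For the binding route discussed in the comments above, note that a WinForms `DataGridView` pairs most naturally with `BindingList<T>` (which raises list-changed events the grid understands); a hypothetical sketch, where `ProcessedFile`, `grid`, and `ReportProcessed` are illustrative names, not part of the question's code:

```csharp
using System;
using System.ComponentModel;
using System.Windows.Forms;

// Illustrative row type; binding a bare string gives the grid nothing useful to show.
public class ProcessedFile
{
    public string Name { get; set; }
    public string Status { get; set; }
}

public class MainForm : Form
{
    private readonly DataGridView grid = new DataGridView { Dock = DockStyle.Fill };
    private readonly BindingList<ProcessedFile> processed = new BindingList<ProcessedFile>();

    public MainForm()
    {
        Controls.Add(grid);
        grid.DataSource = processed; // bind once; the grid tracks list changes
    }

    // Call this from worker threads; Invoke marshals the update onto the UI thread,
    // since a bound list must only be mutated from the thread that owns the control.
    public void ReportProcessed(string file)
    {
        grid.Invoke(new Action(() =>
            processed.Add(new ProcessedFile { Name = file, Status = "Done" })));
    }
}
```

Sorting the grid (or inserting at index 0 instead of `Add`) would then keep the in-flight files visible at the top, as the comment suggests.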

Just don't rely on Parallel.ForEach if it's important that work items be started in a particular order; as others have said, you can configure it as needed, but it's not easy.

The much easier option is to create 5 tasks yourself that process the items. You lose Parallel.ForEach's ability to dynamically add and remove workers as needed, but you don't appear to be leveraging that very heavily anyway.

Just create a BlockingCollection and 5 tasks that take items from it:

var queue = new BlockingCollection<string>();
int workers = 5;
CancellationTokenSource cts = new CancellationTokenSource();
var tasks = new List<Task>();

for (int i = 0; i < workers; i++)
{
    tasks.Add(Task.Run(() =>
    {
        foreach (var item in queue.GetConsumingEnumerable())
        {
            cts.Token.ThrowIfCancellationRequested();

            DoWork(item);
        }
    }, cts.Token));
}

//throw this into a new task if adding the items will take too long
foreach (var item in data)
    queue.Add(item);
queue.CompleteAdding();

Task.WhenAll(tasks).ContinueWith(t =>
{
    //do completion stuff
});
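If the completion callback needs to touch the UI, as in the question's original code, the continuation can be pinned to the UI's synchronization context; a sketch of that variant (this fragment assumes it runs on a thread that has a synchronization context, e.g. the UI thread):

```csharp
Task.WhenAll(tasks).ContinueWith(t =>
{
    // t.IsCanceled / t.IsFaulted indicate how the workers finished;
    // this delegate runs on the UI thread, so updating controls here is safe.
}, CancellationToken.None,
   TaskContinuationOptions.None,
   TaskScheduler.FromCurrentSynchronizationContext());
```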
– Servy

Of course the files are selected in a seemingly random order; that's the whole point of Parallel.ForEach. When you go parallel, the 5 threads you specified consume the input as decided by the data partitioner.

But if you really want to maintain the order, look at the OrderablePartitioner you can specify for Parallel.ForEach: http://msdn.microsoft.com/en-us/library/dd989583.aspx This will decrease performance, but it lets you control how the partitions are created for the threads.

– Martin Moser