
I'm building a CPU pipeline with threads. The stages are Fetch, Decode, Execute, and Write Back. I coded a GUI button that triggers each cycle. The problem I'm having is that the output is all jumbled when I want it in a specific order (F->D->X->W). In addition, I want these threads to run concurrently, which means I can't use locks with monitors. From what I know about monitors and locks, they make it so only one thread runs at a time, and that won't work for me. Here is a simple outline of my code:

public void nextInstruction() 
{
    // These all trigger at once
    // and run in parallel
    fetchEvent.Set();
    decodeEvent.Set();
    executeEvent.Set();
    wbEvent.Set();
}

public void fetch() 
{
    fetchEvent.WaitOne();
    // do stuff...
}

public void decode()
{
    decodeEvent.WaitOne();
    // do stuff...
}

public void execute()
{
    executeEvent.WaitOne();
    // do stuff...
}

public void writeBack()
{
    wbEvent.WaitOne();
    // do stuff...
}

and current output:

F           D           X           W
F      lda $50

F      sta $2        
            D      lda #$10    
F      lda #$0       
            D      sta $1        
                        X      sta $0        
            D      lda $50
                                    W      sta $0        
                        X      lda #$10    
F      sta $0        
F      lda #$10    
            D      lda $50
                        X      sta $1        
                                    W      sta $0        
F      sta $1        
                                    W      sta $0        
                        X      lda $50
            D      sta $2        
            D      lda #$0       
                        X      lda $50
                                    W      sta $0 

my desired output:

F           D           X           W
F      lda $10

F      sta $0        
            D      lda #$10    
F      lda #$11     
            D      sta $0        
                        X      lda #$10  
F      sta $0          
            D      lda $11     
                        X      sta $0    
                                    W      lda #$10      
F      lda #$10    
            D      sta $0
                        X      lda $11         
                                    W      sta $0        
F      sta $1        
            D      lda #$10 
                        X      sta $0 
                                    W      lda $11               
            D      sta $1
                        X      lda #$10
                                    W      sta $0 

I have my print statements formatted this way because it makes it easier for me to see the ordering. Again, each of these print statements (F, D, X, W) is triggered by a distinct thread. Any input on how to achieve this?

I'm also open to using locks if there is a way to use them I'm unaware of.

blutuu
    Have you looked into [Parallel.Invoke()](http://msdn.microsoft.com/en-us/library/system.threading.tasks.parallel.aspx) already? Calling this method with an array of actions executes those actions in parallel, but the method does not return **until all** actions have been completed. Design your actions/tasks in a way that you can invoke an output method for each action/task sequentially after Parallel.Invoke() returned. –  Nov 17 '13 at 03:53
  • @elgonzo No I haven't come across that yet. I'll check it out. – blutuu Nov 17 '13 at 04:20
  • @blutuu What do you know about Monitor and locks? Do you know the difference between these two? – saeed Nov 17 '13 at 05:03
  • @saeed Well a lock is used to ensure exclusive access to an object, while a monitor handles that lock with waits and pulses. Correct? – blutuu Nov 17 '13 at 05:05
  • @blutuu NO, Monitor : Provides a mechanism that synchronizes access to objects. – saeed Nov 17 '13 at 05:11
  • @blutuu, @saeed: The `lock` statement in C# is just C# handling the Monitor mechanism for you :) ([Read here, for example](http://stackoverflow.com/questions/4978850/monitor-vs-lock)) (Similarly, C#'s `using` is just shorthand for try {...} finally { Dispose(); }) –  Nov 17 '13 at 05:11
  • @blutuu, it seems our small discussion in the comments about the workings of Parallel.Invoke() disappeared. Were you able to read it? –  Nov 17 '13 at 07:47
  • @elgonzo Yeah I was able to read it. I think an answer was deleted. – blutuu Nov 17 '13 at 08:01

4 Answers


The reason is that you set all the events at once.

Once the main thread wakes them up, the order in which Windows schedules them is undefined, and the last thread may start sooner than the first.

Look at the producer-consumer collections, such as BlockingCollection: http://msdn.microsoft.com/en-us/library/dd267312(v=vs.110).aspx

These will allow you to have all four stages running at the same time, each consuming the output of the previous one and sending its output to the next.

You describe the problem as a series of sequential steps, triggered by each input element. In between the processing blocks, you might have objects of other types and a different number of them, but this is still a producer-consumer pattern.

Producer-consumer in this case is applied to all subsequent pairs of your processors. They are actively running, waiting for input from predecessor and sending output to successor. Once you start thinking about how to manage resources to store those temporary outputs, you naturally come to the idea of queuing.

Draw a diagram of the process, create a single BlockingCollection for each data-flow link, and pass it a queue in the constructor to ensure ordering (if needed). Also specify a limit on the number of items in each; that will be your "buffer size". Consumers use the GetConsumingEnumerable method and apply foreach on it; it automatically blocks when there is no data in the queue from the producer.

Once done, re-check each producer-consumer pair, to make sure that they run at the same speed, on average. If any consumer runs significantly faster, consider merging its code into the predecessor producer, because queuing is useless on that link.
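A minimal sketch of the pattern described above (instruction strings and stage bodies are illustrative placeholders, not the asker's actual code):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class PipelineSketch
{
    static void Main()
    {
        // One bounded, FIFO-ordered link per pair of stages.
        // (BlockingCollection wraps a ConcurrentQueue by default, so order is preserved.)
        var fetchToDecode = new BlockingCollection<string>(boundedCapacity: 4);
        var decodeToExec = new BlockingCollection<string>(boundedCapacity: 4);
        var execToWb = new BlockingCollection<string>(boundedCapacity: 4);

        var decode = Task.Factory.StartNew(() =>
        {
            foreach (var instr in fetchToDecode.GetConsumingEnumerable())
            {
                Console.WriteLine("D " + instr);   // "decode" the instruction
                decodeToExec.Add(instr);
            }
            decodeToExec.CompleteAdding();          // propagate shutdown downstream
        });

        var execute = Task.Factory.StartNew(() =>
        {
            foreach (var instr in decodeToExec.GetConsumingEnumerable())
            {
                Console.WriteLine("X " + instr);
                execToWb.Add(instr);
            }
            execToWb.CompleteAdding();
        });

        var writeBack = Task.Factory.StartNew(() =>
        {
            foreach (var instr in execToWb.GetConsumingEnumerable())
                Console.WriteLine("W " + instr);
        });

        // Fetch acts as the producer at the head of the pipeline.
        foreach (var instr in new[] { "lda #$10", "sta $0", "lda $11" })
        {
            Console.WriteLine("F " + instr);
            fetchToDecode.Add(instr);
        }
        fetchToDecode.CompleteAdding();

        Task.WaitAll(decode, execute, writeBack);
    }
}
```

All four stages run concurrently, but each instruction flows through them strictly in F->D->X->W order, because each link is a FIFO queue.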

Roman Polunin
  • Could you please provide an example? It would be interesting to see how to use a BlockingCollection to achieve ordered output of concurrently running tasks. (I am looking at the MSDN documentation of it, but all I can think of would also work *without* BlockingCollection.) –  Nov 17 '13 at 04:30
  • In your case, producer-consumer pattern seems to be applicable. At least judging by names of those operations. I'll add some text into the answer. – Roman Polunin Nov 17 '13 at 04:33
  • No, the OP specifically mentioned that the steps are to be executed concurrently, not sequentially (looks like a class/course assignment). Your approach seems to enforce sequential execution, which the OP doesn't want... –  Nov 17 '13 at 04:43
  • It looks sequential, however at any given time all four threads will be busy doing work - but on different elements. Look up some YouTube video on how conveyors work at factories, that's where all these design patterns come from. – Roman Polunin Nov 17 '13 at 04:46
  • Okay, but how do you get the ordered output the OP wished, and how does it require a BlockedCollection? Also, the last action to finish would need to call CompleteAdding(), but which action would be the last to finish? For what it is worth, the actions could all finish in random order (as hinted at by the situation the OP describes). –  Nov 17 '13 at 05:15
  • 1
    If you want to have complete ordering, use queue as a base collection underneath BlockingCollection. If you do not want ordering, use instances of ConcurrentBag instead of the those two - but BlockingCollection provides better programming model, try it. Definition of the "last action" in your case is when a consumer has read ALL entries from GetConsumingEnumerable() - e.g. "foreach" enumerator loop exits. As soon as consumer sees this, it can invoke "AddingCompleted" on the queue that goes to successor. – Roman Polunin Nov 17 '13 at 06:25
  • And no, BlockingCollection is not required. It is just a convenient wrapper to be used with an underlying queue, so that you benefit from using a clean and easy-to-learn API. And that code is also fast enough for vast majority of applications, especially a student's lab. – Roman Polunin Nov 17 '13 at 06:28
  • thanks for clearing it up. I was looking in the wrong corner, i suppose, by thinking about how the BlockingCollection is somehow involved in managing the threads. Sorry about the confusion... –  Nov 17 '13 at 07:42

try it this way,

public void Button_OnClick() {
    nextInstruction();
}

public void nextInstruction() 
{
    fetchEvent.Set();
}

public void fetch() 
{
    while(true) {
        fetchEvent.WaitOne();
        // do stuff...
        decodeEvent.Set();
    }
}

public void decode()
{
    while(true) {
        decodeEvent.WaitOne();
        // do stuff...
        executeEvent.Set();
    }
}

// execute() and writeBack() follow the same pattern,
// each waiting on its own event and setting the next stage's event.
inquisitive
  • I've tried this exact method yesterday, but was informed that it's not the way I should do it. Thanks again. – blutuu Nov 17 '13 at 04:19
  • @elgonzo, no, it is not running concurrently. In a pipeline, *different* instructions in their *different* stages run concurrently, not all stages of the same instruction. One instruction enters the pipeline, then it goes *sequentially* through the stages. The thing is that multiple instructions can be in the pipeline concurrently. – inquisitive Nov 17 '13 at 04:29
  • @blutuu: the one who told you *not* to do it this way, did she tell you **why** not? Please see my previous comment on parallelizing the stages of a pipeline. Does the EDITED answer do any better? – inquisitive Nov 17 '13 at 04:32
  • I was told that each event should be triggered at once. My professor gave us an example where he used delegates to enforce order, although he only used fetch and decode to demonstrate. I would use delegates, but I don't know how to use them. – blutuu Nov 17 '13 at 04:42
  • @inquisitive, i think this assignment is about learning multi-threading, not about how a CPU works (although the example chosen might indeed be misleading) ;) –  Nov 17 '13 at 05:25

ATTENTION! The following will only work in .NET 4 or newer. Neither Parallel nor ConcurrentDictionary is available in older .NET versions.

With Parallel.Invoke(), you can execute your actions concurrently while taking advantage of the fact that this method does not return until all actions have been completed.

Use a data type like a collection or dictionary where the actions can store their (output) results for later sequential processing.

In the example code below, I used a ConcurrentDictionary. The keys of the ConcurrentDictionary are the actions themselves. If you want to process this dictionary somewhere else, it might not be a wise idea to use the actions themselves as keys. In such a case, you should rather implement a public enum (representing the actions) to be used as keys for the dictionary.

Since the actions run concurrently and might access the dictionary at exactly the same time, the thread-safe ConcurrentDictionary type has been chosen. (An ordinary Dictionary is not thread-safe, and might thus cause sporadic, seemingly random errors.)

public class InstructionCycles
{
    private readonly ConcurrentDictionary<Action, string> _dictActionResults = new ConcurrentDictionary<Action, string>();

    private void fetch()
    {
        // do something and store the result in the dictionary
        _dictActionResults[fetch] = "FetchResult";
    }

    private void decode()
    {
        // do something and store the result in the dictionary
        _dictActionResults[decode] = "DecodeResult";
    }

    private void execute()
    {
        // do something and store the result in the dictionary
        _dictActionResults[execute] = "ExecuteResult";
    }

    private void writeBack()
    {
        // do something and store the result in the dictionary
        _dictActionResults[writeBack] = "WriteBackResult";
    }


    public static void nextInstruction()
    {
        InstructionCycles instrCycles = new InstructionCycles();

        Action[] actions = 
        {
            instrCycles.fetch,
            instrCycles.decode,
            instrCycles.execute,
            instrCycles.writeBack
        };

        Parallel.Invoke(actions);

        // output the results in sequential order

        foreach (Action a in actions)
        {
            Console.Out.WriteLine(instrCycles._dictActionResults[a]);
        }
    }
}

Execute an instruction by calling InstructionCycles.nextInstruction().

Using a static method nextInstruction() (which internally creates an instance of InstructionCycles) makes it possible to run multiple instructions in parallel if desired, since each instruction works with its own results dictionary without interfering with other instructions.


If a static method is not desired and it is also not a requirement to have instructions executed in parallel, nextInstruction() can be altered into something like this:

    private readonly object _lockObj = new object();

    public void nextInstruction()
    {
        Action[] actions = 
        {
            fetch,
            decode,
            execute,
            writeBack
        };

        lock (_lockObj)
        {
            _dictActionResults.Clear();

            Parallel.Invoke(actions);

            // output the results in sequential order

            foreach (Action a in actions)
            {
                Console.Out.WriteLine(_dictActionResults[a]);
            }
        }
    }

Note the lock statement. If nextInstruction() is called while another thread is already executing nextInstruction(), the lock will block the second thread until the first thread finishes. As the lock object, a private object should be chosen that is not accessible from outside the class, which avoids a number of potential deadlock scenarios.

  • I assume the indices for adding results to the dictionary are the actual threads. i.e. _dictActionResults[fetch] = "FetchResult"; where fetch is the thread. – blutuu Nov 17 '13 at 07:58
  • The indices are the functions themselves, so to speak. In reality what you get there are delegates of these functions. You could also write something like `Action x = fetch;` where x would be assigned a delegate of the function *fetch*, and then you could do `_dictActionResults[x]`. –  Nov 17 '13 at 08:18
  • Note that delegates are not the threads (technically). You can understand delegates as a referrer to a function. A delegate, just like a function, can be called/invoked (in whatever thread the calling code executes). –  Nov 17 '13 at 08:20
  • Oh ok. I think you said before that there is no need to hard code my own threads and AutoResetEvents. In other words, I won't be doing "fetchEvent.Set();" as the parallel invoke takes care of running the methods, correct? – blutuu Nov 17 '13 at 08:24
  • Exactly. Are you required to use EventWaitHandle's ? If so, Parallel.Invoke() might not be the answer you are seeking. –  Nov 17 '13 at 08:27
  • You can do something similar to Parallel.Invoke() also with EventWaitHandle's. That would be a bit more code, of course. If you wish, I can provide a sample with EventWaitHandle's. –  Nov 17 '13 at 08:31
  • I'm not sure if it's required, but that is the way I was taught. I've been using `waitone()` and `Set()` and they work, but I don't know how to code to get the output I need. My professor gave us an example program (where he only used fetch and decode) that used a delegate within the fetch method. All it seemed to do was access/invoke the windows form text box, but with that code it produced in order execution between the fetch and decode threads. The thread order never lost integrity. Even with that code I'm not sure why the invoked delegate accomplished that task. – blutuu Nov 17 '13 at 08:40
  • @Blutuu, hard to tell what your prof did there without knowing his code. But I will provide you an example as another answer that might be similar to what he did. Just give me a few minutes. –  Nov 17 '13 at 08:43
  • @Blutuu, I have given another answer with `AutoResetEvent`. Note that when you have only two cycles (such as fetch and decode), only one AutoResetEvent or similar is required to ensure the correct order of output. –  Nov 17 '13 at 09:33
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/41361/discussion-between-blutuu-and-elgonzo) – blutuu Nov 17 '13 at 19:19

This is a demonstration of how to achieve something similar to the Parallel.Invoke() example I gave in my first answer, but this time using only AutoResetEvent.

(ManualResetEvent objects could be used instead of AutoResetEvent, but then the code would need to take care about resetting those in case the nextInstruction() method should be called again.)

Each of the instruction cycle tasks sleeps an arbitrary amount of time ("simulating" different execution times) to demonstrate how the AutoResetEvents ensure the proper order of outputting the results.

Before waiting for permission to output their results, all cycles run concurrently. Only by invoking WaitOne() on the respective AutoResetEvent (waiting for permission to output the results) is the remainder of each task (outputting the results) executed sequentially.

public class InstructionCycles
{
    private readonly AutoResetEvent DecodeAllowedToOutputEvent = new AutoResetEvent(false);
    private readonly AutoResetEvent ExecuteAllowedToOutputEvent = new AutoResetEvent(false);
    private readonly AutoResetEvent WriteBackAllowedToOutputEvent = new AutoResetEvent(false);

    // 
    // The InstructionFinishedEvent would not be necessary,
    // if nextInstruction() does not need to wait for the instruction to finish.
    //
    private readonly AutoResetEvent InstructionFinishedEvent = new AutoResetEvent(false);

    private void fetch()
    {
        try
        {
            // do something useful...
            // For demo purpose, lets just sleep some arbitrary time
            Thread.Sleep(500);

            // This is the 1st cycle.
            // So we don't need to wait for a previous cycle outputting its result.
            Console.Out.WriteLine("FetchResult");
        }
        finally
        {
            // Allow the next cycle to output its results...
            DecodeAllowedToOutputEvent.Set();
        }
    }

    private void decode()
    {
        try
        {
            // do something useful...
            // For demo purpose, lets just sleep some arbitrary time
            Thread.Sleep(200);

            // Processing done.
            // Now wait to be allowed to output the result.
            DecodeAllowedToOutputEvent.WaitOne();

            Console.Out.WriteLine("DecodeResult");
        }
        finally
        {
            // Allow the next cycle to output its results...
            ExecuteAllowedToOutputEvent.Set();
        }
    }

    private void execute()
    {
        try
        {
            // do something useful...
            // For demo purpose, lets just sleep some arbitrary time
            Thread.Sleep(300);

            // Processing done.
            // Now wait to be allowed to output the result.
            ExecuteAllowedToOutputEvent.WaitOne();

            Console.Out.WriteLine("ExecuteResult");
        }
        finally
        {
            // Allow the next cycle to output its results...
            WriteBackAllowedToOutputEvent.Set();
        }
    }

    private void writeBack()
    {
        try
        {
            // do something useful...
            // For demo purpose, lets just sleep some arbitrary time
            Thread.Sleep(100);

            // Processing done.
            // Now wait to be allowed to output the result.
            WriteBackAllowedToOutputEvent.WaitOne();

            Console.Out.WriteLine("WriteBackResult");
        }
        finally
        {
            // Signal that the instruction (including outputting the result) has finished....
            InstructionFinishedEvent.Set();
        }
    }


    public void nextInstruction()
    {
        //
        // The order in which the cycles are started doesn't really matter,
        // since the way how the AutoResetEvents are being used will ensure
        // correct sequence of outputting results.
        //

        Task.Factory.StartNew(fetch);
        Task.Factory.StartNew(decode);
        Task.Factory.StartNew(execute);
        Task.Factory.StartNew(writeBack);

        // 
        // The InstructionFinishedEvent would not be necessary,
        // if nextInstruction() does not need to wait for the instruction to finish.
        //
        InstructionFinishedEvent.WaitOne();
    }
}

By utilizing try {...} finally {...} it is ensured that the chain of events works unhindered even if some code in one of the threads decides to throw an exception.

Even if you dare to deliberately dispose one or more of the AutoResetEvents and then call nextInstruction(), the instruction cycles will still execute concurrently. However, the results will no longer be output as expected: when one of the cycles tries to wait on the disposed, now-invalid AutoResetEvent, an exception is thrown, and the next thread is still granted permission to output its results (thanks to the workings of the try-finally block).

Note: On first glance, the code might look similar to inquisitive's answer. However, there are differences in the program flow / event handling which are significant regarding the behavior of the overall process.

Also note that nextInstruction() as given above is not itself thread-safe. To allow multiple calls of nextInstruction() from different calling threads, a lock has to be used, as in my other answer, to ensure that only one instruction is executed at a time:

    private readonly object _lockObj = new object();

    public void nextInstruction()
    {
        //
        // The order in which the cycles are started doesn't really matter,
        // since the way how the AutoResetEvents are being used will ensure
        // correct sequence of outputting results.
        //

        lock (_lockObj)
        {
            Task.Factory.StartNew(fetch);
            Task.Factory.StartNew(decode);
            Task.Factory.StartNew(execute);
            Task.Factory.StartNew(writeBack);

            // 
            // When using lock, InstructionFinishedEvent must
            // be used to ensure that nextInstruction() remains
            // in the lock until the instruction finishes.
            //
            InstructionFinishedEvent.WaitOne();
        }
    }
  • Cool. I'll take a look at this and will try it out. – blutuu Nov 17 '13 at 13:31
  • Ok so I implemented this method as well as the one you gave in your first answer, but they both seem to lock up my program. I assume some sort of deadlock or maybe I'm not doing it right. – blutuu Nov 17 '13 at 16:28
  • It seems you have made a mistake somewhere. I tested my code sample myself before posting it as an answer (since issues related to multi-threading are always difficult to debug...). What does your code look like? (Perhaps you can upload your project somewhere and post a download link here.) –  Nov 17 '13 at 16:32
  • I can do that, but first I have a question. I forgot to remove the wait handles from the different methods (i.e. fetchEvent.waitone()). Would that have an effect? – blutuu Nov 17 '13 at 16:48
  • It certainly could be. If your *fetch* method does a *WaitOne* on *fetchEvent*, but that event handle is never set, then the fetch method would wait for an eternity for the *Set()* that never comes... –  Nov 17 '13 at 17:01
  • Gotcha. It did indeed make a difference, but when my program ran it seemed to print an infinite sequence of numbers that only stopped when I closed the program. Here's a link to my project: https://www.dropbox.com/s/bkotbv5y2pqsj3q/2.2.rar I'm working out of the CPU class only. Use one of the tests within the rar file. – blutuu Nov 17 '13 at 17:11
  • Just a quick side note (not really related to your problem, but about programming style in .NET): A Dispose() method is commonly associated with the IDisposable interface, and implementing such a method follows a certain paradigm (Dispose is for cleanup and releasing of resources), which your code is not following (yours is managing program flow). It does not break your code, but it is confusing and might lead to misunderstandings if somebody else looks at your code... –  Nov 17 '13 at 17:24
  • Okay, look at your fetch method. It runs in a **while** loop. What is the condition that makes the while loop stop? And when would this condition be set/fulfilled? –  Nov 17 '13 at 17:33
  • Note that the while loop runs, and runs. Eventually, **_pc** will become larger until it reaches the maximum int value at which point it wraps around, becoming int.MinValue. –  Nov 17 '13 at 17:35
  • When the while loop continues to run it allows for another test file to be run right after the previous test file. I guess it could or should be broken once a test finishes its run. The condition that makes it stop is the `Dispose()` method. – blutuu Nov 17 '13 at 19:14
  • Generally, do away with these while loops. Just use tasks (as shown in the answers). Tasks will be executed in several threads anyway. Using those while loops makes your code unnecessarily complicated... –  Nov 17 '13 at 19:19
  • Ok. I'll try it and will get back to you. – blutuu Nov 17 '13 at 19:20
  • Since each task/stage will hand over (output) the data for the next clock cycle to the next stage in the pipeline, it would make sense to inverse the order of the events as i posted in my answers. If you don't do that an earlier stage (for example Fetch could overwrite the decoder buffer before the Decoder stage got a chance to read the decoder buffer. Since we are talking multi-threading, we cannot simply assume nice runtime behaviour/speed of all tasks involved. –  Nov 17 '13 at 19:23
  • So basically, every stage should read and process the instruction data in its respective buffer variable (well, Fetch will read from memory) concurrently. –  Nov 17 '13 at 19:24
  • A stage cannot willy-nilly write data into the buffer for the next stage (to avoid accidentally overwriting that buffer before the next stage has finished reading/processing it). Rather, the next stage will signal the stage before it through an event handle when the buffer can be safely written (that is, when the next stage doesn't need the current information in the buffer anymore). –  Nov 17 '13 at 19:29
  • That means, Decode will tell Fetch when it is okay to write in the Decode buffer. Execute will tell Decode when it is okay to write in the Execution buffer, and so on... –  Nov 17 '13 at 19:30
  • By the way, using Parallel.Invoke() will provide you with a "cost-free" mechanism to know when all stages for the current clock cycle have been finished. Use it, don't make your life harder than necessary... ;) –  Nov 17 '13 at 19:31
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/41363/discussion-between-blutuu-and-elgonzo) – blutuu Nov 17 '13 at 19:58
  • I wrote some pseudo-code of what your code should be doing. [You can see it on PasteBin](http://pastebin.com/aQ2ycY08). There is really not much of complexity involved. I am really not sure why your code is so complicated... –  Nov 17 '13 at 19:58
  • Please move extended discussion to [chat], thanks :) – Tim Post Nov 18 '13 at 07:26
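The buffer handshake described in the comments above (the PasteBin pseudo-code is not reproduced here) could be sketched like this for a single Fetch->Decode link; buffer and event names are illustrative:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class HandshakeSketch
{
    // Shared buffer between Fetch and Decode, guarded by two events:
    // "buffer is free to write" and "buffer holds data to read".
    static string _decodeBuffer;
    static readonly AutoResetEvent _bufferFree = new AutoResetEvent(true);  // initially writable
    static readonly AutoResetEvent _bufferFull = new AutoResetEvent(false);

    static void Main()
    {
        var instructions = new[] { "lda #$10", "sta $0", "lda $11" };

        var decode = Task.Factory.StartNew(() =>
        {
            for (int i = 0; i < instructions.Length; i++)
            {
                _bufferFull.WaitOne();              // wait until Fetch has written the buffer
                Console.WriteLine("D " + _decodeBuffer);
                _bufferFree.Set();                  // tell Fetch the buffer may be overwritten
            }
        });

        // Fetch: never overwrite the Decode buffer until Decode is done with it.
        foreach (var instr in instructions)
        {
            _bufferFree.WaitOne();                  // wait for Decode's permission
            _decodeBuffer = instr;
            Console.WriteLine("F " + instr);
            _bufferFull.Set();                      // signal that data is available
        }

        decode.Wait();
    }
}
```

The same pair of events would be repeated for each Decode->Execute and Execute->WriteBack buffer, so that every stage only writes downstream when the downstream stage has released its input.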