
I tried to follow Rob Pike's example from the talk 'Concurrency is not Parallelism' and did something like this: I'm starting many goroutines as workers that read from an input channel, perform some processing, and then send the result through the output channel.

Then I start another goroutine that reads data from some source and sends it to the workers through their input channel. Lastly I want to iterate over all of the results in the output channel and do something with them. The problem is that, because the work is split between the workers, I don't know when all of them have finished, so I can't tell when to stop reading from the output channel and let my program exit properly.

What is the best practice to know when workers have finished sending results to an output channel?

yogi

3 Answers


I personally like to use a sync.WaitGroup for that. A WaitGroup is a synchronized counter with three methods: Add(), Done() and Wait(). You increment the WaitGroup's counter once per worker, pass it to the workers, and have each of them call Done() when it finishes. Then you block on the WaitGroup at the other end and close the output channel once they are all done, causing the output processor to exit.

Basically:

package main

import (
	"fmt"
	"sync"
)

// the worker pushes data onto the output channel and calls wg.Done() when done
func work(wg *sync.WaitGroup, outchan chan<- int) {
	defer wg.Done()
	outchan <- 42 // do the actual work here
}

func workWithIt(item int) {
	fmt.Println(item)
}

func main() {
	const N = 4

	// create the wait group
	var wg sync.WaitGroup

	// this is the output channel
	outchan := make(chan int)

	// start the workers
	for i := 0; i < N; i++ {
		wg.Add(1) // increment the waitgroup's count by one
		go work(&wg, outchan)
	}

	// this is our "waiter" - it blocks until all workers are done
	// and then closes the channel
	go func() {
		wg.Wait()
		close(outchan)
	}()

	// this loop exits automatically when outchan is closed
	for item := range outchan {
		workWithIt(item)
	}

	// TADA!
}
Not_a_Golfer

Please may I first clarify your terminology: a misunderstanding about the ends of channels could cause problems later. You ask about "output channels" and "input channels". There is no such thing; there are only channels.

Every channel has two ends: the output (writing) end, and the input (reading) end. I will assume that that is what you meant.

Now to answer your question.

Take the simplest case: you have only one sender goroutine writing to a channel, only one worker goroutine reading from the other end, and the channel has zero buffering. The sender goroutine will block as it writes each item until that item has been consumed. Typically this happens quickly the first time. Once the first item has passed to the worker, the worker will be busy and the sender will have to wait before the second item can be passed over. A ping-pong effect follows: either the writer or the reader is busy, but not both. The goroutines will be concurrent in the sense described by Rob Pike, but not always actually executing in parallel.
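That ping-pong effect can be made concrete with a small sketch (the pingPong function and the doubling "work" are illustrative, not from the question):

```go
package main

import "fmt"

// pingPong sends items over an unbuffered channel to a single worker
// and returns the processed results in order. While the worker is busy,
// the sender is blocked on its next send, and vice versa.
func pingPong(items []int) []int {
	ch := make(chan int)   // zero buffering: each send blocks until received
	done := make(chan []int)

	// the single worker: double each value it receives
	go func() {
		var out []int
		for v := range ch {
			out = append(out, v*2)
		}
		done <- out
	}()

	for _, v := range items {
		ch <- v // sender blocks here until the worker consumes v
	}
	close(ch)
	return <-done
}

func main() {
	fmt.Println(pingPong([]int{1, 2, 3}))
}
```

With a single worker, results come back in input order, so the alternation is easy to observe.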

In the case where you have many worker goroutines reading from the channel (and its input end is shared by all of them), the sender can initially distribute one item to each worker, but then it has to wait whilst they work (similar to the ping-pong case described above). Finally, when all items have been sent, the sender has finished its work. However, the readers may not yet have finished theirs. Sometimes we care when the sender finishes early, and sometimes we don't. Knowing when this happens is most easily done with a WaitGroup (see Not_a_Golfer's answer and my answer to a related question).

There is a slightly more complex alternative: you can use a return channel for signalling completion instead of a WaitGroup. This isn't hard to do, but WaitGroup is preferred in this case, being simpler.
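A minimal sketch of that alternative, using a completion channel in place of a WaitGroup (runWorkers and its trivial "work" are illustrative names):

```go
package main

import "fmt"

// runWorkers starts n workers and uses a completion channel instead of
// a sync.WaitGroup to learn when all of them have finished sending.
func runWorkers(n int) []int {
	outchan := make(chan int)
	finished := make(chan struct{}) // each worker signals here when done

	for i := 0; i < n; i++ {
		go func(id int) {
			outchan <- id // the "work": push one result
			finished <- struct{}{}
		}(i)
	}

	// waiter: count n completion signals, then close the output channel
	go func() {
		for i := 0; i < n; i++ {
			<-finished
		}
		close(outchan)
	}()

	var results []int
	for item := range outchan {
		results = append(results, item)
	}
	return results
}

func main() {
	fmt.Println(len(runWorkers(4)))
}
```

Counting n signals by hand is exactly what a WaitGroup does for you, which is why the WaitGroup form is usually preferred.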

If instead the channel contains a buffer, the point at which the sender has sent its last item happens sooner. In the limiting case, the channel has one buffer space per worker; this allows the sender to complete very quickly and then, potentially, get on with something else. (Any more buffering than this would be wasteful.)
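Here is a sketch of that limiting case, with one buffer space per worker so the sender can deposit all its items without blocking (the send function and its trivial workers are illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

// send distributes items to workers through a channel with one buffer
// space per worker, so the sender need not wait for a worker to be ready.
func send(items []string, workers int) []string {
	in := make(chan string, workers) // one buffer space per worker
	out := make(chan string)
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for s := range in {
				out <- s + "!" // the "work"
			}
		}()
	}

	// with len(items) <= workers, every send below completes immediately;
	// the sender is then free to get on with something else
	for _, s := range items {
		in <- s
	}
	close(in)

	go func() {
		wg.Wait()
		close(out)
	}()

	var results []string
	for r := range out {
		results = append(results, r)
	}
	return results
}

func main() {
	fmt.Println(len(send([]string{"a", "b", "c"}, 3)))
}
```

Note that the results arrive in no particular order, since the workers run independently.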

This decoupling of the sender allows a fully asynchronous pattern of behaviour, beloved of people using other technology stacks (Node.js and the JVM spring to mind). Unlike them, Go doesn't require you to do this, but you have the choice.

Back in the early '90s, as a side-effect of work on the Bulk Synchronous Parallelism (BSP) strategy, Leslie Valiant showed that sometimes very simple synchronisation strategies can be cheap. The crucial factor is the need for enough parallel slackness (a.k.a. excess parallelism) to keep the processor cores busy. That means there must be plenty of other work to be done, so that it really doesn't matter if any particular goroutine is blocked for a period of time.

Curiously, this can mean that working with smaller numbers of goroutines might require more care than working with larger numbers.

Understanding the impact of excess parallelism is useful: it is often not necessary to put extra effort into making everything asynchronous if the network as a whole has excess parallelism, because the CPU cores would be busy either way.

Therefore, although it is useful to know how to wait until your sender has completed, a larger application may not need you to be concerned in the same way.

As a final footnote, WaitGroup is a barrier in the sense used in BSP. By combining barriers and channels, you are making use of both BSP and CSP.

Rick-777
package main

import (
	"fmt"
	"log"
	"sync"
)

// Z is a sentinel value telling the collector that all results have been sent.
var Z = "Z"

func Loop() {
	sc := make(chan *string)
	ss := make([]string, 0)
	done := make(chan struct{}, 1)
	go func() {
		//1 QUERY
		slice1 := []string{"a", "b", "c"}
		//2 WG INIT
		var wg1 sync.WaitGroup
		wg1.Add(len(slice1))
		//3 LOOP->
		loopSlice1(slice1, sc, &wg1)
		//7 WG WAIT<-
		wg1.Wait()
		sc <- &Z
		done <- struct{}{}
	}()

	go func() {
		var cc *string
		for {
			cc = <-sc
			log.Printf("<-sc %s", *cc)
			if *cc == Z {
				break
			}
			ss = append(ss, *cc)
		}
	}()
	<-done
	log.Printf("FUN: %#v", ss)
}

func loopSlice1(slice1 []string, sc chan *string, wg1 *sync.WaitGroup) {
	for i, x := range slice1 {
		//4 GO
		go func(n int, v string) {
			//5 WG DONE
			defer wg1.Done()
			//6 DOING
			//[1 QUERY
			slice2 := []string{"X", "Y", "Z"}
			//[2 WG INIT
			var wg2 sync.WaitGroup
			wg2.Add(len(slice2))
			//[3 LOOP ->
			loopSlice2(n, v, slice2, sc, &wg2)
			//[7 WG WAIT <-
			wg2.Wait()
		}(i, x)
	}
}

func loopSlice2(n1 int, v1 string, slice2 []string, sc chan *string, wg2 *sync.WaitGroup) {
	for j, y := range slice2 {
		//[4 GO
		go func(n2 int, v2 string) {
			//[5 WG DONE
			defer wg2.Done()
			//[6 DOING
			r := fmt.Sprintf("%v%v %v,%v", n1, n2, v1, v2)
			sc <- &r
		}(j, y)
	}
}

func main() {
	Loop()
}
feuyeux