Please can I firstly clarify your terminology: a misunderstanding on the ends of channels could cause problems later. You ask about "output channels" and "input channels". There is no such thing; there are only channels.
Every channel has two ends: the output (writing) end, and the input (reading) end. I will assume that that is what you meant.
Now to answer your question.
Take the simplest case: you have only one sender goroutine writing to a channel, only one worker goroutine reading from the other end, and the channel has zero buffering. The sender goroutine will block as it writes each item until that item has been consumed. Typically this happens quickly the first time. Once the first item has passed to the worker, the worker will be busy and the sender will have to wait before the second item can be passed over. So a ping-pong effect follows: either the writer or the reader will be busy, but not both. The goroutines will be concurrent in the sense described by Rob Pike, but not always actually executing in parallel.
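As a minimal sketch of this simplest case (the function name consumeAll and the squaring "work" are made up for illustration), one sender goroutine writes to an unbuffered channel and the calling goroutine acts as the single worker:

```go
package main

import "fmt"

// consumeAll is a hypothetical helper: it starts one sender goroutine that
// writes n items to an unbuffered channel and reads them in the caller.
func consumeAll(n int) []int {
	ch := make(chan int) // zero buffering: each send waits for a receive

	go func() {
		defer close(ch) // tell the reader there is nothing more to come
		for i := 1; i <= n; i++ {
			ch <- i // blocks until the reader takes this item
		}
	}()

	var got []int
	for v := range ch { // the reading end; the loop ends when ch is closed
		got = append(got, v*v) // stand-in for real work
	}
	return got
}

func main() {
	fmt.Println(consumeAll(3)) // prints [1 4 9]
}
```

While the reader is busy inside the loop body, the sender is parked on its next send, which is the ping-pong effect described above.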
In the case where you have many worker goroutines reading from the channel (and its input end is shared by all of them), the sender can initially distribute one item to each worker, but then it has to wait whilst they work (similar to the ping-pong case described above). Finally, when all items have been sent by the sender, it has finished its work. However, the readers may not yet have finished theirs. Sometimes we care that the sender finishes early, and sometimes we don't. Knowing when the readers have finished is most easily done with a WaitGroup (see Not_a_Golfer's answer and my answer to a related question).
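A sketch of the many-worker case, assuming the work is just summing squares (sumSquares and the mutex-protected total are illustrative choices, not part of the question):

```go
package main

import (
	"fmt"
	"sync"
)

// sumSquares is a hypothetical helper: it fans n items out to several
// workers over one shared unbuffered channel and waits for all of them.
func sumSquares(n, workers int) int {
	ch := make(chan int)
	var wg sync.WaitGroup
	var mu sync.Mutex
	total := 0

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for v := range ch { // every worker shares the same reading end
				mu.Lock()
				total += v * v
				mu.Unlock()
			}
		}()
	}

	for i := 1; i <= n; i++ {
		ch <- i // blocks whenever all workers are busy
	}
	close(ch) // the sender is finished here; the workers may not be

	wg.Wait() // returns only once every worker has called Done
	return total
}

func main() {
	fmt.Println(sumSquares(10, 3)) // prints 385
}
```

Closing the channel is what lets each worker's range loop end; wg.Wait() is what tells the sender that the last item has actually been processed, not merely handed over.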
There is a slightly more complex alternative: you can use a return channel for signalling completion instead of a WaitGroup. This isn't hard to do, but WaitGroup is preferred in this case, being simpler.
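For comparison, here is one way the return-channel alternative might look. This is only a sketch: countProcessed and the done channel are hypothetical names, and each worker reports how many items it handled purely so that there is something to check.

```go
package main

import "fmt"

// countProcessed is a hypothetical helper: workers signal completion on a
// return channel instead of through a WaitGroup.
func countProcessed(n, workers int) int {
	ch := make(chan int)
	done := make(chan int) // each worker sends its item count when it exits

	for w := 0; w < workers; w++ {
		go func() {
			handled := 0
			for range ch {
				handled++ // stand-in for real work
			}
			done <- handled // completion signal
		}()
	}

	for i := 0; i < n; i++ {
		ch <- i
	}
	close(ch)

	total := 0
	for w := 0; w < workers; w++ {
		total += <-done // wait for exactly one signal per worker
	}
	return total
}

func main() {
	fmt.Println(countProcessed(10, 3)) // prints 10
}
```

The sender has to know how many completion signals to expect, which is the bookkeeping that WaitGroup does for you.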
If instead the channel were to contain a buffer, the point at which the sender had sent its last item would happen sooner. In the limit case, the channel has one buffer space per worker; this would allow the sender to complete very quickly and then, potentially, get on with something else. (Any more buffering than this would be wasteful.)
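To illustrate the buffered case, the sketch below (queueThenDrain is a made-up name) gives the channel one slot per worker and sends one item per worker before any worker has even started, so none of the sends can block:

```go
package main

import (
	"fmt"
	"sync"
)

// queueThenDrain is a hypothetical helper: it fills a channel that has one
// buffer slot per worker, and only afterwards starts the workers.
func queueThenDrain(workers int) (queued, sum int) {
	ch := make(chan int, workers) // one buffer space per worker

	for i := 1; i <= workers; i++ {
		ch <- i // never blocks: there is always a free slot
	}
	close(ch)
	queued = len(ch) // items sitting in the buffer; the sender is already done

	var wg sync.WaitGroup
	var mu sync.Mutex
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for v := range ch { // buffered items are still delivered after close
				mu.Lock()
				sum += v
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return queued, sum
}

func main() {
	queued, sum := queueThenDrain(4)
	fmt.Println(queued, sum) // prints 4 10
}
```

In a real program the workers would normally be running already; starting them late here just makes it visible that the sender did not need them in order to finish.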
This decoupling of the sender allows a fully asynchronous pattern of behaviour, beloved of people using other technology stacks (Node.js and the JVM spring to mind). Unlike them, Go doesn't need you to do this, but you have the choice.
Back in the early '90s, as a side-effect of work on the Bulk Synchronous Parallelism (BSP) strategy, Leslie Valiant proved that sometimes very simple synchronisation strategies can be cheap. The crucial factor is that there must be enough parallel slackness (a.k.a. excess parallelism) to keep the processor cores busy. That means there must be enough other work to be done that it really doesn't matter if any particular goroutine is blocked for a period of time.
Curiously, this can mean that working with smaller numbers of goroutines might require more care than working with larger numbers.
Understanding the impact of excess parallelism is useful: it is often not necessary to put extra effort into making everything asynchronous if the network as a whole has excess parallelism, because the CPU cores would be busy either way.
Therefore, although it is useful to know how to wait until your sender has completed, a larger application may not need you to be concerned in the same way.
As a final footnote, WaitGroup is a barrier in the sense used in BSP. By combining barriers and channels, you are making use of both BSP and CSP.