1

I have this snippet of code which concurrently runs a function using an input and output channel and associated WaitGroups, but I was clued in to the fact that I've done some things wrong. Here's the code:

func main() {
    concurrency := 50
    var tasksWG sync.WaitGroup
    tasks := make(chan string)
    output := make(chan string)

    for i := 0; i < concurrency; i++ {
        tasksWG.Add(1)

        // evidentally because I'm processing tasks in a groutine then I'm not blocking and I end up closing the tasks channel almost immediately and stopping tasks from executing
        go func() {
            for t := range tasks {
                output <- process(t)
                continue
            }
            tasksWG.Done()
        }()
    }

    var outputWG sync.WaitGroup
    outputWG.Add(1)
    go func() {
        for o := range output {
            fmt.Println(o)
        }
        outputWG.Done()
    }()

    go func() {
        // because of what was mentioned in the previous comment, the tasks wait group finishes almost immediately which then closes the output channel almost immediately as well which ends ranging over output early
        tasksWG.Wait()
        close(output)
    }()

    f, err := os.Open(os.Args[1])
    if err != nil {
        log.Panic(err)
    }

    s := bufio.NewScanner(f)
    for s.Scan() {
        tasks <- s.Text()
    }

    close(tasks)
    // and finally the output wait group finishes almost immediately as well because tasks gets closed right away due to my improper use of goroutines
    outputWG.Wait()
}

func process(t string) string {
    time.Sleep(3 * time.Second)
    return t
}

I've indicated in the comments where I've implementing things wrong. Now these comments make sense to me. The funny thing is that this code does indeed seem to run asynchronously and dramatically speeds up execution. I want to understand what I've done wrong but it's hard to wrap my head around it when the code seems to execute in an asynchronous way. I'd love to understand this better.

  • "I end up closing the tasks channel almost immediately" That's not true. You close the channel after the file has been read and all of its lines have been sent to the channel. That is how you're supposed to do it. I see no issues in this code (except that you don't need the outputWG if you put the file processing into a goroutine and the output loop at the end of the main goroutine, but that doesn't affect correctness). – Peter Dec 29 '21 at 19:21

1 Answers1

1

Your main goroutine is doing a couple of things sequentially and others concurrently, so I think your order of execution is off

    f, err := os.Open(os.Args[1])
    if err != nil {
        log.Panic(err)
    }

    s := bufio.NewScanner(f)
    for s.Scan() {
        tasks <- s.Text()
    }

Shouldn't you move this up top? So then you have values sent to tasks

THEN have your loop which ranges over tasks 50 times in the concurrency named for loop (you want to have something in tasks before calling code that ranges over it)


go func() {
    // because of what was mentioned in the previous comment, the tasks wait group finishes almost immediately which then closes the output channel almost immediately as well which ends ranging over output early
    tasksWG.Wait()
    close(output)
}()

The logic here is confusing me, you're spawning a goroutine to wait on the waitgroup, so here the wait is nonblocking on the main goroutine - is that what you want to do? It won't wait for tasksWG to be decremented to zero inside main, it'll do that inside the goroutine that you've created. I don't believe you want to do that?


It might be easier to debug if you could give more details on the expected output?

  • Interestingly if I move that part to the top I run into deadlock. Also, this tool is designed to process tons of data to the number of 1000s of lines, so perhaps this is made easier by not blocking when waiting for tasks. –  Dec 29 '21 at 19:36
  • A deadlock is where concurrent processes are waiting on eachother, so they'll run forever without outside intervention. Where are you calling your "close(tasks)"? You could defer it. The section on deadlocks is accessible here from [Concurrency in Go](https://www.google.co.uk/books/edition/Concurrency_in_Go/V3EtDwAAQBAJ?hl=en&gbpv=1&pg=PT29&printsec=frontcover) – SuperSecretAndNotSafeFromWork Dec 29 '21 at 19:50
  • It's being called at the very bottom of that function. I'll check out that link, thanks. –  Dec 29 '21 at 19:56