0

This code runs concurrently in many goroutines,the following code is a key and relevant part extracted from the production environment code.:

func check() {

    .......check condition......

    //skipEnsure and skipNative will not both false here
    var wg sync.WaitGroup
    if !skipEnsure {
        wg.Add(1)
        go func(wg *sync.WaitGroup) {
            defer wg.Done()
            dosomething1()
        }(&wg)
    }
    if !skipNative {
        wg.Add(1)
        go func(wg *sync.WaitGroup) {
            defer wg.Done()
            dosomething2()
        }(&wg)
    }
    wg.Wait() //panic here
}

func worker() {
    ......

    go check()
}

func main() {
    for i:=0;i<32;i++ {
        go worker()
    }
    ch := make(chan any, 0)
    <-ch
}

I can reproduce the issue when WaitGroup is a global variable, but not in the code provided. It can cause panic after running for a while in the production environment.

jx.wtj
  • 29
  • 3
  • 5
    We can't help without seeing the code which reproduces the error. The problem is explained in the panic, you reach a point where `Wait()` is triggered but you are still concurrently calling `wg.Add`. There's an error somewhere in your logic, because a `WaitGroup` isn't useful if it reaches `0` before all the work is done. – JimB Jul 24 '23 at 14:07
  • 1
    Don't share wait groups. These shouldn't be global variables, make them function level instead and should go out of scope as fast as possible. – JBirdVegas Jul 24 '23 at 16:26
  • What happens if both `skipEnsure` and `skipNative` are true? – erik258 Jul 24 '23 at 16:44
  • I have updated the code to make the entire logic look more complete. – jx.wtj Jul 25 '23 at 02:23

1 Answers1

0

If I understood you correctly the code looks like this one:

 1  package main
 2  
 3  import (
 4      "sync"
 5      "time"
 6  )
 7  
 8  var wg sync.WaitGroup
 9  
10  func work() {
11      wg.Add(1)
12      go func(wg *sync.WaitGroup) {
13          defer wg.Done()
14          time.Sleep(1 * time.Millisecond)
15          println("one done")
16      }(&wg)
17      wg.Add(1)
18      go func(wg *sync.WaitGroup) {
19          defer wg.Done()
20          time.Sleep(2 * time.Millisecond)
21          println("two done")
22      }(&wg)
23      wg.Wait() //panic here
24  }
25  
26  func main() {
27      for i := 0; i < 10; i++ {
28          go work()
29      }
30      wg.Wait()
31      println("done")
32  }

sync.WaitGroup is not designed for such usage. It is not obvious when wg.Add and wg.Wait will be called. If you try to run it with data race detector you can see something like this:

==================
WARNING: DATA RACE
Write at 0x0000011707e8 by goroutine 6:
  runtime.racewrite()
      <autogenerated>:1 +0x24
  main.work()
      /1.go:23 +0x1b2

Previous read at 0x0000011707e8 by goroutine 7:
  runtime.raceread()
      <autogenerated>:1 +0x24
  main.work()
      /1.go:11 +0x35

Goroutine 6 (running) created at:
  main.main()
      /1.go:28 +0x32

Goroutine 7 (running) created at:
  main.main()
      /1.go:28 +0x32
==================
Found 1 data race(s)

You can use own WaitGroup for each goroutine (as in your example). I cannot imagine any reason against it. For example, this code will work fine, without any data race:

package main

import (
    "sync"
    "time"
)

func work() {
    var wg sync.WaitGroup
    wg.Add(1)
    go func(wg *sync.WaitGroup) {
        defer wg.Done()
        time.Sleep(1 * time.Millisecond)
        println("one done")
    }(&wg)
    wg.Add(1)
    go func(wg *sync.WaitGroup) {
        defer wg.Done()
        time.Sleep(2 * time.Millisecond)
        println("two done")
    }(&wg)
    wg.Wait() //panic here
}

func main() {
    var wg sync.WaitGroup
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func() {
            work()
            wg.Done()
        }()
    }
    wg.Wait()
    println("done")
}
Eugene Mikhalev
  • 598
  • 3
  • 13
  • I'm sorry, I didn't express myself clearly earlier. My code in the production environment is the same as the example you provided below, with the WaitGroup initialized inside the function, but I'm still getting a panic. – jx.wtj Jul 25 '23 at 01:53