-1

I encounter a problem that seems simple but that I cannot reproduce, and therefore that I cannot explain.

This problem occurs in production, what is mysterious is that it occurs very rarely (that's why I can't reproduce it), it may be a factor that I could not set out as an example but here is the context:

type MyType struct {
    Field1 string
    Field2 int
    Field3 time.Time
    Field4 []float64
    // No pointers fields
}

func main() {
    var MyChan = make(chan interface{})

    go func() {
// This routine is reading and parsing messages from a WS dial and writing it in a channel
// There is 2 only possible answer in the channel : a string, or a "MyType" like (with more fields, but no pointers in)     
        var Object MyType
        Object.Field1 = "test"
        // ..

        MyChan <- Object
    }()

    go func() {
// This routine is sending a message to a WS dial and wait for the answer in a channel fed by another routine :
        var Object interface{}
        go func(Get *interface{}) {
            *Get = <- MyChan
        } (&Object)
        for Object == nil {
            time.Sleep(time.Nanosecond * 1)
        }
        log.Println(fmt.Sprint(Object)) // Panic here from the fmt.Sprint() func
    }()
}

Panic stack trace :

runtime error: invalid memory address or nil pointer dereference
panic(0x87a840, 0xd0ff40)
    X:/Go/GoRoot/src/runtime/panic.go:522 +0x1b5
reflect.Value.String(0x85a060, 0x0, 0xb8, 0xc00001e500, 0xc0006f1a80)
    X:/Go/GoRoot/src/reflect/value.go:1781 +0x45
fmt.(*pp).printValue(0xc006410f00, 0x85a060, 0x0, 0xb8, 0x76, 0x1)
    X:/Go/GoRoot/src/fmt/print.go:747 +0x21c3
fmt.(*pp).printValue(0xc006410f00, 0x8ed5c0, 0x0, 0x99, 0x76, 0x0)
    X:/Go/GoRoot/src/fmt/print.go:796 +0x1b52
fmt.(*pp).printArg(0xc006410f00, 0x8ed5c0, 0x0, 0x76)
    X:/Go/GoRoot/src/fmt/print.go:702 +0x2ba
fmt.(*pp).doPrint(0xc006410f00, 0xc0006f22a0, 0x1, 0x1)
    X:/Go/GoRoot/src/fmt/print.go:1147 +0xfd
fmt.Sprint(0xc0006f22a0, 0x1, 0x1, 0x0, 0x0)
    X:/Go/GoRoot/src/fmt/print.go:250 +0x52

Go version : 1.12.1 windows/amd64

Thanking you for your time, I hope someone can explain to me what's wrong.

Anarz
  • 25
  • 7
  • 3
    Assigning to *Get races against the for condition right after. Race conditions make the behavior of your program undefined. – Peter Jan 20 '20 at 09:53

1 Answers1

1

You have a data race here:

    var Object interface{}
    go func(Get *interface{}) {
        *Get = <- MyChan
    } (&Object)
    for Object == nil {
        time.Sleep(time.Nanosecond * 1)
    }

The goroutine that writes on the variable Object does so without using a lock: it receives from the channel, and when it gets a value, it writes it to *Get where Get == &Object, so that it writes the interface value of Object.

Meanwhile, the goroutine running the for loop reads from Object, checking for nil-ness. It reads without using a lock, so that it can read a partially-written value.

What is actually happening is that the partially-written value is non-nil, without the whole value being set. So the for loop stops looping and the code progresses to the next line:

    log.Println(fmt.Sprint(Object)) // Panic here from the fmt.Sprint() func

Since Object is only partly written, accessing its value produces unpredictable results—but in this case, it produces a panic. (In particular, the type field of the interface has been set, but the value field is still zero. The actual panic is from this line in src/reflect/value.go:

               return *(*string)(v.ptr)

with v.ptr not having been set.)

It's not clear why you're logging the value at all, nor why you are using shared memory to communicate, but if you do do this, you will need locking. It's usually wiser not to do this. See also this answer to Explain: Don't communicate by sharing memory; share memory by communicating.

(Or, to put it more simply, why not just use Object := <-MyChan in place of the whole goroutine-and-spin?)

torek
  • 448,244
  • 59
  • 642
  • 775
  • There is stuff between the goroutine and the `for Object == nil {}`, but indeed, you exposed an error which implies to completely review this method. I will follow your advice for memory sharing. – Anarz Jan 20 '20 at 10:26