
I'm not sure how best to explain this, but our CI keeps failing with a SIGBUS error. The error appears to be entirely internal to Go, and we have no idea what causes it.

We have run the tests many times on our local boxes trying to reproduce the error, but they pass every time and we never see the SIGBUS.

It happens only on our CI boxes, and on a different, seemingly random test file each time.

The relevant stack trace:

# github.com/magna5/cdr_archiver/transport/nats/nats_test.test
unexpected fault address 0x7f56f7744000
fatal error: fault
[signal SIGBUS: bus error code=0x2 addr=0x7f56f7744000 pc=0x45bbbf]

goroutine 1 [running]:
runtime.throw(0x690ec5, 0x5)
    /usr/local/go/src/runtime/panic.go:774 +0x72 fp=0xc00072cef0 sp=0xc00072cec0 pc=0x42dc32
runtime.sigpanic()
    /usr/local/go/src/runtime/signal_unix.go:391 +0x455 fp=0xc00072cf20 sp=0xc00072cef0 pc=0x442bd5
runtime.memmove(0x7f56f7588d80, 0xc006cae000, 0x2b23e4)
    /usr/local/go/src/runtime/memmove_amd64.s:423 +0x50f fp=0xc00072cf28 sp=0xc00072cf20 pc=0x45bbbf
cmd/link/internal/ld.(*OutBuf).Write(0xc000076040, 0xc006cae000, 0x2b23e4, 0x2c8132, 0x200, 0x10, 0x0)
    /usr/local/go/src/cmd/link/internal/ld/outbuf.go:65 +0xa0 fp=0xc00072cf78 sp=0xc00072cf28 pc=0x5ac5c0
cmd/link/internal/ld.(*OutBuf).WriteSym(0xc000076040, 0xc0058c6140)
    /usr/local/go/src/cmd/link/internal/ld/outbuf.go:159 +0x6c fp=0xc00072cfc8 sp=0xc00072cf78 pc=0x5acd8c
cmd/link/internal/ld.blk(0xc000076040, 0xc005b14000, 0xa23d, 0xc800, 0xb60d80, 0x483164, 0x877200, 0x200, 0x200)
    /usr/local/go/src/cmd/link/internal/ld/data.go:786 +0x10f fp=0xc00072d098 sp=0xc00072cfc8 pc=0x559a0f
cmd/link/internal/ld.writeDatblkToOutBuf(0xc00004c000, 0xc000076040, 0x990000, 0x483164)
    /usr/local/go/src/cmd/link/internal/ld/data.go:825 +0xaf fp=0xc00072d260 sp=0xc00072d098 pc=0x55a05f
cmd/link/internal/ld.Datblk(...)
    /usr/local/go/src/cmd/link/internal/ld/data.go:808
cmd/link/internal/amd64.asmb(0xc00004c000)
    /usr/local/go/src/cmd/link/internal/amd64/asm.go:688 +0x1fe fp=0xc00072d2d0 sp=0xc00072d260 pc=0x5d1dbe
cmd/link/internal/ld.Main(0x84bde0, 0x10, 0x20, 0x1, 0x7, 0x10, 0x69ac0d, 0x1b, 0x6976f4, 0x14, ...)
    /usr/local/go/src/cmd/link/internal/ld/main.go:262 +0xd5d fp=0xc00072d428 sp=0xc00072d2d0 pc=0x5ab60d
main.main()
    /usr/local/go/src/cmd/link/main.go:65 +0x1d6 fp=0xc00072df60 sp=0xc00072d428 pc=0x614676
runtime.main()
    /usr/local/go/src/runtime/proc.go:203 +0x21e fp=0xc00072dfe0 sp=0xc00072df60 pc=0x42f5ce
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1357 +0x1 fp=0xc00072dfe8 sp=0xc00072dfe0 pc=0x45a501

NOTES: our local boxes run macOS. The CI is powered by Drone, and we run all our tests in golang:1.13 images, so I assume the platform there is linux/amd64.

Locally, neither running the tests directly on macOS nor running them through drone exec produces the SIGBUS error, which occurs quite frequently on the CI server.

Noobie
  • You have communicated precisely zero details about the version of Go used to run the test (and whether it's "stock" Go or `gccgo` or something else), and the `GOOS/GOARCH` combination (though, from `SIGBUS`, I'd guess it's some xBSD or MacOS, and from `asm_amd64.s` I'm sure it's `amd64`). – kostix Jun 22 '20 at 10:10
  • An interesting idea would be to try to narrow the problem down. What differences are there between your local boxes and the one running the CI? How do they differ in the number of CPU sockets/cores? Is the HW powering the CI okay (no faulty RAM)? What's the difference in background load (the crash may happen only under heavy load, for instance)? – kostix Jun 22 '20 at 10:12
  • Do the local boxes and the CI have identical `GOOS+GOARCH`? – kostix Jun 22 '20 at 10:14
  • @kostix replied – Noobie Jun 22 '20 at 10:43
  • Looks much like [this issue](https://github.com/golang/go/issues/37310); you might consider chiming in there. – kostix Jun 23 '20 at 10:08
