1

Using Fiber v2.38.1, which in turn use fasthttp/websocket v1.5.0.

App crashed with following error:

2022-10-15T04:05:42.983563+00:00 app[web.1]: time="2022-10-15T04:05:42Z" level=info msg="close msg received: &{418 close}"
2022-10-15T04:05:42.983564+00:00 app[web.1]: conn is nil: false, msg: &{418 close <nil> conn 0}
2022-10-15T04:05:42.986035+00:00 app[web.1]: panic: runtime error: invalid memory address or nil pointer dereference
2022-10-15T04:05:42.986035+00:00 app[web.1]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x78 pc=0x8db967]
2022-10-15T04:05:42.986036+00:00 app[web.1]: 
2022-10-15T04:05:42.986037+00:00 app[web.1]: goroutine 86 [running]:
2022-10-15T04:05:42.986037+00:00 app[web.1]: github.com/fasthttp/websocket.(*Conn).beginMessage(0x0, 0xc000348f60, 0x1)
2022-10-15T04:05:42.986037+00:00 app[web.1]:    /tmp/codon/tmp/cache/go-path/pkg/mod/github.com/fasthttp/websocket@v1.5.0/conn.go:479 +0x27
2022-10-15T04:05:42.986038+00:00 app[web.1]: github.com/fasthttp/websocket.(*Conn).NextWriter(0x0, 0x1)
2022-10-15T04:05:42.986038+00:00 app[web.1]:    /tmp/codon/tmp/cache/go-path/pkg/mod/github.com/fasthttp/websocket@v1.5.0/conn.go:520 +0x45
2022-10-15T04:05:42.986039+00:00 app[web.1]: github.com/fasthttp/websocket.(*Conn).WriteJSON(0xe79a80?, {0xc143a0, 0xc00041e600})
2022-10-15T04:05:42.986039+00:00 app[web.1]:    /tmp/codon/tmp/cache/go-path/pkg/mod/github.com/fasthttp/websocket@v1.5.0/json.go:24 +0x45
2022-10-15T04:05:42.986039+00:00 app[web.1]: github.com/kuchaguangjie/go-fit/model.(*WsClient).WriteMsg(0xc00056a280, 0xc00041e600)
2022-10-15T04:05:42.986040+00:00 app[web.1]:    /tmp/build_0b2fe533/model/ws_model.go:97 +0xf9
2022-10-15T04:05:42.986040+00:00 app[web.1]: github.com/kuchaguangjie/go-fit/controller.(*ContentController).Ws.func1.2()
2022-10-15T04:05:42.986041+00:00 app[web.1]:    /tmp/build_0b2fe533/controller/content_ctl.go:623 +0x171
2022-10-15T04:05:42.986041+00:00 app[web.1]: created by github.com/kuchaguangjie/go-fit/controller.(*ContentController).Ws.func1
2022-10-15T04:05:42.986042+00:00 app[web.1]:    /tmp/build_0b2fe533/controller/content_ctl.go:608 +0x10cc
2022-10-15T04:05:43.113930+00:00 heroku[web.1]: Process exited with status 2

It crashed at conn.go:479, in function beginMessage():

if c.writer != nil {

I've checked that c is not nil (as shown in log), so how it crashed ...


BTW, it's deployed on heroku, I saw following log on heroku:

2022-10-15T04:59:30.344791+00:00 heroku[router]: at=error code=H15 desc="Idle connection"

Not sure is it relevent.


Update: relevant code

  • A separate goroutine that handle close msg (part of content_ctl.go):
    // handle close - from a channel,
    // TODO: use goroutine pool ?
    go func() {
        if r := recover(); r != nil { // TODO: is this proper ?,
            fmt.Printf("recover from panic: %v\n", r)
        }

        closeMsg := <-(*client).CloseChan // handle close,
        log.Infof("close msg received: %v", closeMsg)

        // send close msg,
        closeResp := &model.WsResp{
            Status: closeMsg.HttpCode,
            Source: model.RespSourceConn,
            Msg:    closeMsg.Msg,
        }

        log.Infof("is conn nil: %v, msg: %v\n", client.WsConn == nil, closeResp)
        err = client.WriteMsg(closeResp)
        if err != nil {
            log.Errorf("error - on write close msg: %v", closeResp)
        }

        ctl.cwm.CloseAndCleanup(client.Id) // close & clean up,
    }()
  • Definition of WsClient struct, and its method WriteMsg() (part of ws_model.go):
    // websocket client,
    type WsClient struct {
        WsUser
        ContentUuid string          `json:"contentUuid"` // content uuid id,
        WsConn      *websocket.Conn `json:"-"`           // websocket connection,
        CloseChan   chan *CloseMsg  `json:"-"`           // close channel,
        WriteLock   *sync.Mutex     // write lock, to avoid concurrent write to ws connection, which will cause panic, acquire lock when write or close,
    }

    // write ws msg, with lock,
    func (wsc *WsClient) WriteMsg(msg *WsResp) error {
        if wsc.WsConn == nil {
            return errutil.ErrNoConn
        }
        wsc.WriteLock.Lock()
        err := wsc.WsConn.WriteJSON(msg)
        wsc.WriteLock.Unlock()
        return err
    }

  • CloseMsg struct (part of ws_model.go):
    // close msg sent to close chan,
    type CloseMsg struct {
        HttpCode int    `json:"httpCode"`
        Msg      string `json:"msg"`
    }

Problem status:

  • Actually, before asking the question I've solved the problem by checking the httpCode in the CloseMsg, if it's 418, then I won't send the close msg.
  • Though the issue is gone for now, but still I want to know the actual reason of the problem, so I asked the question.

Some background:

  • I've several possible value for CloseMsg's HttpCode.
    • 418 is the default one, it happens only when unexpected io issue happen on the connection (e.g heroku terminate the connection due to idle ?).
    • For other cases, e.g if I want to kick out a connection, then I will send other http code.
  • From the log, I found the problem happens only when the http code from close msg is 418.
  • BTW, though 418 might not be the best http code to use, (it's from a joke), but that doesn't effect the problem I guess.
  • And, In WsClient's WriteMsg() method, I've always checked whether the conn nil, before send msg.

Update - more log from heroku

2022-10-16T05:19:03.088622+00:00 heroku[router]: at=error code=H15 desc="Idle connection" method=GET path="/api/v1/content/ws/RgPx8znal7AJb
2022-10-16T05:19:03.065493+00:00 app[web.1]: time="2022-10-16T05:19:03Z" level=info msg="error - on read:  websocket: close 1006 (abnormal closure): unexpected EOF"
2022-10-16T05:20:57.758379+00:00 app[web.1]: time="2022-10-16T05:20:57Z" level=info msg="cleanup client, id = 'wscid-ee670cc5-4100-49d6-9857-8284d93a6d33'"
2022-10-16T05:20:57.758505+00:00 app[web.1]: time="2022-10-16T05:20:57Z" level=info msg="close msg received: &{418 close}"

Seems heroku will kill the websocket connection if there is no msg for a while. That's when 418 appear.

And, I've asked a question about heroku killing the websocket connection on SO: heroku keep killing idle connections (websocket)

Eric
  • 22,183
  • 20
  • 145
  • 196
  • The instance of `go-fit/model.(*WsClient)` that is used at `controller/content_ctl.go:623` is likely `nil`, or if it is initialized it is likely not initialized correctly/completely. – mkopriva Oct 15 '22 at 07:20
  • @mkopriva I mentioned that it's not nil, you can see the log. If it's not `correctly/completely` initialized, there might be a bug in the libaray ... – Eric Oct 15 '22 at 09:24
  • 2
    You said `c` is not `nil`. But you haven't provided any code nor any more details about `c`, it is therefore unclear what `c` is. What type it is. Or where it is used. – mkopriva Oct 15 '22 at 09:35
  • For sure the problem could be with `fiber` or with `fasthttp`, but it is generally much more likely that the issue is in the user's code. Also it doesn't matter too much what you claim about the correctness of your code since it's only natural to believe that the code one has written is correct, but it is also quite common to be incorrect in that assessment. Until one can inspect your code and determine for themselves that everything there is done correctly, one can rightfully assume that the libraries are ok but your code isn't. – mkopriva Oct 15 '22 at 09:39
  • @mkopriva As the question mentioned: `c` is from https://github.com/fasthttp/websocket/blob/418d79e8a22dd971023704dc93c6b43a16f0b6ba/conn.go#L479. The log line `conn is nil: false,` is printed by me, right before calling`c.WriteJSON()`. If `c` is nil, it would panic when I call `c.WriteJSON()`, it won't even reach `c.beginMessage()` I guess. – Eric Oct 15 '22 at 10:03
  • 1
    As my first comment states: The instance `go-fit/model.(*WsClient)` is `nil` or initialized incorrectly. So if `c` is not `model.WsClient` then `c` has nothing to with the problem. – mkopriva Oct 15 '22 at 10:06
  • *"If c is nil, it would panic when I call c.WriteJSON(), it won't even reach c.beginMessage()"* -- This is *ONLY* true when `c` is a `nil` **interface** type. When `c` is a `nil` **non-interface** type (e.g. pointer, map, slice, etc.) then `c.WriteJSON` will NOT panic. Calling methods on `nil` non-interface types is allowed and will not panic. The panic happens when you try to access the receiver inside the method, e.g. access a field of a nil struct-pointer receiver, or index into nil map receiver. https://go.dev/play/p/0VA9RE-xH1P – mkopriva Oct 15 '22 at 10:12
  • I've run this program on bare linux for months, it never crash like this, but after deploying on heroku, this happen quickly, maybe this is related to the `idle connection` from heroku as mentioned at end of the question. – Eric Oct 15 '22 at 10:13
  • @mkopriva `c` is a sub field of `model.WsClient`, since `c` is not nil, that means `model.WsClient` is not nil value too. This is relevant to `c`, not `model.WsClient` at all. – Eric Oct 15 '22 at 16:34
  • Fair enough Eric, but then that means that `c` is `nil` as the stacktrace claims. I think it's not unreasonable to believe the stacktrace I've seen rather than the believing the code I haven't seen, hopefully you understand. – mkopriva Oct 15 '22 at 18:12
  • 1
    If you shared your code we could perhaps solve this. Without your code however, expecting someone to know exactly what's wrong is a tall order, in my opinion. – mkopriva Oct 15 '22 at 18:16
  • 1
    @mkopriva I suspected `c` is nil, that's why I added the log line after I found the problem. And the log said it's not nil, that's weird, and then I asked the question, anyway I've added the relevant code at end of the question. – Eric Oct 16 '22 at 04:29
  • https://go.dev/play/p/1vq1vfd3me4 you're not locking access to the wsconn properly in at least two places in the example code of the question, there are likely other places in your app where you're failing to do that as well. – mkopriva Oct 16 '22 at 06:02
  • @mkopriva I did thought about that, but is that the problem, let me do some test later. There is no other place write the value of `WsConn` after connection. And each ws connection only have 1 extra goroutine which is for close channel. – Eric Oct 16 '22 at 06:13
  • @mkopriva I did some test, the issue remains, here is the code: https://go.dev/play/p/K5lBNIGZqT6 . This is as I expected, because the only place write `WsClient`'s `WsConn` field is on creation, no code will set it to nil ever (gc is not relevant here), as mentioned in above comment, and that's why I didn't try it. The original reason I check the nil value is to make sure the initialization is finished. And, the mutex is there only because the fasthttp websocket doesn't support concurrent write. It happens only when heroku kill the connection, that's not just coincidence I think. – Eric Oct 16 '22 at 14:09
  • I'm sure heroku killing the connection is what causes the bug to arise, but killing a connection on heroku's end won't automatically set a variable in your program to `nil`. Under normal circumstances that can only be achieved from within the Go program, and if it's not directly done by your code then it could be one of the package's you are depending on. You should re-read the documentation of those packages, maybe one of them mentions this behaviour. If not, you should scan their source code. Look through their github issues, etc. Anyway, without a [mcve] I don't know what else to suggest. – mkopriva Oct 16 '22 at 16:09
  • I already created github issue similar as this SO post, there is no reply yet, since I already fixed it in some way, won't spend more time on it for now. – Eric Oct 17 '22 at 13:42
  • 1
    @mkopriva This is a `fasthttp` bug, see my answer with link. – Eric Nov 25 '22 at 03:16

1 Answers1

1

This is a fasthttp bug, and fixed in v1.5.1-rc.8

refer: https://github.com/fasthttp/websocket/issues/30#issuecomment-1326860975

Eric
  • 22,183
  • 20
  • 145
  • 196