I am currently working on a microservice architecture. Before introducing NATS into my project, I wanted to test some simple scenarios with it.

In one scenario I have a simple publisher that publishes 100,000 messages in a for loop to a basic NATS server running on localhost:4222.

The big problem is the subscriber. After it has received between 30,000 and 40,000 messages, my whole main.go program and all of its goroutines simply stop and do nothing. I can only quit with Ctrl+C. Meanwhile, the publisher keeps sending messages. When I open a new terminal and start a new instance of the subscriber, everything works again until that subscriber has received about 30,000 messages. Worst of all, not a single error appears and there are no logs on the server, so I have no idea what is going on.

After that I tried replacing the Subscribe method with the QueueSubscribe method, and everything works fine.

What is the main difference between Subscribe and QueueSubscribe?

Is NATS Streaming a better option? In which cases should I prefer Streaming, and in which the standard NATS server?

Here is my code:

Publisher:

package main

import (
    "fmt"
    "log"
    "time"

    "github.com/nats-io/go-nats"
)

func main() {
    go createPublisher()

    for {

    }
}

func createPublisher() {

    log.Println("pub started")

    nc, err := nats.Connect(nats.DefaultURL)
    if err != nil {
        log.Fatal(err)
    }
    defer nc.Close()

    msg := make([]byte, 16)

    for i := 0; i < 100000; i++ {
        nc.Publish("alenSub", msg)
        if (i % 100) == 0 {
            fmt.Println("i", i)
        }
        time.Sleep(time.Millisecond)
    }

    log.Println("pub finish")

    nc.Flush()

}

Subscriber:

package main

import (
    "fmt"
    "log"
    "time"

    "github.com/nats-io/go-nats"
)

var received int64

func main() {
    received = 0

    go createSubscriber()
    go check()

    for {

    }
}

func createSubscriber() {

    log.Println("sub started")

    nc, err := nats.Connect(nats.DefaultURL)
    if err != nil {
        log.Fatal(err)
    }
    defer nc.Close()

    nc.Subscribe("alenSub", func(msg *nats.Msg) {
        received++
    })
    nc.Flush()

    for {

    }
}

func check() {
    for {
        fmt.Println("-----------------------")
        fmt.Println("still running")
        fmt.Println("received", received)
        fmt.Println("-----------------------")
        time.Sleep(time.Second * 2)
    }
}
  • `Publish`, `Subscribe`, and `Flush` all return `error`. Maybe add error checking and see if any of those calls are giving you information that might be useful? Also note that you have a data race on `received` in the subscriber, as you're reading from it and writing to it in separate goroutines. Consider switching it for a [`sync/atomic`](https://golang.org/pkg/sync/atomic/) or adding a mutex. – Adrian Aug 25 '17 at 18:00
  • On the publishing side I would use `Request`, which sends a message and waits for a confirmation reply. Also set the NATS connection's maximum number of reconnect attempts to `-1` (so it always reconnects). – Kaveh Shahbazian Aug 25 '17 at 18:07
  • Friendly Go advice. __NEVER__ ignore a returned `error`, __EVER__. – RayfenWindspear Aug 25 '17 at 18:17
  • Install the delve debugger: https://github.com/derekparker/delve/tree/master/Documentation/installation When the subscriber hangs, use dlv to attach to the process and look at the stack trace for all your goroutines. You should be able to see exactly where it's hanging and get some clues as to what's wrong. – Eddy R. Aug 25 '17 at 23:46
  • OK, I have already added atomic and I now check all errors from both the publisher and the subscriber, but I still get nothing. It just stops, and with delve I can't find anything either. Here's my current subscriber code: https://play.golang.org/p/TjnVgrGfyv – Friedrich Kerlach Aug 26 '17 at 20:07

2 Answers

The empty infinite for loops never yield, so they are likely starving the garbage collector: https://github.com/golang/go/issues/15442#issuecomment-214965471

I was able to reproduce the issue by just running the publisher. To resolve, I recommend using a sync.WaitGroup. Here's how I updated the code linked to in the comments to get it to complete:

package main

import (
    "fmt"
    "log"
    "sync"
    "time"

    "github.com/nats-io/go-nats"
)

// create wait group
var wg sync.WaitGroup

func main() {
    // add 1 waiter
    wg.Add(1)
    go createPublisher()

    // wait for wait group to complete
    wg.Wait()
}

func createPublisher() {

    log.Println("pub started")
    // mark wait group done after createPublisher completes
    defer wg.Done()

    nc, err := nats.Connect(nats.DefaultURL)
    if err != nil {
        log.Fatal(err)
    }
    defer nc.Close()

    msg := make([]byte, 16)

    for i := 0; i < 100000; i++ {
        if errPub := nc.Publish("alenSub", msg); errPub != nil {
            panic(errPub)
        }

        if (i % 100) == 0 {
            fmt.Println("i", i)
        }
        time.Sleep(time.Millisecond * 1)
    }

    log.Println("pub finish")

    errFlush := nc.Flush()
    if errFlush != nil {
        panic(errFlush)
    }

    errLast := nc.LastError()
    if errLast != nil {
        panic(errLast)
    }

}

I'd recommend updating the above subscriber code similarly.

The main difference between Subscribe and QueueSubscribe is that with Subscribe every subscriber is sent every message published on the subject, whereas with QueueSubscribe only one subscriber in a queue group is sent each message.
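To make that difference concrete, here is a minimal sketch that only simulates the two delivery modes with the standard library. It does not talk to a NATS server, and `deliver` is a hypothetical helper, not part of go-nats. It assumes round-robin selection inside the queue group purely to keep the output deterministic; the real server decides which member gets each message.

```go
package main

import "fmt"

// deliver simulates how messages on one subject are handed out.
// plain is the number of ordinary subscribers: each one gets every message.
// queue is the number of members in a single queue group: each message goes
// to exactly one member (round-robin here, which is an assumption made only
// for this illustration).
func deliver(messages, plain, queue int) (plainCounts, queueCounts []int) {
	plainCounts = make([]int, plain)
	queueCounts = make([]int, queue)
	for m := 0; m < messages; m++ {
		// Subscribe semantics: fan out to every plain subscriber.
		for s := range plainCounts {
			plainCounts[s]++
		}
		// QueueSubscribe semantics: one member of the group per message.
		if queue > 0 {
			queueCounts[m%queue]++
		}
	}
	return plainCounts, queueCounts
}

func main() {
	plainCounts, queueCounts := deliver(9, 2, 3)
	fmt.Println("plain subscribers:", plainCounts)
	fmt.Println("queue group members:", queueCounts)
}
```

With 9 messages, 2 plain subscribers and a queue group of 3, each plain subscriber sees all 9 messages, while the group members split them 3 each.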

Some details on additional features for NATS Streaming are here: https://nats.io/documentation/streaming/nats-streaming-intro/

We see both NATS and NATS Streaming used in a variety of use cases from data pipelines to control planes. Your choice should be driven by the needs of your use case.

Peter Miron

As stated, remove the for {} loop and replace it with runtime.Goexit().

For the subscriber, you don't need to create the subscription in a goroutine. Async subscribers already have their own goroutine for callbacks.

Also protect the received variable with atomic or a mutex.
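A minimal sketch of the atomic variant, assuming the callback does nothing but count. `onMessage` and `simulate` are hypothetical stand-ins for the subscription callback and for concurrent deliveries; no NATS connection is involved here.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// received is only ever read and written through sync/atomic,
// so the counting callback and the reporting goroutine do not race.
var received int64

// onMessage stands in for the body of the subscription callback.
func onMessage() {
	atomic.AddInt64(&received, 1)
}

// simulate calls onMessage from several goroutines at once and
// returns the final count once all of them have finished.
func simulate(workers, perWorker int) int64 {
	atomic.StoreInt64(&received, 0)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				onMessage()
			}
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&received)
}

func main() {
	fmt.Println("received", simulate(4, 1000))
}
```

In the subscriber from the question, the check loop would read the counter with atomic.LoadInt64(&received) instead of accessing the variable directly.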

See the examples here as well.

https://github.com/nats-io/go-nats/tree/master/examples

derek
  • Thank you, derek, for the examples. I was looking for an example of how to use the QueueSubscribe method. Because it is an async operation, we don't know when the callback is called, and in the examples I think it is likewise not deterministic when the callback function runs. Should we implement a non-blocking solution using channels, or is there a better way to do it? Thank you so much in advance, and great work with NATS! – Víctor M Nov 18 '19 at 22:53