
I am building a performance-oriented REST API. The skeleton was built with go-swagger.

The API has 3 ms to answer, and with single requests it succeeds comfortably, needing only 0.5 ms to 0.8 ms per response. Each request makes two calls to Redis.

This is how the pool is initiated:

func createPool(server string) *redis.Pool {
    return &redis.Pool{
        MaxIdle:     500,
        MaxActive:   10000,
        IdleTimeout: 5 * time.Second,
        //MaxConnLifetime: 1800 * time.Microsecond,

        Dial: func() (redis.Conn, error) {
            c, err := redis.Dial("tcp", server)
            if err != nil {
                return nil, err
            }
            return c, nil
        },

        TestOnBorrow: func(c redis.Conn, t time.Time) error {
            if time.Since(t) < 3*time.Second {
                return nil
            }
            _, err := c.Do("PING")
            return err
        },
    }
}

And this is the only place where the pool is used:

func GetValue(params Params) []int64 {
    conn := data.Pool.Get()
    value1 := Foo(conn)
    value2 := Bar(value1, conn)
    conn.Close()
    defer Log(value1, value2)

    return value2
}

So basically at the start I get a connection from the pool, use it for the two Redis requests, and then close it. I previously used defer conn.Close() as stated in the documentation, and that didn't work either. vm.overcommit_memory=1 and net.core.somaxconn=512 are set on the server.
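
For completeness, this is the defer-based variant from the documentation that I tried first (a minimal sketch of the same handler):

func GetValue(params Params) []int64 {
    conn := data.Pool.Get()
    // Return the connection to the pool when the handler finishes.
    defer conn.Close()

    value1 := Foo(conn)
    value2 := Bar(value1, conn)
    defer Log(value1, value2)

    return value2
}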

With single requests there is no problem. Under stress, at around 4000 requests per second, it works for roughly the first 10 s, then becomes very slow and no longer answers within the 3 ms budget stated above.

When I check ActiveCount and IdleCount, the values stay between 2 and 5 and barely change. Shouldn't a far higher number of connections be possible with a MaxActive value of 10,000? Or am I missing some crucial setting?
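
For reference, the counters can be sampled while the test runs with something like this (a minimal sketch; the one-second interval is an arbitrary choice):

go func() {
    for range time.Tick(time.Second) {
        log.Printf("pool: active=%d idle=%d",
            data.Pool.ActiveCount(), data.Pool.IdleCount())
    }
}()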

  • How many concurrent connections does the test application use during the test? Are you using keep-alive connections between the test application and the tested application? How do CPU and memory behave during the test? What does `defer Log(value1, value2)` do? – Jaroslaw Jul 26 '21 at 21:25
  • The Active/Idle count only goes up to about 12. I tested with and without keep-alive connections and nothing changed. CPU and memory of the server stay under 30%, so I don't see a problem there. Log is just a function logging the content of the request. Is it right that the server side doesn't close the connection, but that it is tied to the response to the client? And if the client isn't properly closing the connection, would the server still wait? – whatever.idc Jul 27 '21 at 13:36
  • The symptoms you describe indicate that some type of resource is running out after a period of time. In such tests it is not uncommon to see problems with a huge number of connections in the TIME_WAIT state because keep-alive is not used (check it with `netstat`). It may also be a problem with logging if the logs are sent to some service. Did you try the tests without the `Log` function? I don't understand the question about tying the connection to the response to the client. – Jaroslaw Jul 27 '21 at 16:21
  • You could also use [pprof](https://pkg.go.dev/net/http/pprof) to look at your goroutines when you observe this drop in processed requests; a minimal setup is sketched below. – Jaroslaw Jul 27 '21 at 16:33
  • Thank you for your help. The problem was not Redis-dependent, but your suggestions helped in resolving the issue! – whatever.idc Jul 30 '21 at 08:36
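
For reference, a minimal sketch of enabling the pprof endpoint suggested above (the side port 6060 is an arbitrary choice):

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof handlers on the default mux
)

func init() {
    go func() {
        // Goroutine dumps are then available at
        // http://localhost:6060/debug/pprof/goroutine?debug=2
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()
}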

1 Answer


The problem was not Redis-dependent at all. The sockets of the listening port were flooded, because the TCP connections opened by the stress test were never closed properly.

That resulted in around 60k connections stuck in the TIME_WAIT state. The problem was resolved by using live traffic for the stress test instead of JMeter.
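
For comparison, a Go load generator that shares a single keep-alive http.Client avoids this TIME_WAIT buildup, since connections are reused instead of opened per request (a minimal sketch; the endpoint URL and the concurrency/request counts are placeholders):

package main

import (
    "io"
    "net/http"
    "sync"
)

func main() {
    // One shared client reuses keep-alive connections instead of opening
    // a fresh TCP connection (and leaving it in TIME_WAIT) per request.
    client := &http.Client{
        Transport: &http.Transport{
            MaxIdleConns:        100,
            MaxIdleConnsPerHost: 100,
        },
    }

    var wg sync.WaitGroup
    for i := 0; i < 100; i++ { // 100 concurrent workers (placeholder)
        wg.Add(1)
        go func() {
            defer wg.Done()
            for j := 0; j < 1000; j++ { // 1000 requests each (placeholder)
                resp, err := client.Get("http://localhost:8080/value") // hypothetical endpoint
                if err != nil {
                    continue
                }
                // Drain and close the body so the connection can be reused.
                io.Copy(io.Discard, resp.Body)
                resp.Body.Close()
            }
        }()
    }
    wg.Wait()
}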
