
This is the first case:

fastClient := fasthttp.Client{
    ReadTimeout:     500 * time.Millisecond,
    MaxConnsPerHost: 500,
}

runtime.GOMAXPROCS(1)

for i := 0; i < 500; i++ {
    req := fasthttp.AcquireRequest()
    req.Header.SetMethod("GET")
    req.SetRequestURI(TEST_REST_API)
    res := fasthttp.AcquireResponse()

    start := time.Now()
    err := fastClient.Do(req, res)
    ms := time.Since(start).Milliseconds()
    if err != nil {
        fmt.Println(err)
    }

    // Read the header before releasing the response.
    serviceTime := string(res.Header.Peek("x-real-service-time"))

    fasthttp.ReleaseRequest(req)
    fasthttp.ReleaseResponse(res)

    fmt.Printf("%v %v\n", ms, serviceTime)
}

And this is the second case:

fastClient := fasthttp.Client{
    ReadTimeout:     500 * time.Millisecond,
    MaxConnsPerHost: 500,
}

runtime.GOMAXPROCS(1)

var wg sync.WaitGroup

for i := 0; i < 500; i++ {
    wg.Add(1)
    go func(id int, c *fasthttp.Client) {
        defer wg.Done()

        req := fasthttp.AcquireRequest()
        req.Header.SetMethod("GET")
        req.SetRequestURI(TEST_REST_API)
        res := fasthttp.AcquireResponse()

        start := time.Now()
        err := c.Do(req, res)
        ms := time.Since(start).Milliseconds()
        if err != nil {
            fmt.Println(id, err)
        }

        // Read the header before releasing the response.
        serviceTime := string(res.Header.Peek("x-real-service-time"))

        fasthttp.ReleaseRequest(req)
        fasthttp.ReleaseResponse(res)

        fmt.Printf("%v %v %v\n", id, ms, serviceTime)
    }(i, &fastClient)
}

// wait for all goroutines to finish
wg.Wait()

x-real-service-time is a header that records how long the server took to process the request and send the response.

In the first case, ms is almost the same as x-real-service-time. In the second case, ms is much larger than x-real-service-time, and it grows with each id.

Can you explain the reason for this difference? I would like the gap between ms and x-real-service-time to be small in the second case as well. How should I tune it?

EDIT: results for the first case.

id: 166 | ms: 115 | x-real-service-time : 105
id: 167 | ms: 103 | x-real-service-time : 97
id: 168 | ms: 89 | x-real-service-time : 73
id: 169 | ms: 92 | x-real-service-time : 76
id: 170 | ms: 79 | x-real-service-time : 73
id: 171 | ms: 81 | x-real-service-time : 73
id: 172 | ms: 84 | x-real-service-time : 76
id: 173 | ms: 84 | x-real-service-time : 78
id: 174 | ms: 81 | x-real-service-time : 73
id: 175 | ms: 82 | x-real-service-time : 76
id: 176 | ms: 82 | x-real-service-time : 75
id: 177 | ms: 80 | x-real-service-time : 74
id: 178 | ms: 79 | x-real-service-time : 71
id: 179 | ms: 75 | x-real-service-time : 69
id: 180 | ms: 87 | x-real-service-time : 78
id: 181 | ms: 81 | x-real-service-time : 72
...

Results for the second case.

id: 486 | ms: 516 | x-real-service-time : 72
id: 361 | ms: 620 | x-real-service-time : 100
id: 349 | ms: 620 | x-real-service-time : 96
id: 417 | ms: 621 | x-real-service-time : 100
id: 3 | ms: 631 | x-real-service-time : 76
id: 227 | ms: 634 | x-real-service-time : 90
id: 152 | ms: 639 | x-real-service-time : 77
id: 138 | ms: 640 | x-real-service-time : 78
id: 139 | ms: 640 | x-real-service-time : 84
id: 144 | ms: 640 | x-real-service-time : 79
...
id: 267 | ms: 783 | x-real-service-time : 75
id: 195 | ms: 779 | x-real-service-time : 73
id: 465 | ms: 781 | x-real-service-time : 76
id: 420 | ms: 781 | x-real-service-time : 74
id: 334 | ms: 782 | x-real-service-time : 112
id: 14 | ms: 780 | x-real-service-time : 69
id: 430 | ms: 781 | x-real-service-time : 75
id: 372 | ms: 782 | x-real-service-time : 78
id: 224 | ms: 778 | x-real-service-time : 107
...
id: 481 | ms: 825 | x-real-service-time : 67
id: 105 | ms: 823 | x-real-service-time : 75
id: 467 | ms: 825 | x-real-service-time : 71
id: 67 | ms: 825 | x-real-service-time : 73
id: 460 | ms: 828 | x-real-service-time : 77
id: 142 | ms: 827 | x-real-service-time : 78
id: 58 | ms: 828 | x-real-service-time : 71
id: 311 | ms: 833 | x-real-service-time : 78
id: 414 | ms: 833 | x-real-service-time : 71
id: 289 | ms: 836 | x-real-service-time : 77
...
id: 323 | ms: 1019 | x-real-service-time : 201
id: 211 | ms: 1015 | x-real-service-time : 209
id: 220 | ms: 1015 | x-real-service-time : 203
id: 275 | ms: 1021 | x-real-service-time : 216
id: 477 | ms: 1019 | x-real-service-time : 207
id: 160 | ms: 1017 | x-real-service-time : 197
id: 398 | ms: 1020 | x-real-service-time : 199
id: 253 | ms: 1017 | x-real-service-time : 201
id: 74 | ms: 1019 | x-real-service-time : 204
id: 422 | ms: 1021 | x-real-service-time : 205

id: loop number (variable i)

ms: time from the client sending the request to receiving the response (milliseconds)

x-real-service-time: time actually spent processing on the server (milliseconds)

jub0bs
SungHo Kim
  • What service are you targeting? (I'm assuming that it is returning the `x-real-service-time` header?) Many services are going to respond to individual requests more slowly when you hit them with circa 500 simultaneous requests. Generally the total time elapsed, for the 500 queries, will be shorter (because the requests are run in parallel). – Brits Jan 11 '23 at 04:02
  • @Brits The text content has been changed. The server side takes the same processing time no matter how many requests come, but I was curious that the client side is slower than that time. – SungHo Kim Jan 11 '23 at 04:08
  • @Brits When requesting TEST_REST_API using fasthttp in a system using goroutine, regardless of the processing time of the server, I suspected that system performance could be deteriorated by fasthttp.client. – SungHo Kim Jan 11 '23 at 04:15
  • Sorry - I'm confused. If "x-real-service-time" is being calculated on the server then the fact that it is changing demonstrates that the processing time on the server is changing? If I've misunderstood and it's being calculated on the client then other things come into play (e.g. network load will be higher with multiple simultaneous requests, rate limiting may be applied etc). Please provide more info on the server, and perhaps test it with [something else](https://jmeter.apache.org/) to provide a baseline. – Brits Jan 11 '23 at 04:41
  • Slower? How much slower? How significant is it? – Erwin Bolwidt Jan 11 '23 at 05:06
  • @Brits I added the result, maybe it helps? – SungHo Kim Jan 11 '23 at 05:37
  • @ErwinBolwidt Added experimental results to the question. – SungHo Kim Jan 11 '23 at 05:38
  • So, in the first case you have 500 requests in sequence taking about 80ms each, resulting in 40,000ms = 40s total duration. In the second case, you run 500 requests in parallel, and the longest ones take 1020ms. But since they all run in parallel, your total time is also 1020ms. Is that correct? So in the second case, your program finishes about 40 times faster than in the first case? – Erwin Bolwidt Jan 11 '23 at 06:27
  • @ErwinBolwidt You are right. And I wonder why the longest task is getting 1020ms. Since there is a connection pool, I thought that io tasks would also work in parallel. It's not HTTP, but 500 sql queries won't take this long on the client. (I'm not very confident in this opinion.) Even considering that HTTP is a slow protocol, it seems too slow. – SungHo Kim Jan 11 '23 at 07:36
  • The http service itself is getting slower from the amount of concurrent requests: instead of 75ms it's taking 200ms for some of the requests. Now you're potentially also causing a lot more traffic on your network (can't see that), so perhaps you're saturating your network link. And context switches aren't free either, running 500 concurrent goroutines on a single core is going to have some overhead too. – Erwin Bolwidt Jan 11 '23 at 08:05
  • Additionally, the fasthttp client may have coarse concurrency controls (don't know), so the goroutines may be waiting for locks, channels, etc. If you're using http/2, you'll get multiple requests multiplexed over a single http/2 connection, but potentially at a latency cost too. – Erwin Bolwidt Jan 11 '23 at 08:08
  • All goroutines will reach the time.Now() call relatively quickly. Then they all compete for network I/O and server resources. Each goroutine can be blocked indefinitely between the time.Now and time.Since calls, especially with GOMAXPROCS=1. This shouldn't have anything to do with fasthttp. I'd expect the same thing to happen with the standard http.Client. – Peter Jan 11 '23 at 09:17

1 Answer


You only use one CPU (runtime.GOMAXPROCS(1)).

When you launch 500 goroutines on that single CPU, each goroutine spends most of its time waiting to be scheduled, and client.Do accounts for most of the measured time.

The requests themselves are not slow; each goroutine records its start time (time.Now()) almost immediately, long before it actually gets scheduled to run and complete the request.

This is why, in the second case, the measured time grows with id (id⬆ -> time⬆).

Para
  • yes, you have 500 goroutines, but you only have GOMAXPROCS(6); if you create a goroutine pool with size 6 and GOMAXPROCS(6), it will be equal – Para Jan 11 '23 at 06:01
  • From your explanation, I think it's because IO operations work sequentially, right? I thought that using goroutines would work like asyncio, but am I misunderstanding? – SungHo Kim Jan 11 '23 at 06:07
  • I expected that async processing of IO would be possible because fasthttp.Client has a connection pool, wouldn't it? – SungHo Kim Jan 11 '23 at 06:13
  • In most languages, multithreading executes as NIO. Python's *create thread* is NIO as well. Python's asyncio defaults to AIO (the Linux kernel does not support it), but you can use `asyncio.gather()` to change to NIO... If you want to write code like asyncio's AIO, a connection pool can't provide it; even if it is similar to AIO when the goroutine count equals the CPU cores, it is still NIO. You must write AIO yourself – Para Jan 11 '23 at 06:43
  • Does NIO mean non blocking io? I thought having a connection pool was synonymous with being able to use multiple sockets. Since there are several doors from the client to the server, I expected them to be processed concurrently. What you're saying isn't this, it seems like you're saying that all goroutines share one IO resource after all. Therefore, no matter how many parallel jobs are processed, only one request goes to the server at a time because there is only one door. did i get it right? – SungHo Kim Jan 11 '23 at 06:56
  • Python uses the GIL to implement AIO, so there is no way to take advantage of multi-core performance... Golang doesn't implement AIO, but you can use a channel or mutex lock to implement it – Para Jan 11 '23 at 07:13
  • @Para - are you sure about this? From [the docs](https://pkg.go.dev/runtime) "The GOMAXPROCS variable limits the number of operating system threads that can execute user-level Go code simultaneously. There is **no limit to the number of threads that can be blocked in system calls** on behalf of Go code; those do not count against the GOMAXPROCS limit." – Brits Jan 11 '23 at 07:19