
I am trying to implement a TCP server in Go. Its concurrency is satisfactory, but its CPU usage is too high: at 15W+ requests/s it uses about 800% CPU on a 24-core Linux machine. By comparison, a C++ TCP server built on libevent uses only about 200% CPU at a similar request rate.

Here is the Go demo code:

package main

import (
    "log"
    "net"
)

func main() {
    listen, err := net.Listen("tcp", "0.0.0.0:17379")
    if err != nil {
        log.Fatal(err) // fmt.Errorf only builds an error value; it neither reports nor stops anything
    }
    go acceptClient(listen)
    select {} // block main forever
}

func acceptClient(listen net.Listener) {
    for {
        sock, err := listen.Accept()
        if err != nil {
            log.Println(err)
            continue // sock is nil here, so skip it
        }
        tcp := sock.(*net.TCPConn)
        tcp.SetNoDelay(true)
        // one buffered channel per connection links the reader and writer goroutines
        var channel = make(chan bool, 10)
        go read(channel, tcp)
        go write(channel, tcp)
    }
}

func read(channel chan bool, sock *net.TCPConn) {
    count := 0
    for {
        var buf = make([]byte, 1024) // a new 1 KB buffer is allocated on every read
        n, err := sock.Read(buf)
        if err != nil {
            close(channel) // tells write() to stop
            sock.CloseRead()
            return
        }
        count += n
        // every 58 bytes received are counted as one request
        x := count / 58
        count = count % 58
        for i := 0; i < x; i++ {
            channel <- true // one response per request
        }
    }
}

func write(channel chan bool, sock *net.TCPConn) {
    buf := []byte("+OK\r\n")
    defer func() {
        sock.CloseWrite()
        recover()
    }()
    for {
        _, ok := <-channel
        if !ok {
            return // channel was closed by read()
        }
        _, writeError := sock.Write(buf) // one Write call per response
        if writeError != nil {
            return
        }
    }
}

I tested the server with redis-benchmark, using multiple clients:

redis-benchmark -h 10.100.45.2  -p 17379 -n 1000 -q script load "redis.call('set','aaa','aaa')"

I also profiled the Go code with pprof, and it shows that most of the CPU time is spent in syscalls.

E.SHEN
  • This is probably better suited to Code Review. Some thoughts: you are using far more channels than needed, and you skipped the check on the socket type assertion. – Passer By Dec 15 '17 at 07:55
  • I'm not much help here (I can only suggest adding a sleep...), but I'm interested. Would you mind posting on the golang-nuts mailing list? https://groups.google.com/forum/#!forum/golang-nuts – Dec 15 '17 at 10:37
  • Most of the time being spent in syscall for io is what you would expect. You're making a lot of garbage, perhaps the GC is taking up the extra time. Note that total CPU usage would be expected to be higher in the Go program, because the Go program is probably just doing more, but you should be able to get throughput fairly close. – JimB Dec 15 '17 at 13:50
  • The [Go toolchain includes a strong profiler](https://blog.golang.org/profiling-go-programs). – Adrian Dec 15 '17 at 14:52
  • @PasserBy Yes, I created a channel for every socket connection to improve concurrency across connections. This demo is only for testing Go socket performance, so I didn't check the socket type assertion. Do you think too many channels could cause the high CPU usage? – E.SHEN Dec 18 '17 at 03:12
  • @mh-cbon ah, that's a good idea, I will post this one later. – E.SHEN Dec 18 '17 at 03:13
  • @JimB This is what confuses me: most of the time is spent in syscalls, but the C++ program presumably makes a similar number of syscalls, so what causes the big difference? The GC may be a problem; do you have any idea how to reduce the GC frequency? – E.SHEN Dec 18 '17 at 03:26
  • @E.SHEN, the garbage collector runs after a certain amount of allocations when there's garbage to collect. You're allocating a new 1kb buffer for every read. – JimB Dec 18 '17 at 13:44
  • Aren't you just benchmarking the benchmark tool? If the benchmark tool was any good I don't see why the server shouldn't eat all 24 CPUs rather than just 8. The fact that you only have 800% CPU usage means that redis-benchmark can't keep up sending requests fast enough to saturate all your CPUs. And the fact that the C++ server only eats 200% means that either it's not multi-threaded or it's way too fast for the benchmark tool. A good server should eat all available CPUs, if it doesn't that means that there is some resource contention going on or it's badly written. – Art Jan 10 '18 at 09:10
  • @Art Thanks for your reply, you are right on some points. Yes, this Go TCP server can use all 24 CPUs, but the benchmark tool has its own limitation: it can only send 15W+ requests per second, and both the C++ server and the Go server are too fast for it. My question is: when the C++ server and the Go server handle the same number of requests, why does the Go server use so much more CPU than the C++ server? – E.SHEN Jan 11 '18 at 08:34
  • @E.SHEN Well, you said that the C++ server is written with libevent, generally that is used for things that aren't multithreaded so it surprises me that it uses more than one cpu to be honest. Also, what does "15W+" exactly mean? – Art Jan 11 '18 at 08:55
  • @Art After the server reads a message from the client, I handle some logic in another thread and send a message back to the client, so the usage is more than one CPU. And "15W+" means "150,000+", my bad :) – E.SHEN Jan 11 '18 at 09:26
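JimB's point about allocating a new 1 KB buffer on every read can be checked directly with `testing.AllocsPerRun`. This is a minimal sketch comparing the question's pattern with a reused buffer; `zeroReader` is a hypothetical stand-in for a socket, so the sketch measures allocations only, not syscall cost.

```go
package main

import (
	"fmt"
	"io"
	"testing"
)

// zeroReader (hypothetical) stands in for a socket: every Read "receives" len(p) bytes.
type zeroReader struct{}

func (zeroReader) Read(p []byte) (int, error) { return len(p), nil }

// src is a package-level interface value, so the concrete type is hidden at
// the call site, as it would be for a real connection.
var src io.Reader = zeroReader{}

// readFresh mirrors the question's loop body: a new 1 KB buffer per Read.
func readFresh(r io.Reader) int {
	buf := make([]byte, 1024) // escapes to the heap because it is passed to an interface method
	n, _ := r.Read(buf)
	return n
}

// readReused mirrors the answer's approach: one buffer owned by the caller.
func readReused(r io.Reader, buf []byte) int {
	n, _ := r.Read(buf)
	return n
}

func main() {
	buf := make([]byte, 1024)
	fresh := testing.AllocsPerRun(1000, func() { readFresh(src) })
	reused := testing.AllocsPerRun(1000, func() { readReused(src, buf) })
	fmt.Printf("allocs per read: fresh=%v reused=%v\n", fresh, reused)
}
```

On a standard toolchain this should report about one allocation per read for the fresh-buffer version and none for the reused one. At 150,000 reads per second, 1 KB per read is roughly 150 MB/s of short-lived garbage for the GC to collect.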

2 Answers


I don't think parallelising the reads and writes with a channel will give you better performance in this case. You should try to make fewer memory allocations and fewer syscalls (the write function may make a lot of syscalls).

Can you try this version?

package main

import (
    "bytes"
    "log"
    "net"
)

func main() {
    listen, err := net.Listen("tcp", "0.0.0.0:17379")
    if err != nil {
        log.Fatal(err)
    }
    acceptClient(listen)
}

func acceptClient(listen net.Listener) {
    for {
        sock, err := listen.Accept()
        if err != nil {
            log.Println(err)
            continue // sock is nil here, so skip it
        }
        tcp, ok := sock.(*net.TCPConn)
        if !ok {
            sock.Close()
            continue
        }
        tcp.SetNoDelay(true)
        go handleConn(tcp) // fewer goroutines, and no concurrent read/write on the same conn
    }
}

var respPattern = []byte("+OK\r\n")

// just one goroutine per conn
func handleConn(sock *net.TCPConn) {
    count := 0
    buf := make([]byte, 4096) // do not create a new buffer each time, and increase the buffer size
    defer sock.Close()

    for {
        n, err := sock.Read(buf)
        if err != nil {
            return
        }
        count += n
        x := count / 58
        count = count % 58
        resp := bytes.Repeat(respPattern, x) // can be optimized: this still allocates on every read
        _, writeError := sock.Write(resp)    // one write per read means fewer syscalls
        if writeError != nil {
            return
        }
    }
}
Gnukos

Perhaps add a sleep in the main busy loop, e.g. time.Sleep(10 * time.Millisecond):

func acceptClient(listen net.Listener) {
    for {
        sock, err := listen.Accept()
        if err != nil {
            log.Println(err)
            continue // sock is nil here, so skip it
        }
        tcp := sock.(*net.TCPConn)
        tcp.SetNoDelay(true)
        var channel = make(chan bool, 10)
        go read(channel, tcp)
        go write(channel, tcp)
        // pause after each accepted connection; this caps accepts at roughly 100 per second
        time.Sleep(10 * time.Millisecond)
    }
}
Zibri