I am currently evaluating ZeroMQ (ZMQ v4) as a possible message broker for IPC.
I am using pebbe/zmq4, a Go wrapper around ZeroMQ's C library, to run the tests.
I am rate testing it at 1500 messages/sec and 10000 messages/sec.
The setup is an XPUB-XSUB architecture with one publisher and one subscriber connected to the proxy.
I understand the Go library is just a wrapper, so the actual sending and receiving of messages goes through cgo calls.
I am running this on a device with an ARM architecture. The proxy alone shows roughly 40-50% CPU usage at 1500 messages/sec (for memory, 100% is ~1 GB: ~900 MB of RAM plus ~100 MB of swap).
I am not sure whether this is normal or high. I do not have a yardstick to compare it against.
I ran a CPU profile, and runtime.cgocall and runtime._ExternalCode account for most of the CPU time. I have attached an image of the profile graph (I am not sure how to upload an SVG).
I am trying to understand the CPU usage and, if possible, reduce it.
Based on this profile, I don't think there is much I can do.
Is there a way to reduce the CPU spent in cgo calls and in the external code block shown in the profile?
I have not done profiling before, so I may be missing something obvious.
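One general way to make fewer cgo calls is to send fewer, larger messages. The sketch below packs several payloads behind a topic prefix into one buffer and unpacks them again. The framing (a 4-byte big-endian length before each payload) and the names `packBatch` and `unpackBatch` are assumptions made for illustration, not something pebbe/zmq4 or ZeroMQ provides.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// packBatch writes the topic first (so SUB prefix matching still works),
// then each payload preceded by its length as a 4-byte big-endian integer.
func packBatch(topic string, payloads [][]byte) []byte {
	buf := []byte(topic)
	var n [4]byte
	for _, p := range payloads {
		binary.BigEndian.PutUint32(n[:], uint32(len(p)))
		buf = append(buf, n[:]...)
		buf = append(buf, p...)
	}
	return buf
}

// unpackBatch reverses packBatch and validates the topic and the lengths.
func unpackBatch(topic string, buf []byte) ([][]byte, error) {
	if len(buf) < len(topic) || string(buf[:len(topic)]) != topic {
		return nil, errors.New("batch does not start with the expected topic")
	}
	buf = buf[len(topic):]
	var out [][]byte
	for len(buf) > 0 {
		if len(buf) < 4 {
			return nil, errors.New("truncated length prefix")
		}
		n := binary.BigEndian.Uint32(buf)
		buf = buf[4:]
		if uint64(n) > uint64(len(buf)) {
			return nil, errors.New("truncated payload")
		}
		out = append(out, buf[:n])
		buf = buf[n:]
	}
	return out, nil
}

func main() {
	batch := packBatch("topicA ", [][]byte{[]byte("test 1"), []byte("test 2"), []byte("test 3")})
	msgs, err := unpackBatch("topicA ", batch)
	if err != nil {
		fmt.Println("unpack failed:", err)
		return
	}
	fmt.Printf("one buffer of %d bytes carries %d messages\n", len(batch), len(msgs))
}
```

The packed buffer could then be handed to a single `SendBytes` call instead of one `Send` per payload, at the cost of extra latency while a batch fills up.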
Code to reproduce:
ZMQ broker:
package main

import (
    "fmt"

    zmq "github.com/pebbe/zmq4"
    "github.com/pkg/profile"
)

func main() {
    // Write a CPU profile to the current directory when profiling stops.
    defer profile.Start(profile.CPUProfile, profile.ProfilePath(".")).Stop()

    // XSUB side: publishers connect here.
    fmt.Println("Setting up XSUB socket")
    subscriberSocket, err := zmq.NewSocket(zmq.XSUB)
    if err != nil {
        fmt.Println("Error when creating XSUB socket -> ", err)
        return
    }
    defer subscriberSocket.Close()
    err = subscriberSocket.Bind("tcp://127.0.0.1:8101")
    if err != nil {
        fmt.Println("Error when binding XSUB socket -> ", err)
        return
    }
    fmt.Println("Successfully accepting incoming connections on XSUB socket")

    // XPUB side: subscribers connect here.
    fmt.Println("Setting up XPUB socket")
    publisherSocket, err := zmq.NewSocket(zmq.XPUB)
    if err != nil {
        fmt.Println("Error when creating XPUB socket -> ", err)
        return
    }
    defer publisherSocket.Close()
    err = publisherSocket.Bind("tcp://127.0.0.1:8100")
    if err != nil {
        fmt.Println("Error when binding XPUB socket -> ", err)
        return
    }
    fmt.Println("Successfully accepting incoming connections on XPUB socket")

    // Blocks while forwarding messages between the two sockets.
    err = zmq.Proxy(publisherSocket, subscriberSocket, nil)
    if err != nil {
        fmt.Println("Failed to start the XPUB XSUB broker -> ", err)
    }
}
Publisher:
package main

import (
    "fmt"
    "time"

    zmq "github.com/pebbe/zmq4"
)

func main() {
    publisher, err := zmq.NewSocket(zmq.PUB)
    if err != nil {
        fmt.Println("error when creating a pub socket -> ", err)
        return
    }
    defer publisher.Close()
    // Connect to the broker's XSUB port.
    err = publisher.Connect("tcp://127.0.0.1:8101")
    if err != nil {
        fmt.Println("error when connecting to a pub socket -> ", err)
        return
    }
    // A 500 microsecond tick is nominally 2000 messages/sec.
    for range time.Tick(time.Microsecond * 500) {
        sendToAll(publisher)
    }
}

// sendToAll publishes one message on topicA without blocking.
func sendToAll(pub *zmq.Socket) {
    message := "topicA test"
    _, err := pub.Send(message, zmq.DONTWAIT)
    if err != nil {
        fmt.Println("error when sending message -> ", err)
    }
}
Subscriber:
package main

import (
    "fmt"
    "os"
    "strconv"

    zmq "github.com/pebbe/zmq4"
)

func main() {
    // Socket to talk to the broker
    fmt.Println("Collecting updates from broker...")
    subscriber, err := zmq.NewSocket(zmq.SUB)
    if err != nil {
        fmt.Println("error when opening new socket to SUB -> ", err)
        os.Exit(1)
    }
    defer subscriber.Close()
    // Connect to the broker's XPUB port.
    err = subscriber.Connect("tcp://127.0.0.1:8100")
    if err != nil {
        fmt.Println("error when connecting to XPUB port -> ", err)
    }
    // The filter is a prefix match, so this matches "topicA test".
    err = subscriber.SetSubscribe("topicA ")
    if err != nil {
        fmt.Println("error when setting subscription filter -> ", err)
    }
    i := 0
    for {
        msg, err := subscriber.Recv(0)
        if err != nil {
            fmt.Println("error when receiving subscription info -> ", err)
            os.Exit(1)
        }
        i++
        fmt.Println(msg + "\n -> count is " + strconv.Itoa(i))
    }
}
I am not sure whether the build configuration matters for performance, but these are the parameters I use to cross-compile:
#!/bin/bash
ARM_PREFIX="arm-linux-androideabi-"
TOOLCHAIN_PATH="/home/NDK/arm"
CC="${TOOLCHAIN_PATH}/bin/${ARM_PREFIX}gcc" \
CFLAGS="-march=armv7-a -mfpu=neon" \
GOOS=android \
GOARCH=arm \
GOARM=7 \
CGO_ENABLED=1 \
PKG_CONFIG_PATH="${TOOLCHAIN_PATH}/lib/pkgconfig" \
go build -o test
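Possible proxy code (what I think the library does internally). Even with this the CPU usage is similar, so I assume it is close to what the library implements, but I am not sure.
// Replaces the zmq.Proxy call in the broker above.
poller := zmq.NewPoller()
poller.Add(publisherSocket, zmq.POLLIN)
poller.Add(subscriberSocket, zmq.POLLIN)
for {
    // Blocks until at least one socket is readable.
    sockets, err := poller.Poll(-1)
    if err != nil {
        fmt.Println("error when polling the XPUB/XSUB sockets -> ", err)
        return
    }
    for _, socket := range sockets {
        switch s := socket.Socket; s {
        case publisherSocket:
            // Subscription messages from subscribers go upstream to the publishers.
            // Assumes single-frame messages, as in this test.
            msg, err := s.Recv(0)
            if err != nil {
                fmt.Println("error when receiving on publisherSocket -> ", err)
                continue
            }
            subscriberSocket.Send(msg, zmq.DONTWAIT)
        case subscriberSocket:
            // Published data goes downstream to the subscribers.
            msg, err := s.Recv(0)
            if err != nil {
                fmt.Println("error when receiving on subscriberSocket -> ", err)
                continue
            }
            publisherSocket.Send(msg, zmq.DONTWAIT)
        }
    }
}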
Questions:
- Is there a way to reduce the CPU spent in cgo calls and in the external code?
- I did not expect CPU utilisation this high, because the proxy is a blocking poller that sleeps until an event arrives. I may be wrong here: a send interval measured in microseconds is a very high frequency, and I am not sure whether that is a heavy load for this CPU.
- I would appreciate pointers on what to consider when doing performance tuning of this kind.
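For a yardstick that does not depend on the Go profiler, the process's own CPU time can be read with `getrusage` and divided by the number of messages handled. A minimal sketch, assuming a Linux or Android target. The busy loop is a placeholder for a fixed run of real messages, and `cpuTime` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"syscall"
	"time"
)

// cpuTime returns the user and system CPU time consumed so far by this process.
func cpuTime() (user, sys time.Duration, err error) {
	var ru syscall.Rusage
	if err = syscall.Getrusage(syscall.RUSAGE_SELF, &ru); err != nil {
		return 0, 0, err
	}
	user = time.Duration(ru.Utime.Sec)*time.Second + time.Duration(ru.Utime.Usec)*time.Microsecond
	sys = time.Duration(ru.Stime.Sec)*time.Second + time.Duration(ru.Stime.Usec)*time.Microsecond
	return user, sys, nil
}

func main() {
	u0, s0, err := cpuTime()
	if err != nil {
		fmt.Println("getrusage failed:", err)
		return
	}

	// Placeholder workload standing in for forwarding a fixed number of messages.
	const messages = 1000
	sum := 0
	for i := 0; i < messages*10000; i++ {
		sum += i % 7
	}

	u1, s1, err := cpuTime()
	if err != nil {
		fmt.Println("getrusage failed:", err)
		return
	}
	fmt.Println("checksum:", sum)
	fmt.Printf("user %v, system %v, total per message %v\n",
		u1-u0, s1-s0, (u1-u0+s1-s0)/messages)
}
```

The split is informative on its own: a large system share would suggest the time goes into kernel socket work, while a large user share would point towards the C library and the cgo boundary.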