I am trying to communicate between 2 microservices written in Rust and Node.js using Kafka.
I'm using actix-web as the web framework and rdkafka as the Kafka client for Rust. The Node.js service queries the database and returns the result as JSON to the Rust server via Kafka.
The flow:
Request -> Actix Web -> Kafka -> Node -> Kafka -> Actix Web -> Response
The logic: a request hits an endpoint on Actix Web, which produces a Kafka message asking the other microservice for data, waits until that service sends a reply (matched by the Kafka message key), and returns the reply to the user as the HTTP response.
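To make the key-matching step concrete, here is a minimal sketch using only the standard library. It is not the code from the service: `PendingReplies`, `register`, and `resolve` are hypothetical names, and the sketch assumes the reply carries the same key as the request.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};

/// Hypothetical registry of in-flight requests, keyed by the Kafka message key.
#[derive(Clone, Default)]
pub struct PendingReplies {
    waiters: Arc<Mutex<HashMap<String, Sender<String>>>>,
}

impl PendingReplies {
    /// Call before producing the request: remember who is waiting for `key`.
    pub fn register(&self, key: &str) -> Receiver<String> {
        let (tx, rx) = channel();
        self.waiters.lock().unwrap().insert(key.to_owned(), tx);
        rx
    }

    /// Call from the consumer loop for every reply: hand the payload to the
    /// waiter registered under `key`. Returns false if nobody is waiting
    /// (unknown key, or the waiter already gave up).
    pub fn resolve(&self, key: &str, payload: String) -> bool {
        // The lock is released at the end of this statement, before sending.
        let waiter = self.waiters.lock().unwrap().remove(key);
        match waiter {
            Some(tx) => tx.send(payload).is_ok(),
            None => false,
        }
    }
}
```

With this shape, replies that arrive out of order still reach the request that asked for them. In an async handler the `std::sync::mpsc` channel would presumably be swapped for an async one-shot channel (for example `tokio::sync::oneshot`), since a blocking `recv()` would hold the worker thread while waiting.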
I got this flow working, but performance is very poor (I am stress-testing with wrk).
I'm not sure why it's slow, but while digging into it I found that if I add a 5-second delay on the Node.js side and send two requests to actix-web one second apart, the responses come back after 5 and 10 seconds.
The benchmark reaches around 3k requests per second with the following command:
wrk http://localhost:8080 -d 20s -t 2 -c 200
The delay behavior above makes me suspect that something is blocking the thread for each request.
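The suspicion can be illustrated with a simplified model that uses only the standard library. It is a sketch, not a reproduction of the service: `latencies_on_one_blocked_worker` is a hypothetical helper, the worker is a plain thread rather than an actix-web worker, and `thread::sleep` stands in for any blocking wait (such as a synchronous `recv()` inside an async handler).

```rust
use std::sync::mpsc::channel;
use std::thread;
use std::time::{Duration, Instant};

/// Simulates `requests` arriving at about the same time on ONE worker thread,
/// where handling each request blocks that thread for `delay`.
/// Returns how long each request waited for its response, in arrival order.
pub fn latencies_on_one_blocked_worker(requests: usize, delay: Duration) -> Vec<Duration> {
    let (job_tx, job_rx) = channel::<Instant>();
    let (done_tx, done_rx) = channel::<Duration>();

    let worker = thread::spawn(move || {
        // Jobs are handled strictly one after another on this thread.
        for arrived_at in job_rx {
            thread::sleep(delay); // blocking wait: nothing else runs meanwhile
            done_tx.send(arrived_at.elapsed()).expect("collector dropped");
        }
    });

    for _ in 0..requests {
        job_tx.send(Instant::now()).expect("worker dropped");
    }
    drop(job_tx); // close the queue so the worker loop ends

    // Iteration stops once the worker exits and drops `done_tx`.
    let latencies: Vec<Duration> = done_rx.iter().collect();
    worker.join().expect("worker panicked");
    latencies
}
```

Calling it with two requests and a delay `d` yields latencies of roughly `d` and `2 * d`, the same stacking pattern as the 5 and 10 second responses. If the wait were non-blocking, both requests would be expected to finish after about `d`.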
Here is the source code and the repo:
use std::{
    sync::Arc,
    time::{
        Duration,
        Instant
    }
};
use actix_web::{
    App,
    HttpServer,
    get,
    rt,
    web::Data
};
use futures::TryStreamExt;
use tokio::time::sleep;
use num_cpus;
use rand::{
    distributions::Alphanumeric,
    Rng
};
use rdkafka::{
    ClientConfig,
    Message,
    consumer::{
        Consumer,
        StreamConsumer
    },
    producer::{
        FutureProducer,
        FutureRecord
    }
};

const TOPIC: &'static str = "exp-queue_general-5";

#[derive(Clone)]
pub struct AppState {
    pub producer: Arc<FutureProducer>,
    pub receiver: flume::Receiver<String>
}

fn generate_key() -> String {
    rand::thread_rng()
        .sample_iter(&Alphanumeric)
        .take(8)
        .map(char::from)
        .collect()
}

#[get("/")]
async fn landing(state: Data<AppState>) -> String {
    let key = generate_key();
    let t1 = Instant::now();
    let producer = &state.producer;
    let receiver = &state.receiver;

    producer
        .send(
            FutureRecord::to(&format!("{}-forth", TOPIC))
                .key(&key)
                .payload("Hello From Rust"),
            Duration::from_secs(8)
        )
        .await
        .expect("Unable to send message");
    println!("Producer take {} ms", t1.elapsed().as_millis());

    let t2 = Instant::now();
    let value = receiver
        .recv()
        .unwrap_or("".to_owned());
    println!("Receiver take {} ms", t2.elapsed().as_millis());
    println!("Process take {} ms\n", t1.elapsed().as_millis());

    value
}

#[get("/status")]
async fn heartbeat() -> &'static str {
    // ? Concurrency delay check
    sleep(Duration::from_secs(1)).await;
    "Working"
}

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    // ? Assume that the whole node is just Rust instance
    let mut cpus = num_cpus::get() / 2 - 1;
    if cpus < 1 {
        cpus = 1;
    }
    println!("Cpus {}", cpus);

    let producer: FutureProducer = ClientConfig::new()
        .set("bootstrap.servers", "localhost:9092")
        .set("linger.ms", "25")
        .set("queue.buffering.max.messages", "1000000")
        .set("queue.buffering.max.ms", "25")
        .set("compression.type", "lz4")
        .set("retries", "40000")
        .set("retries", "0")
        .set("message.timeout.ms", "8000")
        .create()
        .expect("Kafka config");

    let (tx, rx) = flume::unbounded::<String>();

    rt::spawn(async move {
        let consumer: StreamConsumer = ClientConfig::new()
            .set("bootstrap.servers", "localhost:9092")
            .set("group.id", &format!("{}-back", TOPIC))
            .set("queued.min.messages", "200000")
            .set("fetch.error.backoff.ms", "250")
            .set("socket.blocking.max.ms", "500")
            .create()
            .expect("Kafka config");

        consumer
            .subscribe(&vec![format!("{}-back", TOPIC).as_ref()])
            .expect("Can't subscribe");

        consumer
            .stream()
            .try_for_each_concurrent(
                cpus,
                |message| {
                    let txx = tx.clone();
                    async move {
                        let result = String::from_utf8_lossy(
                            message
                                .payload()
                                .unwrap_or("Error serializing".as_bytes())
                        ).to_string();
                        txx.send(result).expect("Tx not sending");
                        Ok(())
                    }
                }
            )
            .await
            .expect("Error reading stream");
    });

    let state = AppState {
        producer: Arc::new(producer),
        receiver: rx
    };

    HttpServer::new(move || {
        App::new()
            .app_data(Data::new(state.clone()))
            .service(landing)
            .service(heartbeat)
    })
    .workers(cpus)
    .bind("0.0.0.0:8080")?
    .run()
    .await
}
I found some resolved issues on GitHub that recommended using actors instead, which I tried on a separate branch.
That version performs worse than the main branch, at around 200-300 requests per second.
I don't know where the bottleneck is or what is blocking the requests.