Where's the bottleneck when I wait for a Kafka message then return a value in Actix Web?

Question

I am trying to communicate between 2 microservices written in Rust and Node.js using Kafka.

I'm using actix-web as the web framework and rdkafka as the Kafka client for Rust. The Node.js side queries the database and returns the result as JSON to the Rust server via Kafka.

The flow:

Request -> Actix Web -> Kafka -> Node -> Kafka -> Actix Web -> Response

The logic: a request hits an endpoint on Actix Web, which produces a message asking another microservice for data, waits until that service sends a reply (matched by the Kafka message key), and returns the reply to the user as the HTTP response.
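To make the "matched by the Kafka message key" step concrete, here is a minimal sketch of a registry of in-flight requests. It is not taken from the repository: the `Pending` type and its `register` and `complete` methods are hypothetical names, and it uses only the standard library's blocking channels to stay self-contained. In an async handler, an awaitable channel would have to take their place.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};

/// Hypothetical registry of in-flight requests, keyed by the Kafka message key.
#[derive(Clone, Default)]
struct Pending {
    waiters: Arc<Mutex<HashMap<String, Sender<String>>>>,
}

impl Pending {
    /// Called by the HTTP handler before it produces the request message.
    /// The returned receiver yields only the reply carrying the same key.
    fn register(&self, key: &str) -> Receiver<String> {
        let (tx, rx) = channel();
        self.waiters.lock().unwrap().insert(key.to_owned(), tx);
        rx
    }

    /// Called by the consumer loop for every reply message.
    /// Returns false when no request is waiting on `key`.
    fn complete(&self, key: &str, payload: String) -> bool {
        match self.waiters.lock().unwrap().remove(key) {
            Some(tx) => tx.send(payload).is_ok(),
            None => false,
        }
    }
}

fn main() {
    let pending = Pending::default();
    let rx_a = pending.register("aaaa1111");
    let rx_b = pending.register("bbbb2222");

    // Replies may arrive in a different order than the requests were sent.
    pending.complete("bbbb2222", "reply for b".to_owned());
    pending.complete("aaaa1111", "reply for a".to_owned());

    println!("{}", rx_a.recv().unwrap());
    println!("{}", rx_b.recv().unwrap());
}
```

With this shape, each request receives only the reply that carries its own key, regardless of the order in which replies come back.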

I got it to work, but it performs very slowly (I am stress-testing with wrk).

I'm not sure why it is slow, but while digging in I found that if I add a 5-second delay on the Node.js side and send two requests to actix-web one second apart, they respond after 5 and 10 seconds respectively.

The benchmark is around 3k requests per second, using the following command:

wrk http://localhost:8080 -d 20s -t 2 -c 200

This makes me guess that something might be blocking the thread for each request.
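As a rough illustration of that guess, the following standard-library simulation shows how a single worker that blocks on each request produces stacked delays. It does not use actix-web or rdkafka, the function name `simulate_single_worker` is hypothetical, and the delay is scaled down to milliseconds.

```rust
use std::sync::mpsc::channel;
use std::thread;
use std::time::{Duration, Instant};

/// Sends `jobs` requests to a single worker thread that blocks for `delay`
/// on each one, and returns the time from the start of the run until each
/// request received its response.
fn simulate_single_worker(jobs: usize, delay: Duration) -> Vec<Duration> {
    let (job_tx, job_rx) = channel::<usize>();
    let (done_tx, done_rx) = channel::<(usize, Duration)>();
    let start = Instant::now();

    let worker = thread::spawn(move || {
        for id in job_rx {
            // Stands in for a blocking wait on the reply message.
            thread::sleep(delay);
            done_tx.send((id, start.elapsed())).unwrap();
        }
    });

    for id in 0..jobs {
        job_tx.send(id).unwrap();
    }
    drop(job_tx); // closing the channel lets the worker loop end

    let mut latencies = vec![Duration::ZERO; jobs];
    for (id, latency) in done_rx {
        latencies[id] = latency;
    }
    worker.join().unwrap();
    latencies
}

fn main() {
    let latencies = simulate_single_worker(2, Duration::from_millis(50));
    for (id, latency) in latencies.iter().enumerate() {
        println!("request {} answered after about {} ms", id, latency.as_millis());
    }
}
```

The second request cannot start until the first one's blocking wait ends, so its latency is at least twice the delay. That is the same shape as the 5 and 10 second responses I observed.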

Here is the source code and the repo:

use std::{
    sync::Arc,
    time::{ 
        Duration, 
        Instant
    }
};

use actix_web::{
    App, 
    HttpServer, 
    get, 
    rt,
    web::Data
};

use futures::TryStreamExt;
use tokio::time::sleep;

use num_cpus;
use rand::{
    distributions::Alphanumeric, 
    Rng
};

use rdkafka::{
    ClientConfig, 
    Message, 
    consumer::{
        Consumer, 
        StreamConsumer
    }, 
    producer::{
        FutureProducer, 
        FutureRecord
    }
};

const TOPIC: &'static str = "exp-queue_general-5";

#[derive(Clone)]
pub struct AppState {
    pub producer: Arc<FutureProducer>,
    pub receiver: flume::Receiver<String>
}

fn generate_key() -> String {
    rand::thread_rng()
        .sample_iter(&Alphanumeric)
        .take(8)
        .map(char::from)
        .collect()
}

#[get("/")]
async fn landing(state: Data<AppState>) -> String {
    let key = generate_key();
    let t1 = Instant::now();

    let producer = &state.producer;
    let receiver = &state.receiver;

    producer
        .send(
            FutureRecord::to(&format!("{}-forth", TOPIC))
                .key(&key)
                .payload("Hello From Rust"),
                Duration::from_secs(8)
        )
        .await
        .expect("Unable to send message");

    println!("Producer take {} ms", t1.elapsed().as_millis());
    
    let t2 = Instant::now();
    let value = receiver
        .recv()
        .unwrap_or("".to_owned());

    println!("Receiver take {} ms", t2.elapsed().as_millis());
    println!("Process take {} ms\n", t1.elapsed().as_millis());

    value
}

#[get("/status")]
async fn heartbeat() -> &'static str {
    // ? Concurrency delay check
    sleep(Duration::from_secs(1)).await;

    "Working"
}

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    // ? Assume that the whole node is just Rust instance
    // saturating_sub avoids a usize underflow on machines with fewer than 2 CPUs
    let cpus = (num_cpus::get() / 2).saturating_sub(1).max(1);

    println!("Cpus {}", cpus);
    
    let producer: FutureProducer = ClientConfig::new()
        .set("bootstrap.servers", "localhost:9092")
        .set("linger.ms", "25")
        .set("queue.buffering.max.messages", "1000000")
        .set("queue.buffering.max.ms", "25")
        .set("compression.type", "lz4")
        .set("retries", "40000")
        .set("retries", "0")
        .set("message.timeout.ms", "8000")
        .create()
        .expect("Kafka config");

    let (tx, rx) = flume::unbounded::<String>();

    rt::spawn(async move {
        let consumer: StreamConsumer = ClientConfig::new()
            .set("bootstrap.servers", "localhost:9092")
            .set("group.id", &format!("{}-back", TOPIC))
            .set("queued.min.messages", "200000")
            .set("fetch.error.backoff.ms", "250")
            .set("socket.blocking.max.ms", "500")
            .create()
            .expect("Kafka config");

        consumer
            .subscribe(&vec![format!("{}-back", TOPIC).as_ref()])
            .expect("Can't subscribe");

        consumer
            .stream()
            .try_for_each_concurrent(
                cpus,
                |message| {
                    let txx = tx.clone();

                    async move {
                        let result = String::from_utf8_lossy(
                            message
                            .payload()
                            .unwrap_or("Error serializing".as_bytes())
                        ).to_string();

                        txx.send(result).expect("Tx not sending");

                        Ok(())
                    }

                }
            )
            .await
            .expect("Error reading stream");
    });

    let state = AppState {
        producer: Arc::new(producer),
        receiver: rx
    };

    HttpServer::new(move || {
        App::new()
            .app_data(Data::new(state.clone()))
            .service(landing)
            .service(heartbeat)
    })
    .workers(cpus)
    .bind("0.0.0.0:8080")?
    .run()
    .await
}

I found some solved issues on GitHub that recommended using actors instead, which I also tried in a separate branch.

That branch performs worse than the main branch, at around 200-300 requests per second.

I don't know where the bottleneck is or what is blocking the requests.

You haven't included how you've built the Rust code, which **is very important**. — Shepmaster, Jun 30 '21 at 18:11
It's hard to answer your question because it doesn't include a [MRE]. We can't tell what crates (**and their versions**) are present in the code, or how you've built it. It would make it easier for us to help you if you try to reproduce your error on the [Rust Playground](https://play.rust-lang.org) if possible, otherwise in a brand new Cargo project, then [edit] your question to include the additional info. There are [Rust-specific MRE tips](//stackoverflow.com/tags/rust/info) you can use to reduce your original code for posting here. Thanks! — Shepmaster, Jun 30 '21 at 18:11
[To make Stack Overflow a useful resource for future visitors beyond the context of your repository](https://meta.stackoverflow.com/q/380194/155423), please [edit] your question to add a complete [MRE] in the question itself, in **addition** to the link to your repository. — Shepmaster, Jun 30 '21 at 18:11
