We are trying to introduce Kafka KSQL into our system and would like to share some problems that we could not solve during the process. We have 3 Kafka nodes in our cluster; each server has:
8 cores
50G+ RAM
100G SSD
Each server also runs ZooKeeper to manage the cluster. The JVM heap and OS limits have been raised so that the nodes can use more resources than they need:
Xmx: 10G
Xms: 10G
nofiles: 500000
For now, producer traffic to the cluster is low (about 10 messages per second). We currently have only one producer, and the message format is:
{"user_id": <id|INT>, "action_id": <id|INT>, "amount": <amount|FLOAT>}
The Kafka topic is split into 6 partitions with a replication factor of 1:
Topic:<some_topic> PartitionCount:6 ReplicationFactor:1 Configs:
Topic: <some_topic> Partition: 0 Leader: 0 Replicas: 0 Isr: 0
Topic: <some_topic> Partition: 1 Leader: 1 Replicas: 1 Isr: 1
Topic: <some_topic> Partition: 2 Leader: 2 Replicas: 2 Isr: 2
Topic: <some_topic> Partition: 3 Leader: 0 Replicas: 0 Isr: 0
Topic: <some_topic> Partition: 4 Leader: 1 Replicas: 1 Isr: 1
Topic: <some_topic> Partition: 5 Leader: 2 Replicas: 2 Isr: 2
Of course, the nodes are currently underutilized, and on the Kafka side everything is more than OK.
We would like to use KSQL on top of Kafka so that we can filter the data coming into our system with SQL. Here are the KSQL server's resources:
32 cores
100G+ RAM
50G+ SSD
We have only one table:
Field | Type
-------------------------------------
ROWTIME | BIGINT (system)
ROWKEY | VARCHAR(STRING) (system)
ACTION_ID | INTEGER
USER_ID | INTEGER
AMOUNT | DOUBLE
Here is the command the table was created with:
create table <some_table> (action_id INT, user_Id INT, amount DOUBLE) with (KAFKA_TOPIC='<some_topic>', VALUE_FORMAT='JSON', KEY = 'user_id');
In our application, we need to subscribe to the table by user_id, like this:
SELECT * FROM <some_table> WHERE USER_ID=<some_user_id>;
For the production KSQL server configuration, we follow the official recommendations from Confluent: https://docs.confluent.io/current/ksql/docs/installation/server-config/config-reference.html#recommended-ksql-production-settings
The OS and JVM limits have also been raised for the KSQL server:
Xmx: 10G (we have tried up to 50G)
Xms: 10G (we have tried up to 50G)
nofiles: 500000
With only one subscription, we don't see any issues.
But we need more than 200,000 subscriptions overall. When we try to open 100-200 parallel subscriptions, we get "read timeouts" in our client. On the server, we do not see any abnormal load that could affect KSQL.
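To make the scenario easier to reproduce, here is a minimal Python sketch of the kind of parallel load we mean. It is not our actual client. It assumes the subscriptions are opened through the KSQL REST /query endpoint on the default listener (localhost:8088). The table name some_table, the 10-second timeout, and the helper names are all placeholders.

```python
import json
import socket
import urllib.error
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

KSQL_URL = "http://localhost:8088/query"  # assumption: default KSQL REST listener
TABLE = "some_table"                       # hypothetical table name


def build_payload(user_id: int) -> bytes:
    """Build the JSON request body for one per-user subscription query."""
    statement = f"SELECT * FROM {TABLE} WHERE USER_ID={int(user_id)};"
    return json.dumps({"ksql": statement, "streamsProperties": {}}).encode("utf-8")


def open_subscription(user_id: int, timeout_s: float = 10.0) -> str:
    """Start one streaming query and report 'ok', 'timeout', or 'error'.

    This only checks that the query starts streaming: the connection is
    closed again after the first line of the response has been read.
    """
    request = urllib.request.Request(
        KSQL_URL,
        data=build_payload(user_id),
        headers={"Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            response.readline()  # block until the first line arrives
        return "ok"
    except socket.timeout:
        return "timeout"  # read timed out while waiting for data
    except urllib.error.URLError as exc:
        # A timeout during connect is wrapped in URLError by urllib.
        return "timeout" if isinstance(exc.reason, socket.timeout) else "error"
    except OSError:
        return "error"


def run_load(user_ids, max_parallel, subscribe=open_subscription):
    """Open subscriptions with up to max_parallel in flight; count the outcomes."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return Counter(pool.map(subscribe, user_ids))
```

Calling run_load(range(1, 201), max_parallel=200) against a test server returns a count of how many of the 200 queries started streaming, timed out, or failed.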
We suspect the issue lies in KSQL alone, because when we use another KSQL server (on a different machine) at the same time, that second server works fine and can handle some 1-20 subscriptions.
I could not find any KSQL server benchmarks on the internet, and I could not find any mention of KSQL's intended use cases in the documentation either. Maybe it is designed to serve only a few connections carrying large volumes of data, or maybe our system is misconfigured and we need to fix it to use the software for our goals.
Any suggestions would be helpful.
Thanks in advance!