I'm running a Flink job that reads messages from Kafka and eventually writes them to a Cassandra sink.

I'm ingesting around 500 messages/s, which are flat-mapped into roughly 60,000 Cassandra inserts. The job parallelism is 5 (reading from 5 Kafka partitions). When ingestion starts, the job successfully writes all messages and the Kafka consumer does not lag behind.
After a minute or so, the Kafka consumption rate suddenly drops, and both the Kafka records lag and the consumer's average fetch time increase.
Looking at the Flink UI, I can see that the sink operator (CassandraPojoSink) is the one causing the backpressure. However, Cassandra is not exhausted in terms of CPU or memory, and its write latency is stable and low. Increasing the sink operator's parallelism (5 -> 20) helps a bit but doesn't solve the issue. For reference, the pipeline is wired up roughly as in the sketch below.
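This is a simplified sketch of the job, not the real code: the POJO, topic, keyspace, contact point, and the flat-map logic are placeholders, but the structure (Kafka source -> flatMap -> POJO-based Cassandra sink, parallelism 5) matches what I run:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.mapping.annotations.Column;
import com.datastax.driver.mapping.annotations.Table;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.cassandra.CassandraSink;
import org.apache.flink.streaming.connectors.cassandra.ClusterBuilder;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.util.Collector;

import java.util.Properties;

public class KafkaToCassandraJob {

    // Placeholder POJO; @Table tells the DataStax mapper (and hence the
    // CassandraPojoSink) which keyspace/table each object is written to.
    @Table(keyspace = "my_ks", name = "my_table")
    public static class MyRow {
        @Column(name = "id")
        public String id;
        @Column(name = "payload")
        public String payload;

        public MyRow() {}
        public MyRow(String id, String payload) { this.id = id; this.payload = payload; }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties kafkaProps = new Properties();
        kafkaProps.setProperty("bootstrap.servers", "kafka:9092");
        kafkaProps.setProperty("group.id", "my-group");

        DataStream<MyRow> rows = env
                .addSource(new FlinkKafkaConsumer<>("events", new SimpleStringSchema(), kafkaProps))
                .setParallelism(5) // one subtask per Kafka partition
                .flatMap((String msg, Collector<MyRow> out) -> {
                    // each incoming message explodes into many rows
                    // (~120 on average, i.e. 500 msgs/s -> ~60,000 inserts)
                    for (String part : msg.split(",")) {
                        out.collect(new MyRow(part, msg));
                    }
                })
                .returns(MyRow.class)
                .setParallelism(5);

        // MyRow is an annotated POJO, so addSink() builds a CassandraPojoSink under the hood
        CassandraSink.addSink(rows)
                .setClusterBuilder(new ClusterBuilder() {
                    @Override
                    protected Cluster buildCluster(Cluster.Builder builder) {
                        return builder.addContactPoint("cassandra-host").build();
                    }
                })
                .build();

        env.execute("kafka-to-cassandra");
    }
}
```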

Can anyone point me in the right direction on how to tackle this kind of issue? Adding more and more parallelism seems like a bad solution (or is it?).

Thanks!

yaarix
  • Which Flink version are you using? Is the write-ahead log enabled? Did you try setting `maxConcurrentRequests`? Maybe the following link can help you: https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/cassandra.html#configurations – Ricardo Alvaro Lohmann Mar 03 '20 at 21:07
  • 1.9, the write-ahead log is not enabled. – yaarix Mar 04 '20 at 05:14
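For anyone landing here later: the `maxConcurrentRequests` suggestion from the comment would look roughly like the sketch below on the sink builder from the question's example. This is only a sketch; check the Cassandra connector "Configurations" docs linked above to confirm the method is available in your Flink version, and treat the value 500 as an example to tune, not a recommendation.

```java
// Cap the number of in-flight async writes so the sink backpressures
// earlier and more smoothly instead of queueing unbounded requests.
CassandraSink.addSink(rows)
        .setClusterBuilder(new ClusterBuilder() {
            @Override
            protected Cluster buildCluster(Cluster.Builder builder) {
                return builder.addContactPoint("cassandra-host").build();
            }
        })
        .setMaxConcurrentRequests(500) // example value, tune to your cluster
        .build();
```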

0 Answers