If I am correct, by default Spark Streaming 1.6.1 uses a single thread to read data from each Kafka partition. Suppose my Kafka topic has 50 partitions; does that mean messages from the 50 partitions will be read sequentially, or perhaps in a round-robin fashion?
Case 1:
-If yes, then how do I parallelize the read operation at the partition level? Is creating multiple KafkaUtils.createDirectStream calls the only solution?
e.g.
val stream1 = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
ssc, kafkaParams, topicsSet).map(_._2)
val stream2 = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
ssc, kafkaParams, topicsSet).map(_._2)
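If multiple streams like these are the intended approach, they would typically be merged into a single DStream before further processing. A sketch only (it reuses the ssc, kafkaParams, and topicsSet from the snippet above, and whether this actually increases read parallelism for the direct approach is exactly what I am asking):

```scala
// Sketch: create several direct streams and merge them so downstream
// operators see one DStream. ssc, kafkaParams and topicsSet are assumed
// to be defined as in the snippet above.
val numStreams = 2 // illustrative count, not a recommendation
val streams = (1 to numStreams).map { _ =>
  KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
    ssc, kafkaParams, topicsSet).map(_._2)
}
// StreamingContext.union combines the sequence into one DStream
val unified = ssc.union(streams)
```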
Case 2:
-If each of my Kafka partitions is receiving 5 messages/sec, then how do the "--conf spark.streaming.kafka.maxRatePerPartition=3" and "--conf spark.streaming.blockInterval" properties come into the picture in such a scenario?
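To make the question concrete, here is the back-of-the-envelope arithmetic I have in mind for the maxRatePerPartition cap. The 10-second batch interval is an assumption for illustration, not something stated above; only the 50 partitions, 5 msg/sec arrival rate, and maxRatePerPartition=3 come from the question:

```scala
// Sketch: how spark.streaming.kafka.maxRatePerPartition bounds the number
// of records a direct stream pulls per batch. The batch interval is an
// assumed value for illustration.
object RateCapSketch {
  def main(args: Array[String]): Unit = {
    val partitions          = 50 // Kafka partitions, from the question
    val arrivalPerPartition = 5  // messages/sec arriving per partition
    val maxRatePerPartition = 3  // spark.streaming.kafka.maxRatePerPartition
    val batchIntervalSec    = 10 // ASSUMED batch interval, for illustration

    // Records Spark would actually pull per batch, with the cap applied:
    val pulledPerBatch =
      math.min(maxRatePerPartition, arrivalPerPartition) * partitions * batchIntervalSec
    // Records arriving per batch, with no cap:
    val arrivingPerBatch = arrivalPerPartition * partitions * batchIntervalSec

    println(s"pulled per batch:   $pulledPerBatch")   // 1500
    println(s"arriving per batch: $arrivingPerBatch") // 2500, so a backlog builds
  }
}
```

If that arithmetic is right, the cap would leave 1000 records per batch unread, which is why I want to understand how these two properties interact here.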