I don't want to use a single consumer for all topics; instead, I want to use the approach below to improve consumption efficiency:
val kafkaParams = Map(
  ConsumerConfig.GROUP_ID_CONFIG -> group,
  ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> brokers,
  ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> deserialization,
  ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> deserialization
)
//1.1 create the first consumer
val kafkaDS1: InputDStream[(String, String)] = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set(topic1))
//1.2 create the second consumer
val kafkaDS2: InputDStream[(String, String)] = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set(topic2))
//1.3 create the third consumer
val kafkaDS3: InputDStream[(String, String)] = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set(topic3))
//1.4 create the fourth consumer
val kafkaDS4: InputDStream[(String, String)] = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set(topic4))
//2.1 then union all the DStreams
val allStream = kafkaDS1
  .union(kafkaDS2)
  .union(kafkaDS3)
  .union(kafkaDS4)
The program runs 5~6 batches normally, but then it gets stuck: the Streaming tab of the Spark web UI will not open, and the Kafka consumer group keeps rebalancing. It looks like there is a problem with Kafka offset commits, and the Kafka consumer gets closed.

I was following the "Level of Parallelism in Data Receiving" section of the Spark Streaming guide.
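For completeness, here is the same pattern condensed into a loop with a single `ssc.union` call, which is how I understand the "parallelism in data receiving" suggestion. This is just a sketch of my setup (it assumes the same 0.8-style `spark-streaming-kafka` connector as above; `ssc`, `kafkaParams`, and the topic names are defined elsewhere in my code):

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.kafka.KafkaUtils

// One direct stream per topic, created in a loop instead of
// four near-identical val declarations.
val topics = Seq(topic1, topic2, topic3, topic4)
val streams = topics.map { t =>
  KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
    ssc, kafkaParams, Set(t))
}

// ssc.union merges all the streams in one call rather than
// chaining .union(...) pairwise.
val allStream: DStream[(String, String)] = ssc.union(streams)
```

The behavior (and the hang after 5~6 batches) is the same as with the explicit version above.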