I don't want to use a single consumer for all topics; instead, I create one direct stream per topic, hoping to improve consumption efficiency:

import kafka.serializer.StringDecoder
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.spark.streaming.dstream.{DStream, InputDStream}
import org.apache.spark.streaming.kafka.KafkaUtils

val kafkaParams = Map(
      ConsumerConfig.GROUP_ID_CONFIG -> group,
      ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> brokers,
      ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> deserialization,
      ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> deserialization
    )
// 1.1 create first consumer
val kafkaDS1: InputDStream[(String, String)] = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set(topic1))
// 1.2 create second consumer
val kafkaDS2: InputDStream[(String, String)] = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set(topic2))
// 1.3 create third consumer
val kafkaDS3: InputDStream[(String, String)] = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set(topic3))
// 1.4 create fourth consumer
val kafkaDS4: InputDStream[(String, String)] = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set(topic4))
// 2.1 then union all DStreams into one
val allStream: DStream[(String, String)] = kafkaDS1
               .union(kafkaDS2)
               .union(kafkaDS3)
               .union(kafkaDS4)
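
Since I suspect an offset problem (see below), one thing I tried for debugging is logging which offsets each batch actually covers. The Spark Kafka integration guide notes that the cast to HasOffsetRanges only succeeds on the first operation applied to a direct stream, so it has to happen on the per-topic streams before the union; a minimal sketch (debugging only):

import org.apache.spark.streaming.kafka.HasOffsetRanges

// log the offset range each batch consumes from topic1; repeat for the other streams
kafkaDS1.foreachRDD { rdd =>
  val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  ranges.foreach { r =>
    println(s"${r.topic} partition ${r.partition}: ${r.fromOffset} -> ${r.untilOffset}")
  }
}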

The program runs 5~6 batches normally, but then it gets stuck: the Streaming tab of the Spark web UI no longer loads, the Kafka consumer group keeps rebalancing, and eventually the Kafka consumer is closed. It looks like there is a problem with the Kafka offset commits. I based this approach on the "Level of Parallelism in Data Receiving" section of the Spark Streaming programming guide.
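
For reference, the example in that section of the guide builds the per-topic streams in a loop and unions them with a single call on the StreamingContext rather than chaining union. A minimal sketch of that same pattern with the direct-stream API used above (topic1..topic4, kafkaParams, and ssc as defined earlier):

// same pattern as the programming guide: one stream per topic, one union on the context
val topics = Seq(topic1, topic2, topic3, topic4)
val perTopicStreams = topics.map { t =>
  KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
    ssc, kafkaParams, Set(t))
}
val unifiedStream = ssc.union(perTopicStreams)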
