  • Spark version: 2.4.5
  • Component: Spark Streaming
  • Class: DirectKafkaInputDStream

In the class DirectKafkaInputDStream, I am a little confused: why is paranoidPoll invoked before seekToEnd?

protected def latestOffsets(): Map[TopicPartition, Long] = {
    val c = consumer
    paranoidPoll(c)
    val parts = c.assignment().asScala

    // make sure new partitions are reflected in currentOffsets
    val newPartitions = parts.diff(currentOffsets.keySet)

    // Check if there's any partition been revoked because of consumer rebalance.
    val revokedPartitions = currentOffsets.keySet.diff(parts)
    if (revokedPartitions.nonEmpty) {
      throw new IllegalStateException(s"Previously tracked partitions " +
        s"${revokedPartitions.mkString("[", ",", "]")} been revoked by Kafka because of consumer " +
        s"rebalance. This is mostly due to another stream with same group id joined, " +
        s"please check if there're different streaming application misconfigure to use same " +
        s"group id. Fundamentally different stream should use different group id")
    }

    // position for new partitions determined by auto.offset.reset if no commit
    currentOffsets = currentOffsets ++ newPartitions.map(tp => tp -> c.position(tp)).toMap

    // find latest available offsets
    c.seekToEnd(currentOffsets.keySet.asJava)
    parts.map(tp => tp -> c.position(tp)).toMap
  }
OneCricketeer
Huzhenyu

1 Answer


Without poll(0), assignment() may return an empty set: it is the poll that makes the client connect to the Kafka group coordinator and receive its partition assignment. Only after that can seekToEnd/position operate on the assigned partitions.

Note that poll(0) has since been deprecated in kafka-clients (in favor of poll(Duration)); check Spark's source for the alternative API it uses.

Also see: KafkaConsumer assignment() returns empty
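
The effect poll has on assignment() can be sketched with a plain KafkaConsumer. This is a minimal sketch, not Spark's code: the broker address, topic, and group id below are hypothetical, and a running Kafka broker with that topic is required for it to work.

```scala
import java.time.Duration
import java.util.Properties
import org.apache.kafka.clients.consumer.KafkaConsumer
import scala.collection.JavaConverters._

object AssignmentDemo {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // Hypothetical broker, group id, and topic for illustration.
    props.put("bootstrap.servers", "localhost:9092")
    props.put("group.id", "demo-group")
    props.put("key.deserializer",
      "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer",
      "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(java.util.Arrays.asList("demo-topic"))

    // No rebalance has happened yet, so assignment() is typically empty here.
    println(s"before poll: ${consumer.assignment().asScala}")

    // poll joins the consumer group; poll(0) is deprecated, so the
    // Duration overload is used. With a non-zero timeout the consumer
    // usually completes the rebalance and receives its partitions.
    consumer.poll(Duration.ofSeconds(5))

    // assignment() now reflects the partitions the coordinator assigned,
    // so seekToEnd/position can safely be called on them.
    println(s"after poll:  ${consumer.assignment().asScala}")
    consumer.seekToEnd(consumer.assignment())
    consumer.assignment().asScala.foreach { tp =>
      println(s"$tp latest offset: ${consumer.position(tp)}")
    }

    consumer.close()
  }
}
```

This mirrors what latestOffsets does: paranoidPoll plays the role of the poll above, guaranteeing the consumer actually holds an assignment before seekToEnd and position are called.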

asolanki
QingSun