I'm running a Spark Structured Streaming program that reads a stream from a Kafka topic, with the following options passed to the readStream method:
options = {
...
...
"kafka.security.protocol": "SASL_SSL",
"startingOffsets": "earliest",
"maxOffsetsPerTrigger": records_per_trigger,
"subscribe": topic,
"auto.offset.reset": "earliest"
}
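For context, here is a minimal sketch of how these options reach the Kafka source. The helper name `build_kafka_options` and the `spark` session in the usage comment are placeholders for illustration, and the elided entries stay elided. As far as I understand from the Spark Kafka integration guide, Kafka consumer settings are only forwarded when prefixed with `kafka.`, and `startingOffsets` is the documented way to choose the start position, so the unprefixed `auto.offset.reset` entry may have no effect.

```python
def build_kafka_options(topic, records_per_trigger):
    """Return the Kafka source options shown above (hypothetical helper).

    The elided connection/auth entries ("...") are intentionally left out;
    they would have to be added before the dict is usable against a real cluster.
    """
    return {
        "kafka.security.protocol": "SASL_SSL",
        "startingOffsets": "earliest",
        # Spark source options are passed as strings.
        "maxOffsetsPerTrigger": str(records_per_trigger),
        "subscribe": topic,
        # Kept as in the question; it has no "kafka." prefix.
        "auto.offset.reset": "earliest",
    }


# Usage, assuming an existing SparkSession named `spark`:
#   options = build_kafka_options(topic, records_per_trigger)
#   df = spark.readStream.format("kafka").options(**options).load()
```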
I have "startingOffsets" and "auto.offset.reset" both set to earliest, but according to the log the consumer keeps resetting the offset to 142634681. Based on the output I am getting, it is clearly still consuming messages from the Kafka topic. Is the log inaccurate, or am I misunderstanding how offsets work?
Resetting offset for partition edgebook_db_ca_on_history.journal_entries-0 to offset 142634681.
23/02/28 20:06:52 INFO org.apache.kafka.clients.Metadata: [Consumer clientId=consumer-spark-kafka-source-379c9446-a679-47c1-bad8-fbf11b3db3e0--1283692680-driver-0-1, groupId=spark-kafka-source-379c9446-a679-47c1-bad8-fbf11b3db3e0--1283692680-driver-0] Cluster ID: lkc-09j5mp
23/02/28 20:06:52 INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator: [Consumer clientId=consumer-spark-kafka-source-379c9446-a679-47c1-bad8-fbf11b3db3e0--1283692680-driver-0-1, groupId=spark-kafka-source-379c9446-a679-47c1-bad8-fbf11b3db3e0--1283692680-driver-0] Discovered group coordinator b14-pkc-3w22w.us-central1.gcp.confluent.cloud:9092 (id: 2147483633 rack: null)
23/02/28 20:06:52 INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator: [Consumer clientId=consumer-spark-kafka-source-379c9446-a679-47c1-bad8-fbf11b3db3e0--1283692680-driver-0-1, groupId=spark-kafka-source-379c9446-a679-47c1-bad8-fbf11b3db3e0--1283692680-driver-0] (Re-)joining group
23/02/28 20:06:52 INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator: [Consumer clientId=consumer-spark-kafka-source-379c9446-a679-47c1-bad8-fbf11b3db3e0--1283692680-driver-0-1, groupId=spark-kafka-source-379c9446-a679-47c1-bad8-fbf11b3db3e0--1283692680-driver-0] Join group failed with org.apache.kafka.common.errors.MemberIdRequiredException: The group member needs to have a valid member id before actually entering a consumer group.
23/02/28 20:06:52 INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator: [Consumer clientId=consumer-spark-kafka-source-379c9446-a679-47c1-bad8-fbf11b3db3e0--1283692680-driver-0-1, groupId=spark-kafka-source-379c9446-a679-47c1-bad8-fbf11b3db3e0--1283692680-driver-0] (Re-)joining group
23/02/28 20:06:52 INFO org.apache.kafka.clients.consumer.internals.ConsumerCoordinator: [Consumer clientId=consumer-spark-kafka-source-379c9446-a679-47c1-bad8-fbf11b3db3e0--1283692680-driver-0-1, groupId=spark-kafka-source-379c9446-a679-47c1-bad8-fbf11b3db3e0--1283692680-driver-0] Finished assignment for group at generation 1: {consumer-spark-kafka-source-379c9446-a679-47c1-bad8-fbf11b3db3e0--1283692680-driver-0-1-1f851ea3-4e44-4a9e-b0e3-a5a75d9cbf3d=Assignment(partitions=[edgebook_db_ca_on_history.journal_entries-0])}
23/02/28 20:06:52 INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator: [Consumer clientId=consumer-spark-kafka-source-379c9446-a679-47c1-bad8-fbf11b3db3e0--1283692680-driver-0-1, groupId=spark-kafka-source-379c9446-a679-47c1-bad8-fbf11b3db3e0--1283692680-driver-0] Successfully joined group with generation 1
23/02/28 20:06:52 INFO org.apache.kafka.clients.consumer.internals.ConsumerCoordinator: [Consumer clientId=consumer-spark-kafka-source-379c9446-a679-47c1-bad8-fbf11b3db3e0--1283692680-driver-0-1, groupId=spark-kafka-source-379c9446-a679-47c1-bad8-fbf11b3db3e0--1283692680-driver-0] Notifying assignor about the new Assignment(partitions=[edgebook_db_ca_on_history.journal_entries-0])
23/02/28 20:06:52 INFO org.apache.kafka.clients.consumer.internals.ConsumerCoordinator: [Consumer clientId=consumer-spark-kafka-source-379c9446-a679-47c1-bad8-fbf11b3db3e0--1283692680-driver-0-1, groupId=spark-kafka-source-379c9446-a679-47c1-bad8-fbf11b3db3e0--1283692680-driver-0] Adding newly assigned partitions: edgebook_db_ca_on_history.journal_entries-0
23/02/28 20:06:52 INFO org.apache.kafka.clients.consumer.internals.ConsumerCoordinator: [Consumer clientId=consumer-spark-kafka-source-379c9446-a679-47c1-bad8-fbf11b3db3e0--1283692680-driver-0-1, groupId=spark-kafka-source-379c9446-a679-47c1-bad8-fbf11b3db3e0--1283692680-driver-0] Found no committed offset for partition edgebook_db_ca_on_history.journal_entries-0
23/02/28 20:06:52 INFO org.apache.kafka.clients.consumer.internals.SubscriptionState: [Consumer clientId=consumer-spark-kafka-source-379c9446-a679-47c1-bad8-fbf11b3db3e0--1283692680-driver-0-1, groupId=spark-kafka-source-379c9446-a679-47c1-bad8-fbf11b3db3e0--1283692680-driver-0] Seeking to EARLIEST offset of partition edgebook_db_ca_on_history.journal_entries-0
23/02/28 20:06:53 INFO org.apache.kafka.clients.consumer.internals.SubscriptionState: [Consumer clientId=consumer-spark-kafka-source-379c9446-a679-47c1-bad8-fbf11b3db3e0--1283692680-driver-0-1, groupId=spark-kafka-source-379c9446-a679-47c1-bad8-fbf11b3db3e0--1283692680-driver-0] Resetting offset for partition edgebook_db_ca_on_history.journal_entries-0 to offset 0.
23/02/28 20:06:54 INFO org.apache.kafka.clients.consumer.internals.SubscriptionState: [Consumer clientId=consumer-spark-kafka-source-379c9446-a679-47c1-bad8-fbf11b3db3e0--1283692680-driver-0-1, groupId=spark-kafka-source-379c9446-a679-47c1-bad8-fbf11b3db3e0--1283692680-driver-0] Seeking to LATEST offset of partition edgebook_db_ca_on_history.journal_entries-0
23/02/28 20:06:54 INFO org.apache.kafka.clients.consumer.internals.SubscriptionState: [Consumer clientId=consumer-spark-kafka-source-379c9446-a679-47c1-bad8-fbf11b3db3e0--1283692680-driver-0-1, groupId=spark-kafka-source-379c9446-a679-47c1-bad8-fbf11b3db3e0--1283692680-driver-0] Resetting offset for partition edgebook_db_ca_on_history.journal_entries-0 to offset 142634681.
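To make the sequence easier to read, here is a rough sketch of a helper that pairs each "Seeking to ..." line with the "Resetting offset ..." line that follows it for the same partition. The function name `offset_resets` and the regular expressions are my own, written against the message format in the log above; they are not part of Spark or Kafka. On this excerpt it would report the EARLIEST seek landing at offset 0 and the LATEST seek landing at offset 142634681.

```python
import re

# Patterns match the message tails seen in the driver log above.
SEEK = re.compile(r"Seeking to (EARLIEST|LATEST) offset of partition (\S+)")
RESET = re.compile(r"Resetting offset for partition (\S+) to offset (\d+)")


def offset_resets(log_text):
    """Return (seek_target, partition, offset) for every reset in the log.

    seek_target is "EARLIEST", "LATEST", or None when no preceding
    "Seeking to ..." line was found for that partition.
    """
    pending = {}  # partition -> most recent seek target
    events = []
    for line in log_text.splitlines():
        seek = SEEK.search(line)
        if seek:
            pending[seek.group(2)] = seek.group(1)
            continue
        reset = RESET.search(line)
        if reset:
            partition, offset = reset.group(1), int(reset.group(2))
            events.append((pending.pop(partition, None), partition, offset))
    return events
```

Calling `offset_resets(open("driver.log").read())` (the file name is hypothetical) returns the list of resets in the order they were logged.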