I'm facing an issue with my Kafka consumer job written in Scala. When we start the consumer, it fetches all messages available on the broker from the last consumed offset, processes those JSON messages, and writes them to a Hive table. After writing, it should fetch new messages again, but that isn't happening; instead we have to forcefully stop the job and restart it to pick up new messages.
Here are the steps we perform in the Kafka consumer job:
- Set the consumer params/properties
- Start the stream and fetch messages based on the offsets
- Parse the JSON messages and extract a few housekeeping columns
- Check for duplicates in the current stream by comparing against the last 5 days of Hive partitions
- Write to a Hive partitioned table
- Create trigger files for the successor job that parses the JSON
- Repeat the above process for new messages in the stream (a sketch of this loop follows the list)
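
Below is a minimal sketch of the loop we are trying to achieve (assuming Scala 2.13 and the plain kafka-clients 2.x consumer API; the nested defs are stubs standing in for our real parse/dedupe/Hive/trigger code, and the 5-second poll timeout is arbitrary):

import java.time.Duration
import org.apache.kafka.clients.consumer.KafkaConsumer
import scala.jdk.CollectionConverters._

def runLoop(consumer: KafkaConsumer[String, String]): Unit = {
  // Stubs standing in for the job's real parse/dedupe/write/trigger logic
  def parseHousekeeping(json: String): String = json
  def isDuplicate(row: String): Boolean = false
  def writeToHive(rows: Iterable[String]): Unit = ()
  def createTriggerFile(): Unit = ()

  while (true) {
    // poll() returns only the records available right now; it has to be
    // called repeatedly inside the loop to pick up messages that arrive later
    val batch = consumer.poll(Duration.ofSeconds(5)).asScala.map(_.value())
    if (batch.nonEmpty) {
      val fresh = batch.map(parseHousekeeping).filterNot(isDuplicate)
      writeToHive(fresh)
      createTriggerFile()
      // enable.auto.commit is false, so commit explicitly once the write
      // succeeds; otherwise the group's offsets never advance
      consumer.commitSync()
    }
  }
}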
Here are some of the properties we set:

import java.util.Properties
import org.apache.kafka.common.serialization.StringDeserializer

val kafka_props = new Properties()
kafka_props.put("bootstrap.servers", BROKER_SERVERS)
kafka_props.put("key.deserializer", classOf[StringDeserializer])
kafka_props.put("value.deserializer", classOf[StringDeserializer])
kafka_props.put("group.id", GROUP_ID)
// only applies when the group has no committed offset for a partition
kafka_props.put("auto.offset.reset", offset)
kafka_props.put("security.protocol", "SSL")
kafka_props.put("ssl.truststore.location", TRUSTSTORE_LOCATION)
kafka_props.put("ssl.truststore.password", SSL_TRUST_PWD)
kafka_props.put("ssl.keystore.location", KEYSTORE_LOCATION)
kafka_props.put("ssl.keystore.password", SSL_KEY_PWD)
// auto-commit is disabled, so offsets have to be committed manually
kafka_props.put("enable.auto.commit", (false: java.lang.Boolean))
kafka_props.put("session.timeout.ms", (30000: java.lang.Integer))
kafka_props.put("heartbeat.interval.ms", (3000: java.lang.Integer))
We tried changing parameters, but it didn't help much. I'm not sure what is being missed and need help to fix it.