I have written a process that generates messages through a custom Flume source and uses the Flume Kafka sink provided by Hortonworks to write them to Kafka brokers.
During this process I noticed that if the Kafka broker is already running when I start my Flume agent, every message is delivered to the broker properly. But when I start the Kafka broker while the Flume agent is already running, the broker does not receive all the messages.
When I run the Kafka console consumer to check the count of messages received, I see that some records are dropped at the beginning and some at the end.
I have tried multiple combinations of settings in Flume.conf, but it is still not working as expected.
Below are the configuration parameters I have provided in Flume.conf:
agent.channels = firehose-channel
agent.sources = stress-source
agent.sinks = kafkasink
##################################
# Benchmark Source Configuration #
##################################
agent.sources.stress-source.type=com.kohls.flume.source.stress.BenchMarkTestScenriao
agent.sources.stress-source.size=5000
agent.sources.stress-source.maxTotalEvents=30000
agent.sources.stress-source.batchSize=200
agent.sources.stress-source.throughputThreshold=4000
agent.sources.stress-source.throughputControlSeconds=1
agent.sources.stress-source.channels=firehose-channel
#################################
# Firehose Channel Configuration #
#################################
agent.channels.firehose-channel.type = file
agent.channels.firehose-channel.checkpointDir = /data/flume/checkpoint
agent.channels.firehose-channel.dataDirs = /data/flume/data
agent.channels.firehose-channel.capacity = 10000
agent.channels.firehose-channel.transactionCapacity = 10000
agent.channels.firehose-channel.useDualCheckpoints=1
agent.channels.firehose-channel.backupCheckpointDir=/data/flume/backup
############################################
# Firehose Sink Configuration - Kafka Sink #
############################################
agent.sinks.kafkasink.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.kafkasink.topic = backoff_test_17
agent.sinks.kafkasink.channel=firehose-channel
agent.sinks.kafkasink.brokerList = sandbox.hortonworks.com:6667
agent.sinks.kafkasink.batchsize = 200
agent.sinks.kafkasink.requiredAcks = 1
agent.sinks.kafkasink.kafka.producer.type = async
agent.sinks.kafkasink.kafka.batch.num.messages = 200
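One likely cause, offered as a hypothesis rather than a confirmed diagnosis: with `kafka.producer.type = async`, the sink only hands events to the producer's in-memory queue, so the Flume channel transaction is committed before the broker acknowledges anything. If sends then fail (for example while the broker is still starting and no partition leader exists yet) and the producer's retries run out, those events are discarded instead of being rolled back into the channel, which would also be consistent with the channel's PUT and TAKE counts looking correct. The fragment below is a sketch of settings worth trying. It assumes the Flume 1.6-era `KafkaSink` that wraps the old Kafka 0.8 producer and passes any `kafka.`-prefixed key through to it; the retry values are illustrative, not tuned.

```
# Sketch only: assumes Flume 1.6-era KafkaSink on the old Kafka 0.8 producer.
# Wait for all in-sync replicas to acknowledge each batch.
agent.sinks.kafkasink.requiredAcks = -1
# Send synchronously so a failed send throws, and the sink rolls back the
# channel transaction instead of silently dropping queued messages.
agent.sinks.kafkasink.kafka.producer.type = sync
# Retry for longer while the broker is starting up (illustrative values).
agent.sinks.kafkasink.kafka.message.send.max.retries = 10
agent.sinks.kafkasink.kafka.retry.backoff.ms = 1000
# Property names are case-sensitive: the sink reads "batchSize", so the
# "batchsize" key above is likely ignored and the default batch size is used.
agent.sinks.kafkasink.batchSize = 200
# Boolean property: a value of "1" parses as false, so dual checkpoints
# are probably disabled with the current setting.
agent.channels.firehose-channel.useDualCheckpoints = true
```

Switching to a synchronous producer will usually lower throughput, and `kafka.batch.num.messages` only applies in async mode, so it has no effect once `producer.type` is `sync`.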
I have also analyzed the Flume log, and the Flume metrics show the expected PUT and TAKE counts.
Please let me know if anyone has any pointers to solve this issue.
Thanks in advance for your help.