
I am trying to load a data file in a loop (to check stats) instead of using standard input in Kafka. After downloading Kafka, I performed the following steps:

Started zookeeper:

bin/zookeeper-server-start.sh config/zookeeper.properties

Started Server:

bin/kafka-server-start.sh config/server.properties

Created a topic named "test":

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

Ran the Producer:

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test 
Test1
Test2

Listened by the Consumer:

bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
Test1
Test2

Instead of standard input, I want to pass a data file to the producer so its contents can be seen directly by the consumer. Or is there any Kafka producer, other than the console producer, with which I can read data files? Any help would really be appreciated. Thanks!

– Sumit (edited by streetturtle)

7 Answers


You can read the data file via cat and pipe it to kafka-console-producer.sh.

cat ${datafile} | ${kafka_home}/bin/kafka-console-producer.sh --broker-list ${brokerlist} --topic test 
– Shawn Guo

  • Or, if you want to read the whole file and then continue tailing for subsequently appended lines, use `tail -f -n +1 file_path` instead of `cat`. – Marko Bonaci Feb 13 '16 at 18:29
  • Kafka has a built-in file-source connector, which is made for exactly this kind of task: reading a file into a producer so a consumer can pick the data up. See my answer below. – WesternGun Feb 13 '18 at 08:46

If there is always a single file, you can just use the tail command and pipe it to the Kafka console producer.

But if a new file is created when certain conditions are met, you may need to use apache.commons.io.monitor to detect the newly created file, then repeat the above.
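apache.commons.io.monitor is a Java library; as a rough command-line equivalent, a sketch using inotifywait (from inotify-tools) could watch a directory and feed each finished file to the console producer. The directory, broker, and topic below are placeholders:

```shell
# Watch /data/incoming and pipe every file that finishes writing to Kafka.
# Requires inotify-tools; paths, broker, and topic are illustrative.
inotifywait -m -e close_write --format '%w%f' /data/incoming |
while read -r newfile; do
  bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test < "$newfile"
done
```

Using `close_write` (rather than `create`) avoids sending a file before the writer has finished with it.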

– Gang Sun

Kafka has a built-in File Stream Connector for piping the content of a file to a producer (file source) or directing file content to another destination (file sink).

We have bin/connect-standalone.sh to read from a file; it is configured in config/connect-file-source.properties and config/connect-standalone.properties.

So the command will be:

bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties
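For reference, the connect-file-source.properties that ships with the Kafka quickstart looks roughly like this (the file and topic values are whatever you configure them to be):

```properties
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=test.txt
topic=connect-test
```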
– WesternGun
  • Can you give the example of contents of `config/connect-file-source.properties` and `config/connect-standalone.properties` – awadhesh14 Apr 27 '19 at 09:58
  • Here is a more detailed explanation http://bigdatums.net/2017/06/20/writing-file-content-to-kafka-topic/ – awadhesh14 Apr 27 '19 at 10:03

The easiest way, if you are using Linux or Mac, is:

kafka-console-producer --broker-list localhost:9092 --topic test < messages.txt

Reference: https://github.com/Landoop/kafka-cheat-sheet

– Jeff Z (edited by Laurenz Albe)
  • I was trying this answer but it gave the error "no files found". I gave the actual path, C:\data\messages.txt, and got the same error: the path was being resolved from the current location. So I used ..\ to move up to the parent folder (tab completion helped me find the right files), and then it worked. – NickyPatel Sep 28 '20 at 12:26

You can probably try the kafkacat utility as well. The README on GitHub provides examples.

It would be great if you could share which tool worked best for you :)

Details from the kafkacat README:

Read messages from stdin, produce to 'syslog' topic with snappy compression

$ tail -f /var/log/syslog | kafkacat -b mybroker -t syslog -z snappy
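kafkacat can also read a file directly, without a shell pipe: in producer mode (-P), the -l flag sends each line of the named file as a separate message. The broker, topic, and file name here are placeholders:

```shell
# Produce each line of data.txt as one Kafka message (requires kafkacat/kcat).
kafkacat -P -b mybroker -t test -l data.txt
```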
– Mehul (edited by Edenhill)
kafka-console-producer.sh \
  --broker-list localhost:9092 \
  --topic my_topic \
  --new-producer < my_file.txt

Follow this link: http://grokbase.com/t/kafka/users/157b71babg/kafka-producer-input-file

– Tanvi Garg (edited by mrsrinivas)

The command below is of course the easiest way to do that:

kafka-console-producer --broker-list localhost:9092 --topic test < message.txt

But sometimes it is not able to find the file. For example:

C:\kafka_2.11-2.4.0\bin\windows>kafka-console-producer.bat --broker-list localhost:9092 --topic jason-input < C:\data\message.txt

You have given the actual path, but the shell is not able to find C:\ from the current location, so it gives the error "file not found". You would think that, since you gave the full path, it would go to the root and start from there, but instead it looks for C at the current place.

The solution is to add ..\ to the path to move to the parent folder. For example, you execute the command like this:

C:\kafka_2.11-2.4.0\bin\windows>kafka-console-producer.bat --broker-list localhost:9092 --topic jason-input < ..\..\..\data\message.txt

As of now I am in the windows folder. The first ..\ moves the current directory to the bin folder, the next ..\ moves it to the kafka_2.11-2.4.0 folder, and the last ..\ moves it to C:\. From there the path continues: data, then message.txt.
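The same resolution behaviour can be reproduced on Linux/Mac; this sketch (with made-up directory names) mirrors the Windows layout above:

```shell
# <root>/demo/kafka/bin/windows is the working directory;
# the data file lives at <root>/demo/data/message.txt.
mkdir -p demo/kafka/bin/windows demo/data
echo "hello" > demo/data/message.txt
cd demo/kafka/bin/windows
# The redirect path is resolved relative to the current directory,
# so three ../ hops climb back up before descending into data/:
cat < ../../../data/message.txt   # prints "hello"
```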

– NickyPatel