
I am building a Kafka batch process. I want to read a local CSV file and write its contents to a Kafka topic.

A consumer then has to read the data from the topic it is subscribed to.

Expected: the consumed data should be appended to a file in Parquet format in HDFS. Please help me achieve this efficiently.

Kafka producer input: [image]

Kafka consumer output: [image]

I want the value to be appended to a file in HDFS.

mazaneicha
Sudhakar

1 Answer


Doing that from scratch would be quite complicated.

You can use the Kafka Connect HDFS sink connector, which handles Parquet output out of the box. You would need to pre-process your records a bit first, though, so that they carry a schema (for example, JSON with an embedded schema).
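As a rough illustration of that pre-processing step, here is a minimal sketch that turns CSV rows into the `schema` + `payload` JSON envelope that Kafka Connect's `JsonConverter` expects when `schemas.enable=true`. The function names `build_schema` and `csv_to_envelopes` are made up for this example, and typing every column as an optional string is an assumption; adjust the types to your data.

```python
import csv
import io
import json


def build_schema(field_names):
    """Describe every CSV column as an optional string field (an assumption)."""
    return {
        "type": "struct",
        "optional": False,
        "fields": [
            {"field": name, "type": "string", "optional": True}
            for name in field_names
        ],
    }


def csv_to_envelopes(csv_text):
    """Yield one JSON string per CSV row, wrapped as {"schema": ..., "payload": ...}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # The header row supplies the field names for the schema.
    schema = build_schema(reader.fieldnames or [])
    for row in reader:
        yield json.dumps({"schema": schema, "payload": row})


if __name__ == "__main__":
    sample = "id,name\n1,alice\n2,bob\n"
    for message in csv_to_envelopes(sample):
        print(message)
```

Each yielded string would then be sent as the message value by whatever producer client you use (for instance, encoded to bytes and passed to `KafkaProducer.send` in kafka-python).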

More info here: https://docs.confluent.io/current/connect/kafka-connect-hdfs/index.html
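For orientation, a connector definition could look roughly like the sketch below, which builds the JSON body you would submit to the Kafka Connect REST API (`POST /connectors`). The connector name, topic, and HDFS URL are placeholders, and `hdfs_sink_config` is a helper invented for this example. The converter choice is also an assumption: depending on the connector version, Parquet output may require a Schema Registry based converter (such as Avro) rather than plain JSON with schemas, so check the docs linked above.

```python
import json


def hdfs_sink_config(name, topic, hdfs_url, flush_size=1000):
    """Build a Kafka Connect connector definition for the HDFS sink (sketch)."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
            "tasks.max": "1",
            "topics": topic,
            "hdfs.url": hdfs_url,
            # Number of records written before a file is committed to HDFS.
            "flush.size": str(flush_size),
            "format.class": "io.confluent.connect.hdfs.parquet.ParquetFormat",
            # Assumption: values are JSON with an embedded schema.
            "value.converter": "org.apache.kafka.connect.json.JsonConverter",
            "value.converter.schemas.enable": "true",
        },
    }


# Placeholder values; replace with your own topic and NameNode address.
definition = hdfs_sink_config("csv-to-parquet", "csv-topic", "hdfs://namenode:8020")
print(json.dumps(definition, indent=2))
```

Note that the connector commits a new file every `flush.size` records rather than appending to one ever-growing file, so "append" in practice means new Parquet files accumulating under the topic's directory.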

Yannick