Flink read data from Hadoop and publish to Kafka

Question

I have a requirement to read data from HDFS and publish it to a Kafka topic. Because these two operations are part of the DataSet and DataStream APIs respectively, is it possible to do both in a single job?

score 1 · Accepted Answer · answered Apr 29 '20 at 10:04

Yes. Flink's DataStream API can read files from HDFS, so a single DataStream job can both read the data and write it to Kafka. See readFile() in https://ci.apache.org/projects/flink/flink-docs-stable/dev/datastream_api.html#data-sources. Alternatively, you can use the file system connector with the Table and SQL APIs, but it only supports CSV.
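As a rough illustration of the DataStream approach, here is a minimal sketch, not a definitive implementation. It assumes a Flink 1.10-era setup with the universal Kafka connector (flink-connector-kafka) and Hadoop file system support on the classpath. The HDFS path, broker address, topic name, and class name are placeholders invented for the example, and it uses readTextFile(), the line-oriented convenience variant of readFile(), on the assumption that the input is plain text.

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

public class HdfsToKafkaJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Read the file(s) line by line as strings.
        // Placeholder path: replace with your own namenode and directory.
        DataStream<String> lines =
                env.readTextFile("hdfs://namenode:8020/path/to/input");

        // Write each line as a Kafka record value.
        // Placeholder broker list and topic name.
        lines.addSink(new FlinkKafkaProducer<>(
                "broker:9092",
                "output-topic",
                new SimpleStringSchema()));

        env.execute("HDFS to Kafka");
    }
}
```

With readTextFile() the input is bounded, so the job finishes once all files have been read. If the directory needs to be monitored for new files, readFile() with FileProcessingMode.PROCESS_CONTINUOUSLY is the variant to look at.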
