I have a requirement to read data from HDFS and publish it to a Kafka topic. Because the relevant source and sink are part of the DataSet and DataStream APIs, respectively, is it possible to do this in a single job?
1 Answer
Flink's DataStream API can read files from HDFS, so a single DataStream job can both read the files and write to Kafka. See readFile() under Data Sources in https://ci.apache.org/projects/flink/flink-docs-stable/dev/datastream_api.html#data-sources. Alternatively, you can use the file system connector with the Table and SQL APIs, but it only supports CSV.
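
A minimal sketch of the single-job approach, assuming a Flink 1.x release where the universal Kafka connector (the flink-connector-kafka dependency) still provides FlinkKafkaProducer, and assuming the files contain plain text lines. It uses readTextFile(), the simpler sibling of readFile() listed in the same Data Sources section. The HDFS path, broker address, topic name, and class name are placeholders, not values from the question.

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

public class HdfsToKafkaJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Read every line of the files under the given HDFS path (placeholder URI).
        DataStream<String> lines =
                env.readTextFile("hdfs://namenode:8020/data/input");

        // Write each line as a string record to a Kafka topic
        // (broker address and topic name are placeholders).
        FlinkKafkaProducer<String> producer = new FlinkKafkaProducer<>(
                "broker:9092", "my-topic", new SimpleStringSchema());
        lines.addSink(producer);

        env.execute("hdfs-to-kafka");
    }
}
```

With readTextFile() the job reads the path once and finishes when all files have been consumed. To keep watching the directory for new files, readFile() accepts a watch type and a polling interval instead.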

David Anderson