I have a requirement to read data from HDFS and publish it to a Kafka topic. Since reading files from HDFS seems to belong to the DataSet API and writing to Kafka to the DataStream API, is it possible to do both in a single job?

Harshith Bolar

Flink's DataStream API can read from HDFS files, so reading and writing to Kafka can happen in the same DataStream job. See readFile() in https://ci.apache.org/projects/flink/flink-docs-stable/dev/datastream_api.html#data-sources. Alternatively, you can use the file system connector with the Table and SQL APIs, but it only supports CSV.
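A minimal sketch of such a job, assuming a Flink 1.14 to 1.17 release (where `readFile` is still available and the `KafkaSink` connector exists), `flink-connector-kafka` on the classpath, and Hadoop configured so that `hdfs://` paths resolve. The HDFS path, broker address and topic name are hypothetical placeholders.

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class HdfsToKafka {
    // Hypothetical values: replace with your own HDFS path, brokers and topic.
    static final String INPUT_PATH = "hdfs://namenode:8020/data/input";
    static final String BOOTSTRAP_SERVERS = "localhost:9092";
    static final String TOPIC = "hdfs-lines";

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Read the files under INPUT_PATH line by line with the DataStream API.
        // The two-argument readFile defaults to FileProcessingMode.PROCESS_ONCE.
        TextInputFormat format = new TextInputFormat(new Path(INPUT_PATH));
        DataStream<String> lines = env.readFile(format, INPUT_PATH);

        // Write every line as the value of a Kafka record in TOPIC.
        KafkaSink<String> sink = KafkaSink.<String>builder()
                .setBootstrapServers(BOOTSTRAP_SERVERS)
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic(TOPIC)
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                .build();

        lines.sinkTo(sink);
        env.execute("HDFS to Kafka");
    }
}
```

To keep watching the directory for new files instead of reading it once, the four-argument `readFile` overload accepts `FileProcessingMode.PROCESS_CONTINUOUSLY` and a monitoring interval in milliseconds. In later Flink releases `readFile` is deprecated in favor of the `FileSource` connector.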

David Anderson