Consume multiple text files with Apache Flink DataSet API

Question

I am writing a batch job with Apache Flink using the DataSet API. I can read a text file using readTextFile(), but as far as I can tell this function reads only one file at a time.

I would like to consume all the text files in a directory and process them with the same function inside a single batch job using the DataSet API, if that is possible.

The other option is to implement a loop that runs multiple jobs, one per file, instead of one job over multiple files. But I don't think that is the best solution.

Any suggestions?

Have you resolved it? – whatsinthename, May 18 '21 at 13:14

score 1 · Answer 1 · answered Oct 30 '19 at 19:47

If I understand the documentation correctly, you can read an entire directory by passing its path to ExecutionEnvironment.readTextFile(). You can find an example here: Word-Count-Batch-Example
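As a rough illustration of the idea above, here is a minimal sketch that points readTextFile() at a directory instead of a single file. It assumes a Flink 1.x version that still ships the DataSet API (with flink-java and flink-clients on the classpath); the directory path and the class name are hypothetical placeholders. The recursive.file.enumeration parameter should only be needed if the directory contains nested subdirectories.

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;

public class ReadDirectoryJob {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Optional: also pick up files in nested subdirectories.
        Configuration parameters = new Configuration();
        parameters.setBoolean("recursive.file.enumeration", true);

        // Hypothetical path: a directory, not a single file.
        // Every file found under it becomes part of the same DataSet.
        DataSet<String> lines = env
                .readTextFile("file:///path/to/input-dir")
                .withParameters(parameters);

        // The same function is applied to the lines of all files in one job.
        DataSet<String> nonEmpty = lines.filter(line -> !line.isEmpty());

        // print() triggers execution of the batch job.
        nonEmpty.print();
    }
}
```

With this approach there is one job over all files, so the loop of one job per file mentioned in the question should not be necessary.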

TobiSH