I wrote this small program to read the files in a folder into a data stream:
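import org.apache.flink.api.common.io.FilePathFilter;
import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.FileProcessingMode;

public class TextFromDirStream {

    //
    // Program
    //
    public static void main(String[] args) throws Exception {
        // set up the execution environment
        final StreamExecutionEnvironment env = StreamExecutionEnvironment
                .getExecutionEnvironment();

        // monitor the directory, checking for new files
        // every 100 milliseconds
        TextInputFormat format = new TextInputFormat(
                new org.apache.flink.core.fs.Path("file:///tmp/dir/"));

        DataStream<String> inputStream = env.readFile(
                format,
                "file:///tmp/dir/",
                FileProcessingMode.PROCESS_CONTINUOUSLY,
                100,
                FilePathFilter.createDefaultFilter());

        // print each line that was read
        inputStream.print();

        // execute program
        env.execute("Java read file from folder Example");
    }
}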
My next step is to process the file content, which is CSV. What is the most effective way to do this? Should I keep my code, parse the text in inputStream, and transform each line into a Tuple, or should I read the files as CSV from the start? I ask because I have had difficulty finding examples or documentation on how to split text into a tuple.
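To make the first option concrete, here is a minimal sketch of what parsing each line by hand could look like. It is only an illustration under assumptions: the two-column layout (a name and an integer) is hypothetical, the class name CsvLineParser is made up, and a plain split on commas does not handle quoted fields. It uses only the JDK so that it runs on its own; in the Flink job the same logic would presumably sit inside a MapFunction passed to inputStream.map(...), returning a Tuple2<String, Integer> instead of a Map.Entry.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class CsvLineParser {

    // Assumption: every line has exactly two comma-separated columns,
    // a name and an integer. Adjust to the real file layout.
    public static Map.Entry<String, Integer> parse(String line) {
        // limit -1 keeps trailing empty fields so the count check stays honest
        String[] fields = line.split(",", -1);
        if (fields.length != 2) {
            throw new IllegalArgumentException(
                    "Expected 2 fields, got " + fields.length + ": " + line);
        }
        String name = fields[0].trim();
        int value = Integer.parseInt(fields[1].trim());
        return new SimpleEntry<>(name, value);
    }

    public static void main(String[] args) {
        Map.Entry<String, Integer> row = parse("alice, 42");
        System.out.println(row.getKey() + " -> " + row.getValue());
    }
}
```

Malformed lines throw an exception here; in a streaming job it may be preferable to filter or log them instead so that one bad line does not fail the whole job.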
Thank you in advance