I am currently writing my first Flink application and would like to monitor a folder for new files. Unfortunately I could not find many examples on this topic.

I found the readFile(fileInputFormat, path, watchType, interval, pathFilter, typeInfo) function to monitor a directory.

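import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.FileProcessingMode;

public static void main(String[] args) throws Exception {

    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // filePath is the directory to watch; the last argument is the scan interval in milliseconds
    DataStream<String> inputStream = env.readFile(
            new TextInputFormat(new Path(filePath)), filePath, FileProcessingMode.PROCESS_CONTINUOUSLY, 100);
    inputStream.print();
    env.execute("Flink streaming test");

}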

I thought this function watches the folder and reads a file when it is added to the folder. However, only the first file that is added is read. I guess I didn't really understand the way it worked. Can someone explain to me why it doesn't work that way and what the right way would be?

help-info.de
ayaui
  • It can do what you were expecting it to. If you provide more details perhaps we can help diagnose what's wrong with what you tried. Is filePath pointing to a directory, or a file? How did you add the files to the directory? It's best to atomically move them into the directory being watched once they are ready to be ingested. – David Anderson Oct 28 '19 at 20:59
  • Thanks for your response! The path points to the directory the files are moved to. For the first tests I moved some files into the directory by hand. I will try it again automatically. – ayaui Oct 29 '19 at 08:21
  • Flink is keeping track of the modified timestamp on the directory, and ingesting anything in the directory that is newer than this timestamp. Don't worry about how you move the files in -- what will confuse things is if you modify the files after moving them in -- that's why I recommended moving them in atomically (not automatically). If you append to them or otherwise modify them after they are in the directory, they will be re-ingested. – David Anderson Oct 29 '19 at 08:29
  • What filesystem are you using? I'm not sure how well this works on an eventually consistent filesystem, such as S3. – David Anderson Oct 29 '19 at 08:31
  • Sorry, I read that wrong. I am using my local file system on Ubuntu and do not modify the files after moving them. At least I don't intend to. – ayaui Oct 29 '19 at 09:00
  • I have now written a script which moves the files into the folder. First I just moved the files. That didn't work. But if I copy the files with the script it works! So the problem was really the way I moved the files. Thanks a lot! – ayaui Oct 29 '19 at 11:19
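
The difference between moving and copying reported above would be consistent with timestamp-based monitoring: a plain move on a local Linux file system keeps the file's old modification time, while a copy creates a file with a fresh one. A minimal sketch of a "drop" helper along those lines follows, assuming the monitor only picks up files whose modification time is newer than what it has already seen. The class name AtomicFileDrop and the method moveIntoWatchedDir are hypothetical, chosen for illustration, and are not part of Flink.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.FileTime;

public class AtomicFileDrop {

    /**
     * Moves a finished file into the watched directory in a single step.
     * Hypothetical helper for illustration only.
     */
    public static Path moveIntoWatchedDir(Path readyFile, Path watchedDir) throws IOException {
        Files.createDirectories(watchedDir);

        // A move keeps the old modification time. Refresh it first so that a
        // monitor comparing timestamps sees the file as new (assumption).
        Files.setLastModifiedTime(readyFile, FileTime.fromMillis(System.currentTimeMillis()));

        // ATOMIC_MOVE needs source and target on the same file system;
        // otherwise it throws AtomicMoveNotSupportedException.
        Path target = watchedDir.resolve(readyFile.getFileName());
        return Files.move(readyFile, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        // Stage a file outside the watched directory, then drop it in.
        Path base = Files.createTempDirectory("flink-drop");
        Path staged = base.resolve("input.txt");
        Files.write(staged, "hello\n".getBytes(StandardCharsets.UTF_8));

        Path moved = moveIntoWatchedDir(staged, base.resolve("watched"));
        System.out.println("Moved to " + moved);
    }
}
```

Writing the file somewhere else first and moving it only when it is complete follows the advice above about atomic moves: the watched directory never contains a half-written file, and the file is not modified after it arrives.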
