
I want to read the contents of the files in a directory in streaming mode; that is, when new files are added to the directory, Flink should read them as well.

Below is my sample code. The program prints the contents of all files that already exist in the directory, but when I then add a new file, the contents of that file are not printed.

I am not sure where the problem is.

import org.apache.flink.api.java.io.TextInputFormat
import org.apache.flink.core.fs.Path
import org.apache.flink.streaming.api.functions.source.FileProcessingMode
// The wildcard import covers DataStream, StreamExecutionEnvironment, and the implicit TypeInformation.
import org.apache.flink.streaming.api.scala._

object FileBasedDataStreamTest {

  def main(args: Array[String]): Unit = {

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val path = "D:/flink-data/001"
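    // PROCESS_CONTINUOUSLY keeps monitoring the path; the last argument is the scan interval in milliseconds.
    val ds: DataStream[String] = env.readFile(
      new TextInputFormat(new Path(path)),
      path,
      FileProcessingMode.PROCESS_CONTINUOUSLY,
      100L)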
    ds.print()
    env.execute()
  }

}
Tom
  • It looks like the algorithm used to detect a `new` file matters. When I create a brand-new file and drop it into the directory, the program reads it. It seems to use the last modified time to decide whether a file is new or old (ignoring the creation time and the last access time). – Tom Jan 14 '19 at 05:36
  • You can take a look at this if you're interested. – Jiayi Liao Jan 14 '19 at 06:47
  • @bupt_ljy It looks like you forgot to include the link? – Tom Jan 14 '19 at 07:15
  • You're correct that Flink uses the last modification time to distinguish between new and already read files. – Till Rohrmann Jan 14 '19 at 08:16
  • @Tom Sorry... https://issues.apache.org/jira/browse/FLINK-10168 – Jiayi Liao Jan 14 '19 at 11:07
  • @Tom I have the same requirement. Were you able to find a solution? – MiniSu Jun 26 '21 at 12:54
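
The comments above say that new files are detected by their last modification time. If that is what happens here, a file that is copied into the directory while keeping an older timestamp might be skipped. The following is only a sketch of one possible workaround, not a confirmed fix: copy the file, then explicitly set its modification time to the current time. The names `DropFileIntoWatchedDir` and `dropWithFreshTimestamp` are hypothetical; the code uses only standard `java.nio.file` calls and nothing from Flink.

```scala
import java.nio.file.{Files, Path, StandardCopyOption}
import java.nio.file.attribute.FileTime

// Hypothetical helper, not part of Flink.
object DropFileIntoWatchedDir {

  // Copies `source` into `watchedDir` and stamps the copy with the current time,
  // so that its last-modified timestamp is not older than files already in the directory.
  def dropWithFreshTimestamp(source: Path, watchedDir: Path): Path = {
    val target = watchedDir.resolve(source.getFileName)
    Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING)
    // Some platforms may carry over the source timestamp on copy, so set it explicitly.
    Files.setLastModifiedTime(target, FileTime.fromMillis(System.currentTimeMillis()))
    target
  }
}
```

For the job in the question, `watchedDir` would be the monitored path, e.g. `Paths.get("D:/flink-data/001")`. Whether this is sufficient depends on how the running Flink version compares timestamps, which FLINK-10168 discusses.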

0 Answers