
The Flink documentation says:

For example a data pipeline might monitor a file system directory for new files and write their data into an event log. Another application might materialize an event stream to a database or incrementally build and refine a search index.

So, how can I follow updates to a file on the local file system while using Flink?

The documentation also mentions:

File system sources for streaming is still under development. In the future, the community will add support for common streaming use cases, i.e., partition and directory monitoring.

Does this mean I could use the API to do some special streaming? If you know how to use a streaming file system source, please tell me. Thanks!

Jason Pan
  • The new FLIP-27 filesystem connector has `monitorContinuously` built into it. I don't know whether this is exposed in PyFlink yet, though. – Ingo Bürk Aug 17 '21 at 06:04
  • @IngoBürk Thanks! I searched the code and found a `monitor_continuously` in `FileSourceBuilder` under the flink-py directory. – Jason Pan Aug 17 '21 at 06:15
  • @IngoBürk I have tried `monitor_continuously`, but I found it only watches for new files and reads each one once. If I append new content to those files, it is not ingested by the source again. Do you know how to make the FileSourceBuilder follow in-progress files? – Jason Pan Aug 17 '21 at 08:30
  • I don't think monitoring existing files is supported by the connector. I can think of a few problems this would cause, but I don't know if they are the reason for that. You can try asking on the Flink user mailing list as well. – Ingo Bürk Aug 17 '21 at 12:15
  • @IngoBürk Thanks! – Jason Pan Aug 17 '21 at 12:25
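
For reference, the `monitor_continuously` setup discussed in the comments might look roughly like the sketch below. It assumes a PyFlink release where `FileSource` and `StreamFormat` live in `pyflink.datastream.connectors.file_system` (older releases exposed them from `pyflink.datastream.connectors`). The function names, the 10-second interval, and the source and job names are made up for illustration. As noted in the comments, this only picks up files that newly appear in the directory; it does not re-read lines appended to a file it has already processed.

```python
def build_monitoring_source(directory, interval_seconds=10):
    """Build a FileSource that periodically re-scans `directory` for new files.

    Hypothetical helper; the interval is an arbitrary illustrative default.
    """
    # Imported lazily so the sketch can be loaded without a Flink installation.
    # Assumption: import path as found in newer PyFlink releases.
    from pyflink.common import Duration
    from pyflink.datastream.connectors.file_system import FileSource, StreamFormat

    return (
        FileSource.for_record_stream_format(
            StreamFormat.text_line_format(), directory
        )
        # Unbounded mode: keep discovering files at the given interval.
        .monitor_continuously(Duration.of_seconds(interval_seconds))
        .build()
    )


def main(directory):
    """Print every line of every file that shows up in `directory`."""
    from pyflink.common import WatermarkStrategy
    from pyflink.datastream import StreamExecutionEnvironment

    env = StreamExecutionEnvironment.get_execution_environment()
    lines = env.from_source(
        build_monitoring_source(directory),
        WatermarkStrategy.for_monotonous_timestamps(),
        "monitored-directory",  # illustrative source name
    )
    lines.print()
    env.execute("follow-new-files")  # illustrative job name
```

Calling `main` with a directory path starts a job that runs until cancelled. Given the behaviour described above, writing each batch of new data to a fresh file in the directory is the pattern this source would pick up.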
