If you are using an NFS, then set the sharedFileSystem property to true:
BatchSource<SalesRecordLine> source = Sources.filesBuilder(sourceDir)
        .glob("*.csv")
        .sharedFileSystem(true)
        .build(path -> Files.lines(path).skip(1).map(SalesRecordLine::parse));
From the method javadoc:
Sets if files are in a shared storage visible to all members. Default
value is false. If sharedFileSystem is true, Jet will assume all
members see the same files. They will split the work so that each
member will read a part of the files. If sharedFileSystem is false,
each member will read all files in the directory, assuming they are
local.
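To make the difference concrete, here is a minimal plain-Java sketch of the two behaviours the javadoc describes. It does not use Jet, and it is not Jet's actual assignment logic: the filesForMember helper and the round-robin split are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FileSplitSketch {

    // Hypothetical helper: returns the files one member would read.
    // The round-robin split is an assumption, not Jet's real algorithm.
    static List<String> filesForMember(List<String> files, int memberIndex,
                                       int memberCount, boolean sharedFileSystem) {
        if (!sharedFileSystem) {
            // Files are assumed to be local, so every member reads all of them.
            return new ArrayList<>(files);
        }
        // Files are assumed visible to all members, so the work is divided.
        List<String> assigned = new ArrayList<>();
        for (int i = 0; i < files.size(); i++) {
            if (i % memberCount == memberIndex) {
                assigned.add(files.get(i));
            }
        }
        return assigned;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a.csv", "b.csv", "c.csv");
        for (int member = 0; member < 2; member++) {
            System.out.println("shared=true,  member " + member + ": "
                    + filesForMember(files, member, 2, true));
            System.out.println("shared=false, member " + member + ": "
                    + filesForMember(files, member, 2, false));
        }
    }
}
```

In this sketch, leaving the flag at false while all members actually see the same directory means every member returns the full list, so each file would be read once per member.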
For the batch source, Jet assumes the files are not modified while they are read. If they are, the result is undefined.
If you want to monitor files as they are written to, use FileSourceBuilder.buildWatcher() instead of build(). This creates a streaming job, but the watcher processes only lines appended since the job started. Again, if the files are modified in any way other than appending at the end, the result is undefined. For example, many text editors delete and rewrite the entire file even when you just append a line at the end. For testing, it's easiest to use:
echo "text" >> your_file
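If you would rather drive such a test from Java than from the shell, a small helper along these lines should behave like the echo command. It uses only the standard library; the AppendLine class and appendLine method are names I made up for this sketch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Collections;

public class AppendLine {

    // Appends a single line at the end of the file, creating the file if it
    // does not exist. Existing content is never truncated or rewritten.
    static void appendLine(Path file, String line) throws IOException {
        Files.write(file, Collections.singletonList(line), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        // Defaults to "your_file" in the working directory if no path is given.
        Path file = Paths.get(args.length > 0 ? args[0] : "your_file");
        appendLine(file, "text");
    }
}
```

The APPEND option is the important part: opening the file without it would overwrite the existing content, which is the kind of modification the watcher does not handle.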