I have one huge CSV file and a Jet cluster with 3 nodes. When I submit the job, only one node processes the entire file. I want the work to be distributed across the cluster: how can I make the best use of the resources on each node so that the job finishes faster? My current pipeline:
    Pipeline p = Pipeline.create();
    BatchSource<List<String>> source = Sources.filesBuilder("files")
            .glob("*.csv")
            .build(path -> Files.lines(path).skip(1).map(line -> split(line)));
    p.readFrom(source)
            .map(function1)
            .map(function2)
            .writeTo(Sinks.filesBuilder("out").build());
    instance.newJob(p).join();
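
For context, here is a minimal sketch of one possible workaround, assuming (as the behaviour above suggests) that the files source hands out whole files to readers, so a single file cannot be read in parallel. The idea is to pre-split the big CSV into several smaller files that the same `*.csv` glob will pick up. `CsvSplitter`, its `split` method, and the `part-N.csv` naming are hypothetical names made up for this illustration; they are not part of Jet. The header is copied into every part so that the existing `skip(1)` keeps working, and the split is line-based, so it assumes no quoted field contains a line break.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Jet API): splits one CSV into several part files.
public class CsvSplitter {

    /**
     * Distributes the data lines of {@code input} round-robin over {@code parts}
     * files in {@code outDir}. Each part starts with a copy of the header line.
     */
    public static List<Path> split(Path input, Path outDir, int parts) throws IOException {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be >= 1");
        }
        Files.createDirectories(outDir);
        List<Path> paths = new ArrayList<>();
        List<BufferedWriter> writers = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String header = reader.readLine(); // null if the input is empty
            for (int i = 0; i < parts; i++) {
                Path part = outDir.resolve("part-" + i + ".csv");
                BufferedWriter writer = Files.newBufferedWriter(part, StandardCharsets.UTF_8);
                paths.add(part);
                writers.add(writer);
                if (header != null) {
                    writer.write(header);
                    writer.newLine();
                }
            }
            String line;
            long lineIndex = 0;
            while ((line = reader.readLine()) != null) {
                // Round-robin assignment keeps the parts roughly equal in size.
                BufferedWriter writer = writers.get((int) (lineIndex++ % parts));
                writer.write(line);
                writer.newLine();
            }
        } finally {
            for (BufferedWriter writer : writers) {
                writer.close();
            }
        }
        return paths;
    }
}
```

With something like `CsvSplitter.split(Paths.get("big.csv"), Paths.get("files"), 12)` run beforehand, the pipeline above would see twelve files instead of one. If the directory is on a file system shared by all members, the builder's `sharedFileSystem(true)` option is probably needed as well so that the files are divided among the members; otherwise each member only reads the files on its own disk. Newer Jet releases (4.2 and later, as far as I know) also have a `rebalance()` stage method that may help spread the `map` stages across the cluster without splitting the file.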