Using the Spring Integration Java DSL, I have constructed a flow where I'm processing files synchronously with a FileSplitter. I've been able to use the setDeleteFiles flag on an AbstractFilePayloadTransformer to delete each file after converting its lines to Messages for subsequent processing, like so:
@Bean
protected IntegrationFlow s3ChannelFlow() {
    // do not exhaust filesystem w/ files downloaded from S3
    FileToInputStreamTransformer transformer = new FileToInputStreamTransformer();
    transformer.setDeleteFiles(true);

    // @see http://docs.spring.io/spring-integration/reference/html/files.html#file-reading
    // @formatter:off
    return IntegrationFlows
            .from(s3Channel())
            .channel(StatsUtil.createRunStatsChannel(runStatsRepository))
            .transform(transformer)
            .split(new FileSplitter())
            .transform(new JsonToObjectViaTypeHeaderTransformer(new Jackson2JsonObjectMapper(objectMapper), typeSupport))
            .publishSubscribeChannel(p -> p.subscribe(persistenceSubFlow()))
            .get();
    // @formatter:on
}
This works fine, but it is slow. So I attempted to add an ExecutorChannel after the .split above, like so:

.channel(c -> c.executor(Executors.newFixedThreadPool(10)))
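For clarity, the middle of the flow then reads:

.transform(transformer)
.split(new FileSplitter())
.channel(c -> c.executor(Executors.newFixedThreadPool(10)))
.transform(new JsonToObjectViaTypeHeaderTransformer(new Jackson2JsonObjectMapper(objectMapper), typeSupport))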
But then the aforementioned delete flag causes the file(s) to be deleted before they have been completely read, so the flow cannot complete successfully. If I remove the flag, I risk exhausting the local file system to which files are synchronized from S3.
What could I introduce above to a) process each file completely and b) delete the file from the local filesystem once done? In other words, is there a way to know exactly when a file has been completely processed (i.e., when its lines have been processed asynchronously via threads in a pool)?
If you're curious, here's my implementation of FileToInputStreamTransformer:
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import org.springframework.integration.file.transformer.AbstractFilePayloadTransformer;

public class FileToInputStreamTransformer extends AbstractFilePayloadTransformer<InputStream> {

    private static final int BUFFER_SIZE = 64 * 1024; // 64 kB

    // @see http://java-performance.info/java-io-bufferedinputstream-and-java-util-zip-gzipinputstream/
    @Override
    protected InputStream transformFile(File payload) throws Exception {
        return new GZIPInputStream(new FileInputStream(payload), BUFFER_SIZE);
    }
}
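In case it helps, this is roughly how I exercise it in isolation (a hypothetical smoke test, not part of the flow; the file path and charset are made up):

// build a Message around a gzipped file and read the transformed stream back
FileToInputStreamTransformer transformer = new FileToInputStreamTransformer();
Message<?> result = transformer.transform(
        MessageBuilder.withPayload(new File("/tmp/sample.json.gz")).build());
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader((InputStream) result.getPayload(), StandardCharsets.UTF_8))) {
    reader.lines().forEach(System.out::println);
}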
UPDATE
So how does something in the downstream flow know what to ask for?
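To make that concrete, my assumption is that a downstream handler can declare the payload type it wants, something like this (a hypothetical sketch, reacting only to the END marker):

// a handler that "asks for" FileSplitter.FileMarker payloads only;
// regular String lines would have to be routed elsewhere
.handle(FileSplitter.FileMarker.class, (marker, headers) -> {
    if (FileSplitter.FileMarker.Mark.END == marker.getMark()) {
        // END is emitted only after the splitter has read the whole file
        new File(marker.getFilePath()).delete();
    }
    return null; // terminate this branch of the flow
})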
Incidentally, if I'm following your advice correctly, when I update the .split above to new FileSplitter(true, true), I get:
2015-10-20 14:26:45,288 [pool-6-thread-1] org.springframework.integration.handler.LoggingHandler ERROR
org.springframework.integration.transformer.MessageTransformationException: failed to transform message;
nested exception is java.lang.IllegalArgumentException: 'json' argument must be an instance of:
[class java.lang.String, class [B, class java.io.File, class java.net.URL, class java.io.InputStream, class java.io.Reader],
but gotten: class org.springframework.integration.file.splitter.FileSplitter$FileMarker
    at org.springframework.integration.transformer.AbstractTransformer.transform(AbstractTransformer.java:44)
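Presumably I need to keep those FileMarker frames away from the JSON transformer. Here's the shape I'm imagining (hypothetical and untested; the fileMarkerChannel name is mine, and that channel could feed a marker handler like the one sketched above, with the transformer's setDeleteFiles flag turned off):

.split(new FileSplitter(true, true))
.channel(c -> c.executor(Executors.newFixedThreadPool(10)))
// pass only real lines on to the JSON transformer; markers go to a side channel
.filter(payload -> !(payload instanceof FileSplitter.FileMarker),
        e -> e.discardChannel("fileMarkerChannel"))
.transform(new JsonToObjectViaTypeHeaderTransformer(
        new Jackson2JsonObjectMapper(objectMapper), typeSupport))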