The goal is to stream a large json.gz file (4 GB compressed, about 12 GB uncompressed, 12 million rows) from a web server directly into the database, without downloading it locally first. Since the Spring Integration outbound gateway doesn't support the gzip format, I'm doing it myself using OkHttp, which automatically decompresses the response:
body = response.body().byteStream(); // thanks okhttp
reader = new InputStreamReader(body, StandardCharsets.UTF_8);
br = new BufferedReader(reader, bufferSize);
Flux<String> flux = Flux.fromStream(br.lines())
    .onBackpressureBuffer(10000, x -> log.error("Buffer overrun!"))
    .doAfterTerminate(() -> closeQuietly(closeables))
    .doOnError(t -> log.error(...));
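For reference, a minimal JDK-only sketch of the same lazy, line-by-line read, under the assumption that the payload arrives still gzip-compressed (as far as I know, OkHttp only decompresses transparently when the response carries `Content-Encoding: gzip`). The class `GzipLineStreamDemo` and its helpers are hypothetical names for illustration, and the in-memory gzip data stands in for the HTTP body.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.stream.Stream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipLineStreamDemo {

    /**
     * Wraps a gzip-compressed stream in a lazy Stream of lines.
     * Lines are decompressed and decoded only as the Stream is consumed;
     * closing the Stream closes the underlying reader.
     */
    public static Stream<String> lines(InputStream gzipped) {
        try {
            BufferedReader br = new BufferedReader(
                    new InputStreamReader(new GZIPInputStream(gzipped), StandardCharsets.UTF_8));
            return br.lines().onClose(() -> {
                try {
                    br.close();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo helper only: gzips a string in memory to simulate the response body. */
    public static byte[] gzip(String text) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = gzip("{\"id\":1}\n{\"id\":2}\n{\"id\":3}\n");
        // try-with-resources guarantees the reader is closed even on failure
        try (Stream<String> rows = lines(new ByteArrayInputStream(data))) {
            rows.forEach(System.out::println);
        }
    }
}
```

Because `BufferedReader.lines()` is pull-based, nothing is read until a consumer asks for the next line, so the source itself never produces faster than it is consumed.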
In the integration flow:
.handle(new MessageTransformingHandler(new GzipToFluxTransformer(...)))
.split()
.log(LoggingHandler.Level.DEBUG, CLASS_NAME, Message::getHeaders)
.channel(repositoryInputChannel())
But it fails with:
2017-12-08 22:48:47.846 [task-scheduler-7] [ERROR] c.n.d.y.s.GzipToFluxTransformer - Buffer overrun!
2017-12-08 22:48:48.337 [task-scheduler-7] [ERROR] o.s.i.h.LoggingHandler - org.springframework.messaging.MessageHandlingException:
error occurred in message handler [org.springframework.integration.splitter.DefaultMessageSplitter#1];
nested exception is reactor.core.Exceptions$OverflowException: The receiver is overrun by more signals than expected (bounded queue...),
failedMessage=...}]
at org.springframework.integration.handler.AbstractMessageHandler.handleMessage(AbstractMessageHandler.java:153)
at org.springframework.integration.dispatcher.AbstractDispatcher.tryOptimizedDispatch(AbstractDispatcher.java:116)
at org.springframework.integration.dispatcher.UnicastingDispatcher.doDispatch(UnicastingDispatcher.java:132)
at org.springframework.integration.dispatcher.UnicastingDispatcher.dispatch(UnicastingDispatcher.java:105)
at org.springframework.integration.channel.AbstractSubscribableChannel.doSend(AbstractSubscribableChannel.java:73)
The output channel is hooked up at runtime using an unbounded queue polled by a bridge. This makes testing easier, because the queue can be swapped for a DirectChannel in tests.
@Bean(name = "${...}")
public PollableChannel streamingOutputChannel() {
    return new QueueChannel();
}

@Bean
public IntegrationFlow srcToSinkBridge() {
    return IntegrationFlows.from(streamingOutputChannel())
            .bridge(e -> e.poller(Pollers.fixedDelay(500)))
            .channel(repositoryInputChannel())
            .get();
}
I have a couple of doubts here.
- I'm not sure that the dynamic binding using SpEL in the bean name is working, and I don't know how to verify it.
- Since the queue is unbounded, all I can think of is that the polling is not quick enough. However, the exception suggests that the splitter is having a problem keeping up.
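To make the overrun-versus-blocking distinction concrete, here is a JDK-only sketch that uses neither Reactor nor Spring Integration. It assumes a bounded `ArrayBlockingQueue` as a stand-in for a bounded buffer: `offer()` rejects items once the buffer is full (the analogue of "Buffer overrun!"), while `put()` makes the producer wait for the consumer instead. The class and method names are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedHandoffDemo {

    // Sentinel that tells the consumer the producer is done.
    private static final String EOF = "\u0000EOF";

    /** Fills a bounded queue with no consumer; returns how many items were rejected. */
    public static int rejectedWithoutConsumer(int items, int capacity) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        int rejected = 0;
        for (int i = 0; i < items; i++) {
            // offer() returns false immediately when the queue is full
            if (!queue.offer("row-" + i)) {
                rejected++;
            }
        }
        return rejected;
    }

    /** Producer blocks on put() when the queue is full; returns how many items were consumed. */
    public static int consumedWithBlockingProducer(int items, int capacity) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < items; i++) {
                    queue.put("row-" + i); // waits while the queue is full
                }
                queue.put(EOF);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        int consumed = 0;
        try {
            while (!queue.take().equals(EOF)) {
                consumed++;
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return consumed;
    }

    public static void main(String[] args) {
        System.out.println(rejectedWithoutConsumer(15, 10));        // prints 5
        System.out.println(consumedWithBlockingProducer(1000, 10)); // prints 1000
    }
}
```

With a capacity of 10, the non-blocking variant loses 5 of 15 items, whereas the blocking variant delivers all 1000 because the producer is throttled to the consumer's pace.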