I am using an HTTP library to fetch about 200 MB of data, and each line of that data is then processed. To save memory, I would like to process the data line by line as it streams in, rather than waiting for all 200 MB to be downloaded first.

The HTTP library exposes a method that looks something like `OnCharReceived(CharBuffer buffer)`, which can be overridden so that I can, in effect, process each chunk of data as it arrives.

I would like to expose this data as an InputStream. My first thought was to use a PipedInputStream/PipedOutputStream pair: OnCharReceived() would write to the PipedOutputStream, and my own thread would read from the PipedInputStream. The problem is that the pipe's underlying buffer can fill up, which forces the writing thread to block inside OnCharReceived() until my thread catches up. Blocking in OnCharReceived() would most likely mean blocking the HTTP library's I/O thread, which would be very bad.
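
The pipe idea can be sketched roughly as follows. This is only a minimal illustration, not code from any HTTP library: `PipeBridge`, `onComplete()`, and `demo()` are hypothetical names, and a plain thread stands in for the library's I/O thread. The pipe is made deliberately small so that the blocking write described above actually occurs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical bridge class; the names are assumptions made for illustration.
class PipeBridge {
    private final PipedInputStream in;
    private final PipedOutputStream out;

    PipeBridge(int pipeSize) throws IOException {
        in = new PipedInputStream(pipeSize);
        out = new PipedOutputStream(in);
    }

    // Stand-in for the library callback. write() blocks while the pipe is
    // full, so a slow reader stalls whichever thread calls this method.
    // Encoding chunk by chunk assumes no surrogate pair is split across chunks.
    void onCharReceived(CharBuffer buffer) throws IOException {
        byte[] bytes = buffer.toString().getBytes(StandardCharsets.UTF_8);
        buffer.position(buffer.limit()); // mark the chunk as consumed
        out.write(bytes);
    }

    // Assumed end-of-response hook: closing the pipe signals EOF to the reader.
    void onComplete() throws IOException {
        out.close();
    }

    InputStream stream() {
        return in;
    }

    static List<String> demo() throws Exception {
        PipeBridge bridge = new PipeBridge(16); // tiny buffer, on purpose
        Thread ioThread = new Thread(() -> {
            try {
                bridge.onCharReceived(CharBuffer.wrap("first line\nsecond "));
                bridge.onCharReceived(CharBuffer.wrap("line\n"));
                bridge.onComplete();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        ioThread.start();

        // Consumer side: legacy code only ever sees an InputStream.
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(bridge.stream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        ioThread.join();
        return lines;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

The first chunk is 18 bytes and the pipe holds 16, so the writer thread has to wait inside `onCharReceived()` until the reader drains some data.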

Are there Java classes that solve this general problem without my having to roll a custom implementation? I know of building blocks such as BlockingQueue that could be part of a larger solution, but is there a simple, ready-made one?
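
For context, one shape such a BlockingQueue-based "larger solution" might take is sketched below. It is a hand-rolled illustration, not a standard class: `QueueInputStream`, `offer()`, and `finish()` are hypothetical names. The queue is unbounded so the producer never blocks, which also means memory grows if the reader falls behind.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.InterruptedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical InputStream fed by an unbounded queue of byte chunks.
class QueueInputStream extends InputStream {
    private static final byte[] EOF = new byte[0]; // end marker, compared by identity
    private final BlockingQueue<byte[]> chunks = new LinkedBlockingQueue<>();
    private byte[] current = new byte[0];
    private int pos = 0;
    private boolean finished = false;

    // Producer side (e.g. called from the callback): never blocks.
    // Encoding chunk by chunk assumes no surrogate pair is split across chunks.
    void offer(CharSequence chunk) {
        byte[] bytes = chunk.toString().getBytes(StandardCharsets.UTF_8);
        if (bytes.length > 0) {
            chunks.add(bytes);
        }
    }

    // Producer side: signals that no more data will arrive.
    void finish() {
        chunks.add(EOF);
    }

    @Override
    public int read() throws IOException {
        return fill() ? (current[pos++] & 0xFF) : -1;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (!fill()) {
            return -1;
        }
        int n = Math.min(len, current.length - pos);
        System.arraycopy(current, pos, b, off, n);
        pos += n;
        return n;
    }

    // Ensures 'current' has unread bytes; the reader blocks here, not the producer.
    private boolean fill() throws IOException {
        while (pos == current.length) {
            if (finished) {
                return false;
            }
            try {
                byte[] next = chunks.take();
                if (next == EOF) {
                    finished = true;
                    return false;
                }
                current = next;
                pos = 0;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new InterruptedIOException("interrupted while waiting for data");
            }
        }
        return true;
    }

    static List<String> demo() throws IOException {
        QueueInputStream stream = new QueueInputStream();
        stream.offer("first line\nsecond ");
        stream.offer("line\n");
        stream.finish();
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

The trade-off is the opposite of the pipe's: the I/O thread is never stalled, but nothing bounds how much data can pile up in memory.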

For reasons of legacy code I really need the data exposed as an InputStream.

Edit: To be more precise, I am basing my code on the following example from the Apache HTTP async client library:

https://hc.apache.org/httpcomponents-asyncclient-dev/httpasyncclient/examples/org/apache/http/examples/nio/client/AsyncClientHttpExchangeStreaming.java

user782220
  • You're making a mountain out of a molehill. Just use `URL.openStream()` or whatever its equivalent is in your library, if indeed you really need to use that library at all. Using a callback and trying to turn that *back* into a stream for use by another thread is just wasting time and space. – user207421 Jul 31 '16 at 03:46
  • @EJP a small library can be justified if it addresses decryption concerns, for example, in which case he would have to integrate with whatever API this library provides – Dici Jul 31 '16 at 04:16
  • Re your 'edit, to be precise', you should be doing no such thing. That library already exposes an `InputStream`. Use it. – user207421 Jul 31 '16 at 06:00
  • @Dici Your point escapes me. I didn't say his use of a library was unjustified. I don't have any information about that, and neither do you. It turns out he is using the Apache HTTP client, which certainly exposes an `InputStream` interface, which he should therefore certainly be using. – user207421 Jul 31 '16 at 06:21
  • @EJP Where do you see this InputStream in the apache http async client library? – user782220 Jul 31 '16 at 06:51
  • I see it in the Apache HTTP Client, of which the library you mention is surely a subset. You shouldn't be using asynchronous I/O if what you want is an `InputStream`. I've stated this several times. – user207421 Jul 31 '16 at 08:02
  • @EJP "You shouldn't be using asynchronous I/O if what you want is an InputStream": what do you mean by this? I am facing a similar problem, where I need to write the data to a stream for my client to read. I am using the Apache async client. We cannot use a sync client because that would block the thread until we get a response from the server. – lostintranslation May 16 '18 at 13:55

1 Answer

If there is a simpler solution, I would stay away from Piped[In/Out]putStream: as you pointed out, it introduces unnecessarily complicated threading concerns. Keep in mind that you can always write the data to a temp file and then read the file back as an InputStream. This also has the advantage of letting the HTTP connection close as quickly as possible, which avoids timeouts.

There might be other solutions depending on the API you are using but I think the proposed solution still makes sense for the reasons above.
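
A minimal sketch of the temp-file approach, under some assumptions: `TempFileSpool`, `append()`, and `openForReading()` are hypothetical names, and the callback wiring is left out because it depends on the library. The callback only appends to a local file; once the response is complete, the file is reopened as an InputStream for the legacy code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Writer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the names are assumptions made for illustration.
class TempFileSpool {
    private final Path file;
    private final Writer writer;

    TempFileSpool() throws IOException {
        file = Files.createTempFile("http-body-", ".tmp");
        writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
    }

    // Call from the library callback. This is still a (buffered) disk write,
    // but it never has to wait for a consumer thread.
    void append(CharBuffer chunk) throws IOException {
        writer.append(chunk);            // writes the buffer's remaining chars
        chunk.position(chunk.limit());   // mark the chunk as consumed
    }

    // Call once the response is complete. The file is deleted (best effort)
    // when the returned stream is closed.
    InputStream openForReading() throws IOException {
        writer.close();
        return Files.newInputStream(file, StandardOpenOption.DELETE_ON_CLOSE);
    }

    static List<String> demo() throws IOException {
        TempFileSpool spool = new TempFileSpool();
        // Simulated callback invocations; note the line split across chunks.
        spool.append(CharBuffer.wrap("first line\nsecond "));
        spool.append(CharBuffer.wrap("line\n"));

        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(spool.openForReading(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

Note that processing only starts after the download finishes, so this trades latency and disk space for simplicity; memory use stays low because the file is read back line by line.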

Dici