
I have many gzipped files which contain records that I am trying to sequence into a single consolidated file. CPU power is not a constraint.

I want to spin up threads that read from GZIPInputStreams as necessary. The amount read from each file at any given time is variable and unpredictable. The most obvious way to solve this problem is a thread pool to which a task is submitted to read from a GZIPInputStream whenever its backing buffer falls below a low watermark.

I am concerned that reading from a single GZIPInputStream from different threads could expose a memory visibility problem, since the class may have been written on the assumption that data would be consumed from only one thread.

To be clear, I am not suggesting that more than one thread will read from the same GZIPInputStream concurrently. My concern is that, without synchronization, some of the stream's internal state may appear inconsistent if it is read from one thread and then immediately read from another.

  • 3
    None of the [`InputStream`](https://docs.oracle.com/javase/8/docs/api/java/io/InputStream.html) implementations in Java are thread-safe. If you want multiple threads to read from a single `InputStream`, then you need to synchronize access. Given the overhead of gunzip'ing and reading from file, synchronization is minuscule, so there's no reason not to do it. – Andreas Jul 24 '16 at 04:24
  • 2
    In general you should assume that a class is *not* thread safe unless it specifically says that it is. GZIPInputStream is no exception to this rule. – D.B. Jul 24 '16 at 04:28
  • It doesn't make the least bit of sense to have multiple threads reading from the same stream. – user207421 Jul 24 '16 at 05:08
  • I'm not sure I have properly explained the situation. No two threads will ever read from the same GZIPInputStream at the same time. The issue is simply that a task may be submitted to an executor to read some data from the stream and then return; later, another task may be submitted to read more. The executor is likely to assign different threads to these tasks, and the tasks can, in theory, be submitted almost immediately one after the other. The question is whether it is safe to begin reading from one thread and continue reading from another. – user3111822 Jul 25 '16 at 14:40
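
The hand-off described in the last comment can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a definitive answer: it assumes each read task is submitted only after the previous task's result has been retrieved with `Future.get()`. The `java.util.concurrent` package documentation states that actions in a thread before submitting a task to an `ExecutorService` happen-before the task's execution, and that actions taken by the task happen-before the return of `Future.get()` in another thread. Chained together, those two guarantees mean the stream's state written by one pool thread should be visible to the next, even though the stream itself has no internal synchronization. The names `HandOffSketch`, `gzip`, `readChunk`, and `readAllViaPool` are hypothetical and exist only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical example class: one GZIPInputStream read sequentially by tasks
// that may each run on a different pool thread.
final class HandOffSketch {

    // Helper for the demo: compress a byte array in memory.
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        }
        return bos.toByteArray();
    }

    // One "refill" task: read up to max bytes; null signals end of stream.
    static byte[] readChunk(InputStream in, int max) throws IOException {
        byte[] buf = new byte[max];
        int n = in.read(buf, 0, max);
        return n < 0 ? null : Arrays.copyOf(buf, n);
    }

    // Drains the stream through a pool, one task at a time.
    static byte[] readAllViaPool(byte[] gzipped, int chunkSize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            while (true) {
                Callable<byte[]> task = () -> readChunk(in, chunkSize);
                Future<byte[]> pending = pool.submit(task);
                // get() returns only after the task has finished, so the next
                // submit() is ordered after everything this task did to the stream.
                byte[] chunk = pending.get();
                if (chunk == null) {
                    break;
                }
                out.write(chunk, 0, chunk.length);
            }
        } finally {
            pool.shutdown();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] payload = new byte[100_000];
        for (int i = 0; i < payload.length; i++) {
            payload[i] = (byte) (i % 251);
        }
        byte[] copy = readAllViaPool(gzip(payload), 512);
        System.out.println("round trip ok: " + Arrays.equals(payload, copy));
    }
}
```

The important design point is the ordering, not the pool size: if a second task could be submitted without some synchronizing action after the first completes (for example `Future.get()`, a lock, or a hand-off through a concurrent queue), the visibility guarantee described above would no longer apply, and per-stream synchronization as suggested in the first comment would be the safer choice.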
