I need an InputStream that reads from a specific portion of a file, and nothing more.

From the perspective of the consumer of the InputStream it would seem that the content is only that specific portion. The Consumer<InputStream> would be unaware its data came from a much larger file.
Therefore the InputStream should behave as follows:

  • The beginning of the file is skipped silently.
  • Then the desired portion of the file is returned.
  • Subsequent calls to is.read() return -1, even if the file contains more data.
Path file = Paths.get("file.dat");
int start = 12000;
int size = 600;

try(InputStream input = getPartialInputStream(file, start, size)){
    // The stream should return exactly 600 bytes:
    // those at offsets 12000 (inclusive) to 12600 (exclusive) of "file.dat".
    thirdPartyMethod(input);
}

Is there a good way to do this without having to implement a custom InputStream myself?
What could such a getPartialInputStream method look like?

neXus
  • It's pretty easy to implement that as its own class (extending `FilterInputStream`), considering you can use `skip()` and keep track of the size yourself. Avoiding creating your own class would probably make for uglier code. If you're really against writing classes, you can see if someone has written one for you already. – Kayaman Jun 05 '20 at 12:24
  • I'd open the original stream, `skip()` (or use https://guava.dev/releases/14.0/api/docs/com/google/common/io/ByteStreams.html#skipFully(java.io.InputStream,%20long)) and then wrap in https://commons.apache.org/proper/commons-io//javadocs/api-2.5/org/apache/commons/io/input/BoundedInputStream.html or https://guava.dev/releases/14.0/api/docs/com/google/common/io/ByteStreams.html#limit(java.io.InputStream,%20long) or whatever variant you might already have at hand – GPI Jun 05 '20 at 13:15
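
For readers without Guava or Commons IO at hand, the skip-then-limit idea from the two comments above can be sketched with the JDK alone. The class name `SliceInputStream` and its `skipFully` helper are made up for this illustration and are not part of any library. The sketch assumes the wrapped stream is used by one thread only, and it disables mark/reset so a consumer cannot rewind past the slice.

```java
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: exposes at most `size` bytes of the wrapped stream,
// after skipping its first `start` bytes.
class SliceInputStream extends FilterInputStream {

    private long remaining;

    SliceInputStream(InputStream in, long start, long size) throws IOException {
        super(in);
        skipFully(in, start);
        this.remaining = size;
    }

    // skip() may skip fewer bytes than requested, so loop until done.
    private static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped > 0) {
                n -= skipped;
            } else if (in.read() == -1) {
                throw new EOFException("stream ended before the start offset");
            } else {
                n--; // read() consumed one byte
            }
        }
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) {
            return -1;
        }
        int b = in.read();
        if (b != -1) {
            remaining--;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (remaining <= 0) {
            return -1;
        }
        int n = in.read(buf, off, (int) Math.min(len, remaining));
        if (n > 0) {
            remaining -= n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        if (n <= 0 || remaining <= 0) {
            return 0;
        }
        long skipped = in.skip(Math.min(n, remaining));
        remaining -= skipped;
        return skipped;
    }

    @Override
    public int available() throws IOException {
        return (int) Math.min(in.available(), remaining);
    }

    // mark/reset are disabled so the consumer cannot rewind the source.
    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public void mark(int readLimit) {
        // intentionally a no-op
    }

    @Override
    public void reset() throws IOException {
        throw new IOException("mark/reset not supported");
    }
}
```

With the question's variables this would be used as `new SliceInputStream(Files.newInputStream(file), start, size)`; closing the slice closes the underlying file stream as well.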

4 Answers

There is something called a MappedByteBuffer whose content is a memory-mapped region of a file.

Another question has an answer that shows how to wrap such a MappedByteBuffer in an InputStream. This led me to this solution:

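public InputStream getPartialInputStream(Path file, long start, long size) throws IOException {
    // The mapping stays valid after the channel that created it is closed.
    try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
        MappedByteBuffer content = channel.map(FileChannel.MapMode.READ_ONLY, start, size);
        return new ByteBufferBackedInputStream(content);
    }
}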
public class ByteBufferBackedInputStream extends InputStream {

    ByteBuffer buf;

    public ByteBufferBackedInputStream(ByteBuffer buf) {
        this.buf = buf;
    }

    public int read() throws IOException {
        if (!buf.hasRemaining()) {
            return -1;
        }
        return buf.get() & 0xFF;
    }

    public int read(byte[] bytes, int off, int len)
            throws IOException {
        if (!buf.hasRemaining()) {
            return -1;
        }

        len = Math.min(len, buf.remaining());
        buf.get(bytes, off, len);
        return len;
    }
}

Warning about locked system resources (on Windows)

MappedByteBuffer suffers from a bug where the underlying file gets locked by the mapped buffer until the buffer itself is garbage-collected, and there is no clean way around it.

So you can only use this solution if you don't have to delete/move/rename the file afterwards. Trying to would lead to a java.nio.file.AccessDeniedException (unless you're lucky enough that the buffer was already garbage collected).

I'm not sure I should be hopeful about this getting fixed anytime soon.

neXus

Depending on where the original stream comes from, you might want to discard it and return your own stream instead. If the original stream supports reset(), the consumer at the receiving end could rewind it and see the data before the desired portion.

public InputStream getPartialInputStream(InputStream is, int start, int size) throws IOException {
    // Put your fast-forward logic here, might want to use is.skip() instead
    for (int i = 0; i < start; i++) {
        is.read();
    }
    // Rewrite the part of stream you want the caller to receive so that
    // they receive *only* this part
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    for (int i = 0; i < size; i++) {
        int read = is.read();
        if (read != -1) {
            baos.write(read);
        } else {
            break;
        }
    }
    is.close();
    return new ByteArrayInputStream(baos.toByteArray());
}

Edit as an answer to comment:

If it's not desirable to rewrite the stream e.g. due to memory constraints, you can read the start bytes as in the first loop and then return the stream with something like Guava's ByteStreams.limit(is, size). Or subclass the stream and override read() with a counter to keep returning -1 as soon as size is read.

You could also write the portion to a temp file and return its stream. This would prevent the end user from using reflection on the original file's FileInputStream to find the file name.
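
The temp-file variant could look roughly like the sketch below. The class and method names (`TempFileSlice`, `open`) are invented for this example. It assumes the source is a regular file, copies the requested range with `FileChannel.transferTo`, and opens the copy with `DELETE_ON_CLOSE` so the temp file goes away when the consumer closes the stream (the JDK documents that deletion as best effort).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration only.
class TempFileSlice {

    // Copies bytes [start, start + size) of `source` into a temp file and
    // returns a stream over that copy. If the source is shorter than
    // start + size, the stream simply contains fewer bytes.
    static InputStream open(Path source, long start, long size) throws IOException {
        Path tmp = Files.createTempFile("slice-", ".tmp");
        try {
            try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
                 FileChannel out = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
                long copied = 0;
                while (copied < size) {
                    long n = in.transferTo(start + copied, size - copied, out);
                    if (n <= 0) {
                        break; // reached the end of the source file
                    }
                    copied += n;
                }
            }
            // Closing the returned stream closes the channel and deletes the file.
            return Channels.newInputStream(FileChannel.open(
                    tmp, StandardOpenOption.READ, StandardOpenOption.DELETE_ON_CLOSE));
        } catch (IOException | RuntimeException e) {
            Files.deleteIfExists(tmp); // do not leak the temp file on failure
            throw e;
        }
    }
}
```

The copy costs disk I/O up front, so this only pays off when hiding the original file matters more than laziness.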

pafau k.
  • This will hold the entire chunk in memory (in a `byte[]`) potentially long before the resulting InputStream is actually used. Is there a way to do this more lazily? And are there some optimizations possible when you know that the content comes from a file? – neXus Jun 05 '20 at 13:55

I wrote a utility class that you can use like this:

try(FileChannel channel = FileChannel.open(file, READ);
    InputStream input = new PartialChannelInputStream(channel, start, start + size)) {

    thirdPartyMethod(input);
}

It reads the content of the file using a ByteBuffer, so you control the memory footprint.

import java.io.IOException;
import java.io.InputStream;
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class PartialChannelInputStream extends InputStream {

    private static final int DEFAULT_BUFFER_CAPACITY = 2048;

    private final FileChannel channel;
    private final ByteBuffer buffer;
    private long position;
    private final long end;

    public PartialChannelInputStream(FileChannel channel, long start, long end)
            throws IOException {
        this(channel, start, end, DEFAULT_BUFFER_CAPACITY);
    }

    public PartialChannelInputStream(FileChannel channel, long start, long end, int bufferCapacity)
            throws IOException {
        if (start > end) {
            throw new IllegalArgumentException("start(" + start + ") > end(" + end + ")");
        }

        this.channel = channel;
        this.position = start;
        this.end = end;
        this.buffer = ByteBuffer.allocateDirect(bufferCapacity);
        fillBuffer(end - start);
    }

    private void fillBuffer(long stillToRead) throws IOException {
        if (stillToRead < buffer.limit()) {
            buffer.limit((int) stillToRead);
        }
        channel.read(buffer, position);
        buffer.flip();
    }

    @Override
    public int read() throws IOException {
        long stillToRead = end - position;
        if (stillToRead <= 0) {
            return -1;
        }

        if (!buffer.hasRemaining()) {
            buffer.clear(); // restore full capacity before refilling
            fillBuffer(stillToRead);
        }

        try {
            position++;
            return buffer.get() & 0xFF; // mask so byte 0xFF is not mistaken for EOF (-1)
        } catch (BufferUnderflowException e) {
            // Encountered EOF
            position = end;
            return -1;
        }
    }
}

The implementation above lets you create multiple PartialChannelInputStream instances that read from the same FileChannel and use them concurrently.
If that's not necessary, the simplified code below takes a Path directly.

import static java.nio.file.StandardOpenOption.READ;

import java.io.IOException;
import java.io.InputStream;
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;

public class PartialFileInputStream extends InputStream {

    private static final int DEFAULT_BUFFER_CAPACITY = 2048;

    private final FileChannel channel;
    private final ByteBuffer buffer;
    private long stillToRead;

    public PartialFileInputStream(Path file, long start, long end)
            throws IOException {
        this(file, start, end, DEFAULT_BUFFER_CAPACITY);
    }

    public PartialFileInputStream(Path file, long start, long end, int bufferCapacity)
            throws IOException {
        if (start > end) {
            throw new IllegalArgumentException("start(" + start + ") > end(" + end + ")");
        }

        this.channel = FileChannel.open(file, READ).position(start);
        this.buffer = ByteBuffer.allocateDirect(bufferCapacity);
        this.stillToRead = end - start;
        fillBuffer();
    }

    private void fillBuffer() throws IOException {
        if (stillToRead < buffer.limit()) {
            buffer.limit((int) stillToRead);
        }
        channel.read(buffer);
        buffer.flip();
    }

    @Override
    public int read() throws IOException {
        if (stillToRead <= 0) {
            return -1;
        }

        if (!buffer.hasRemaining()) {
            buffer.clear(); // restore full capacity before refilling
            fillBuffer();
        }

        try {
            stillToRead--;
            return buffer.get() & 0xFF; // mask so byte 0xFF is not mistaken for EOF (-1)
        } catch (BufferUnderflowException e) {
            // Encountered EOF
            stillToRead = 0;
            return -1;
        }
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
neXus
  • `FileChannel` has a [read method taking an absolute position](https://docs.oracle.com/javase/8/docs/api/java/nio/channels/FileChannel.html#read-java.nio.ByteBuffer-long-). Since this method doesn’t access the channel’s position, it doesn’t require synchronization. – Holger Jun 09 '20 at 15:01
  • @Holger Thanks for the improvement! I edited the answer – neXus Jun 09 '20 at 15:35
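
Building on the absolute-position read mentioned in the comment above, a variant without an internal buffer might look like the following sketch. `PositionalSliceInputStream` is a hypothetical name, not the class from the answer. It overrides the bulk `read(byte[], int, int)` so that consumers reading in chunks go straight to `FileChannel.read(ByteBuffer, long)` instead of through one byte at a time. Each instance is meant for a single thread, while several instances may share one channel; the caller stays responsible for closing the channel.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical variant for illustration: reads [start, end) of a channel
// using only absolute-position reads, so the channel's own position is
// never touched.
class PositionalSliceInputStream extends InputStream {

    private final FileChannel channel;
    private final long end;
    private long position;

    PositionalSliceInputStream(FileChannel channel, long start, long end) {
        if (start < 0 || start > end) {
            throw new IllegalArgumentException("invalid range: " + start + ".." + end);
        }
        this.channel = channel;
        this.position = start;
        this.end = end;
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        int n = read(one, 0, 1);
        // Mask to 0..255 so a data byte of 0xFF is not confused with EOF.
        return n == -1 ? -1 : one[0] & 0xFF;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        long remaining = end - position;
        if (remaining <= 0) {
            return -1;
        }
        // Wrap the caller's array so the channel writes into it directly.
        ByteBuffer target = ByteBuffer.wrap(b, off, (int) Math.min(len, remaining));
        int n = channel.read(target, position);
        if (n > 0) {
            position += n;
        }
        return n; // -1 if the file ends before `end`
    }
}
```

Small single-byte reads become more expensive this way, so wrapping the result in a `BufferedInputStream` is a reasonable option when the consumer reads byte by byte.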

One small fix for @neXus's PartialFileInputStream class: in the read() method you need to make sure a byte value of 0xff doesn't get returned as -1, because the caller would interpret that as end of stream.

return buffer.get() & 0xff;

does the trick.
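
To see why the mask matters, here is a tiny standalone demo (the class name `UnsignedByteDemo` is just for this example). Java bytes are signed, so widening the byte 0xFF to an int yields -1 unless the upper bits are masked off.

```java
import java.nio.ByteBuffer;

class UnsignedByteDemo {

    // What InputStream.read() should return for a data byte: a value in 0..255.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.wrap(new byte[] {(byte) 0xFF});
        byte raw = buffer.get();
        int widened = raw; // implicit sign extension
        System.out.println(widened);         // prints -1, indistinguishable from EOF
        System.out.println(toUnsigned(raw)); // prints 255
    }
}
```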

bwallis42