Downloading an object byte range from Google Cloud Storage using Java SDK

Question

I'm trying to download a byte range from Google Cloud Storage, using their Java SDK.

I can download an entire file like this.

Storage mStorage; // initialized and working

Blob blob = mStorage.get(pBucketName, pSource);

try (ReadChannel reader = mStorage.reader(blob.getBlobId())) {
    // read bytes from read channel
}

If I want, I can call ReadChannel#seek(long) to jump to the desired starting byte and read a range from that point, but that seems inefficient (although I don't know exactly what the implementation does).

Ideally I would like to specify the Range: bytes=start-end header as shown in the Google Cloud Storage REST API, but I can't figure out how to set the header in Java.

How can I specify the byte range in the Java SDK Storage get call, or specify the header, so I can efficiently download the desired byte range?
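If dropping down to the REST API is acceptable, the request the question describes can be sketched with plain java.net, without the SDK. This is a minimal sketch, not part of the Cloud Storage SDK: the class and method names are made up. It assumes the XML API URL form https://storage.googleapis.com/BUCKET/OBJECT with the object name already URL-encoded, and an OAuth access token obtained elsewhere (pass null for a public object).

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, not an SDK class.
class RangeRequest {
    /**
     * Prepares (but does not send) a GET for bytes [start, end] of an object.
     * objectUrl is assumed to be an already-encoded URL such as
     * https://storage.googleapis.com/BUCKET/OBJECT; accessToken may be null.
     */
    static HttpURLConnection open(String objectUrl, long start, long end, String accessToken)
            throws IOException {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("invalid range: " + start + "-" + end);
        }
        HttpURLConnection conn = (HttpURLConnection) URI.create(objectUrl).toURL().openConnection();
        conn.setRequestMethod("GET");
        // HTTP byte ranges are inclusive on both ends: bytes=0-99 is the first 100 bytes.
        conn.setRequestProperty("Range", "bytes=" + start + "-" + end);
        if (accessToken != null) {
            conn.setRequestProperty("Authorization", "Bearer " + accessToken);
        }
        return conn;
    }
}
```

Nothing goes over the network until you call conn.getResponseCode() or conn.getInputStream(); a server that honors the range answers with 206 Partial Content and only the requested bytes in the body.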

Using the NIO interface you can get a SeekableByteChannel into your file, then call the position method to get where you want to read from. That's part of their Java SDK. — TubesHerder, Sep 05 '19 at 18:03
Since no one has an acceptable answer here, I opened an issue: https://github.com/googleapis/google-cloud-java/issues/7625 — petertc, Aug 04 '21 at 10:51
Update: I traced the SDK code and posted what I found in the answers. — petertc, Aug 05 '21 at 07:31

score 1 · Answer 1 · answered Sep 05 '19 at 18:36

I understand you're trying to use Google Cloud's specific interface, but there is another way that perhaps you don't know about: Google Cloud can plug into Java's NIO interface. You can get a Path to a file on a bucket and use it as normal: get a SeekableByteChannel into your file, then call the position(long) method to get where you want to read from.

Here is sample code I tested:

import java.io.IOException;
import java.net.URI;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

(...)

    public static void readFromMiddle(String path, long offset, ByteBuffer buf) throws IOException {
        // Convert from a string to a path, using available NIO providers
        // so paths like gs://bucket/file are recognized (provided you included the google-cloud-nio
        // dependency).
        Path p = Paths.get(URI.create(path));
        // try-with-resources closes the channel (and its underlying connection) when done.
        try (SeekableByteChannel chan = Files.newByteChannel(p, StandardOpenOption.READ)) {
            chan.position(offset);
            // A single read may fill only part of buf; loop if you need it full.
            chan.read(buf);
        }
    }

You'll recognize this is normal Java code; nothing is special except perhaps the unusual way we make the Path. That's the beauty of NIO. To make this code understand "gs://" URLs, you need to add the google-cloud-nio dependency. For Maven it looks like this:

    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>google-cloud-nio</artifactId>
      <version>0.107.0-alpha</version>
    </dependency>

And that's all.

The documentation page shows how to do it for other dependency managers and gives some additional information.
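One caveat with readFromMiddle: a single read() is allowed to return fewer bytes than the buffer has room for, which is common for network-backed channels. A minimal sketch of a helper (the name readRange is made up) that keeps reading until it has exactly the requested number of bytes; it works on any SeekableByteChannel, so it can be tried against a local file before pointing it at a gs:// path:

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;

// Hypothetical helper for illustration.
class RangeReader {
    /**
     * Reads exactly `length` bytes starting at `offset`, looping because one
     * read() call may return only part of what was asked for.
     * Throws EOFException if the channel ends before `length` bytes arrive.
     */
    static byte[] readRange(SeekableByteChannel chan, long offset, int length) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(length);
        chan.position(offset);
        while (buf.hasRemaining()) {
            if (chan.read(buf) < 0) {
                throw new EOFException(
                        "channel ended after " + buf.position() + " of " + length + " bytes");
            }
        }
        return buf.array();
    }
}
```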

score 1 · Answer 2 · answered Jan 08 '20 at 16:02

The solution is to just invoke ReadChannel#seek(offset).

For example:

        // offset and readLength are obtained from the incoming HTTP Range header
        try (ReadChannel reader = blob.reader()) {
            reader.seek(offset);
            ByteBuffer bytes = ByteBuffer.allocate(1 * 1024 * 1024);
            int len = 0;
            // Check readLength first so no extra chunk is fetched once the range is complete.
            while (readLength > 0 && (len = reader.read(bytes)) > 0) {
                outputStream.write(bytes.array(), 0, (int) Math.min(len, readLength));
                bytes.clear();
                readLength -= len;
            }
        }
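The loop above still lets the last read() pull up to a full buffer even when only a few bytes of the range remain. A minimal sketch (helper name made up) that shrinks the buffer window to the remaining length, so the channel is never asked for more than the range needs. It takes any ReadableByteChannel, which a GCS ReadChannel is, so you would call it right after reader.seek(offset):

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

// Hypothetical helper for illustration.
class RangeCopy {
    /** Copies exactly `length` bytes from `in` to `out`; bufferSize must be positive. */
    static void copyExactly(ReadableByteChannel in, OutputStream out, long length, int bufferSize)
            throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(bufferSize);
        long remaining = length;
        while (remaining > 0) {
            // Limit the window so the final read cannot overshoot the range.
            buf.clear();
            buf.limit((int) Math.min(buf.capacity(), remaining));
            int n = in.read(buf);
            if (n < 0) {
                throw new EOFException(remaining + " bytes still expected");
            }
            out.write(buf.array(), 0, n);
            remaining -= n;
        }
    }
}
```

Note that this only limits what your code asks for; the SDK may still buffer more internally per request (see the next answer).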

petertc · Answer 3 · 2021-08-05T08:12:52.740

It turns out that you cannot get fine-grained control over the Range header in the current Java SDK implementation.

You can set the starting position by ReadChannel#seek(offset), but not the ending position.

Internally, the Java SDK sets the header on each request to Range: $offset-$(offset+bufferSize), so the end of each request is determined by the buffer size, not by you:

https://github.com/googleapis/java-storage/blob/master/google-cloud-storage/src/main/java/com/google/cloud/storage/spi/v1/HttpStorageRpc.java#L732

A workaround is to wrap the ReadChannel yourself, stop reading once it reaches the expected end position, and then close it. Code snippet:


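import com.google.cloud.ReadChannel;
import com.google.common.io.ByteStreams;
import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.Channels;

/**
 * Exposes bytes [start, end] of a GCS object (inclusive, like an HTTP Range header)
 * as an InputStream. With end == -1 the stream runs to the end of the object.
 */
class GSBlobInputStream extends InputStream {

    private final ReadChannel channel;
    private long start = 0;
    private long end = -1;
    private InputStream delegate;

    public GSBlobInputStream(ReadChannel channel) {
        this.channel = channel;
    }

    public GSBlobInputStream(ReadChannel channel, long start, long end) {
        this.channel = channel;
        this.start = start;
        this.end = end;
    }

    @Override
    public int read() throws IOException {
        init();
        return delegate.read();
    }

    @Override
    public int read(byte[] b) throws IOException {
        init();
        return delegate.read(b);
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        init();
        return delegate.read(b, off, len);
    }

    /**
     * Closes this input stream and the underlying ReadChannel, even if nothing was read.
     *
     * @throws IOException if an I/O error occurs.
     */
    @Override
    public void close() throws IOException {
        if (delegate != null) {
            delegate.close(); // closing the channel-backed stream also closes the channel
        } else {
            channel.close();
        }
    }

    // On first read: seek to `start`, then cap the stream at end - start + 1 bytes
    // so callers see EOF once the requested range is exhausted.
    private void init() throws IOException {
        if (delegate != null) {
            return;
        }

        channel.seek(start);

        delegate = Channels.newInputStream(channel);
        if (end != -1) {
            delegate = ByteStreams.limit(delegate, end - start + 1);
        }
    }
}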
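The snippet relies on Guava's ByteStreams.limit. If Guava is not on the classpath, a minimal stdlib-only stand-in could look like this (the class name is made up; it only mimics the limiting behavior the snippet uses):

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Reports EOF after `limit` bytes; hypothetical stand-in for ByteStreams.limit. */
class BoundedInputStream extends FilterInputStream {
    private long remaining;

    BoundedInputStream(InputStream in, long limit) {
        super(in);
        this.remaining = limit;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) return -1;
        int b = in.read();
        if (b >= 0) remaining--;
        return b;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (remaining <= 0) return -1;
        // Never ask the wrapped stream for more than the bytes still allowed.
        int n = in.read(b, off, (int) Math.min(len, remaining));
        if (n > 0) remaining -= n;
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = in.skip(Math.min(n, remaining));
        remaining -= skipped;
        return skipped;
    }

    @Override
    public int available() throws IOException {
        return (int) Math.min(in.available(), remaining);
    }

    @Override
    public boolean markSupported() {
        return false; // keep the byte budget simple: no mark/reset
    }
}
```

In init() you would then write delegate = new BoundedInputStream(delegate, end - start + 1); closing it closes the wrapped stream, as before.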

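Note that with this approach you may fetch up to 15 MiB - 1 byte more than you need in the worst case, since the default buffer size is 15 MiB. Calling ReadChannel#setChunkSize(int) with a smaller value before reading reduces that overhead, at the cost of more requests.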
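The worst-case figure is simple arithmetic: if every request fetches a whole chunk, the waste is the distance from the range length up to the next multiple of the chunk size. A sketch of that simplified model (it ignores the end of the object, retries, and whatever the SDK actually does beyond whole-chunk requests; the names are made up):

```java
// Hypothetical helper illustrating the overhead arithmetic only.
class RangeOverhead {
    /** Bytes fetched but unused when reading rangeLength bytes in whole chunks of chunkSize. */
    static long wastedBytes(long rangeLength, long chunkSize) {
        if (rangeLength <= 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("lengths must be positive");
        }
        long chunks = (rangeLength + chunkSize - 1) / chunkSize; // ceiling division
        return chunks * chunkSize - rangeLength;
    }
}
```

With a 15 MiB chunk, reading a single byte wastes 15 MiB - 1 byte, and reading exactly 15 MiB wastes nothing; that gap is what a smaller chunk size trades against the number of requests.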

score -1 · Answer 4 · edited Jun 17 '19 at 21:36

This is a good example of reading the contents of an object. The following link has more code solutions:

Stream file from Google Cloud Storage

  /**
   * Example of reading a blob's content through a reader.
   */
  // [TARGET reader(String, String, BlobSourceOption...)]
  // [VARIABLE "my_unique_bucket"]
  // [VARIABLE "my_blob_name"]
  public void readerFromStrings(String bucketName, String blobName) throws IOException {
    // [START readerFromStrings]
    try (ReadChannel reader = storage.reader(bucketName, blobName)) {
      ByteBuffer bytes = ByteBuffer.allocate(64 * 1024);
      while (reader.read(bytes) > 0) {
        bytes.flip();
        // do something with bytes
        bytes.clear();
      }
    }
    // [END readerFromStrings]
  }

Yeah, I know how to download an entire object (as shown in my question.) The thing which I don't know is _how to request a specific section of an object_ by adding a request header with a byte range to the request. — the_storyteller, Jun 17 '19 at 21:20
