9

Here is the code I use to download a file from Google Cloud Storage:

@Override
public void write(OutputStream outputStream) throws IOException {
    try {
        LOG.info(path);
        InputStream stream = new ByteArrayInputStream(GoogleJsonKey.JSON_KEY.getBytes(StandardCharsets.UTF_8));
        StorageOptions options = StorageOptions.newBuilder()
                .setProjectId(PROJECT_ID)
                .setCredentials(GoogleCredentials.fromStream(stream)).build();
        Storage storage = options.getService();
        final CountingOutputStream countingOutputStream = new CountingOutputStream(outputStream);
        byte[] read = storage.readAllBytes(BlobId.of(BUCKET, path));
        countingOutputStream.write(read);
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        outputStream.close();
    }
}

This works, but it has to buffer all the bytes in memory before it streams them back to the client of this method. This causes long delays, especially when the file stored in GCS is large.

Is there a way to get the file from GCS and stream it directly to the OutputStream? The OutputStream here is for a Servlet.

quarks

5 Answers

19

Currently the cleanest option I could find looks like this:

Blob blob = bucket.get("some-file");
ReadChannel reader = blob.reader();
InputStream inputStream = Channels.newInputStream(reader);

Channels is from java.nio.channels. You can then use Apache Commons IO to copy the InputStream into an OutputStream:

IOUtils.copy(inputStream, outputStream);
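
If Commons IO is not on the classpath, the same copy can be written with the JDK alone. This is only a minimal sketch: the class name StreamCopy is hypothetical, and the in-memory channel in main is a stand-in for blob.reader() so that the example runs without GCS credentials.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;

class StreamCopy {
    /** Copies everything from in to out through a small fixed buffer; returns the byte count. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8 * 1024];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello from a channel".getBytes(StandardCharsets.UTF_8);
        // Stand-in for blob.reader(): any ReadableByteChannel can be wrapped the same way.
        ReadableByteChannel channel = Channels.newChannel(new ByteArrayInputStream(data));
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream in = Channels.newInputStream(channel)) {
            long copied = copy(in, sink);
            System.out.println(copied + " bytes copied");
        }
    }
}
```

Only one 8 KiB buffer is held in memory at a time, regardless of the object size.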
Tobias
15

Just to clarify, do you need an OutputStream or an InputStream? One way to look at this is that the data is stored in a Google Cloud Storage object as a file, and you have an InputStream to read that file. If that works, read on.

There is no existing method in the Storage API which provides an InputStream or an OutputStream. But there are two methods in the Cloud Storage client library which expose a ReadChannel object, which extends ReadableByteChannel (from the Java NIO API).

ReadChannel reader(String bucket, String blob, BlobSourceOption... options);
ReadChannel reader(BlobId blob, BlobSourceOption... options);

A simple example using this (taken from StorageSnippets.java):

/**
   * Example of reading a blob's content through a reader.
   */
  // [TARGET reader(String, String, BlobSourceOption...)]
  // [VARIABLE "my_unique_bucket"]
  // [VARIABLE "my_blob_name"]
  public void readerFromStrings(String bucketName, String blobName) throws IOException {
    // [START readerFromStrings]
    try (ReadChannel reader = storage.reader(bucketName, blobName)) {
      ByteBuffer bytes = ByteBuffer.allocate(64 * 1024);
      while (reader.read(bytes) > 0) {
        bytes.flip();
        // do something with bytes
        bytes.clear();
      }
    }
    // [END readerFromStrings]
  }

You can also use the Channels.newInputStream() method (from java.nio.channels) to wrap an InputStream over the ReadableByteChannel.

public static InputStream newInputStream(ReadableByteChannel ch)

Even if you need an OutputStream, you should be able to copy data from the InputStream or better from the ReadChannel object into the OutputStream.
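
The channel-to-OutputStream copy mentioned above could look roughly like the following. This is a sketch under assumptions: ChannelCopy is a hypothetical name, and main feeds it an in-memory channel in place of storage.reader(...) so it runs standalone. Since ReadChannel is a ReadableByteChannel, the same method should accept it unchanged.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

class ChannelCopy {
    /** Drains source into out in 64 KiB chunks; returns the number of bytes written. */
    static long copy(ReadableByteChannel source, OutputStream out) throws IOException {
        WritableByteChannel sink = Channels.newChannel(out);
        ByteBuffer buffer = ByteBuffer.allocate(64 * 1024);
        long total = 0;
        while (source.read(buffer) != -1) {
            buffer.flip();
            // A channel write may be partial, so loop until the chunk is drained.
            while (buffer.hasRemaining()) {
                total += sink.write(buffer);
            }
            buffer.clear();
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[200_000]; // larger than one buffer, to force several passes
        // Stand-in for storage.reader(bucketName, blobName).
        ReadableByteChannel source = Channels.newChannel(new ByteArrayInputStream(data));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        System.out.println(copy(source, out) + " bytes copied");
    }
}
```

The method leaves closing the OutputStream to the caller, which suits a servlet response stream that the container manages.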

Complete example

Run this example as: PROGRAM_NAME <BUCKET_NAME> <BLOB_PATH>

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

import com.google.cloud.ReadChannel;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

/**
 * An example which reads the contents of the specified object/blob from GCS
 * and prints the contents to STDOUT.
 *
 * Run it as PROGRAM_NAME <BUCKET_NAME> <BLOB_PATH>
 */
public class ReadObjectSample {
  private static final int BUFFER_SIZE = 64 * 1024;

  public static void main(String[] args) throws IOException {
    // Instantiates a Storage client
    Storage storage = StorageOptions.getDefaultInstance().getService();

    // The name for the GCS bucket
    String bucketName = args[0];
    // The path of the blob (i.e. GCS object) within the GCS bucket.
    String blobPath = args[1];

    printBlob(storage, bucketName, blobPath);
  }

  // Reads from the specified blob present in the GCS bucket and prints the contents to STDOUT.
  private static void printBlob(Storage storage, String bucketName, String blobPath) throws IOException {
    try (ReadChannel reader = storage.reader(bucketName, blobPath)) {
      WritableByteChannel outChannel = Channels.newChannel(System.out);
      ByteBuffer bytes = ByteBuffer.allocate(BUFFER_SIZE);
      while (reader.read(bytes) > 0) {
        bytes.flip();
        outChannel.write(bytes);
        bytes.clear();
      }
    }
  }
}
Tuxdude
  • Hello, in my case this code does not work for me, the while loop here does not even run for me, while the same Bucket/File works with my old code. – quarks Jul 11 '17 at 18:15
  • @xybrek - I've pasted a complete working example that works for me. Give it a shot. Make sure to pass the complete path of the blob/file within the bucket. Example: `PROGRAM_NAME my-bucket-1 path/to/some/content.txt` – Tuxdude Jul 11 '17 at 22:52
  • Doesn't work for me. I get: Exception in thread "main" java.lang.NoClassDefFoundError: com/google/auth/Credentials at com.xxx.test.ReadObjectSample .main(ReadObjectSample .java:28) Caused by: java.lang.ClassNotFoundException: com.google.auth.Credentials – Mike Dee Jan 20 '18 at 06:29
  • 1
    The ClassNotFoundException may be due to an old Guava version. Make sure you're using at least 18.0 – RHH Mar 29 '18 at 08:46
3

Code based on @Tuxdude's answer:

    @Nullable
    public byte[] getFileBytes(String gcsUri) throws IOException {
        Blob blob = getBlob(gcsUri);
        if (blob == null) {
            return null;
        }
        // try-with-resources closes the channel and the stream when done
        try (ReadChannel reader = blob.reader();
             InputStream inputStream = Channels.newInputStream(reader)) {
            return IOUtils.toByteArray(inputStream);
        }
    }

or

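    // Reads the blob in 64 KiB chunks and accumulates them,
    // so the result is not limited to a single buffer's worth of data.
    @Nullable
    public byte[] getFileBytes(String gcsUri) throws IOException {
        Blob blob = getBlob(gcsUri);
        if (blob == null) {
            return null;
        }
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (ReadChannel reader = blob.reader()) {
            ByteBuffer bytes = ByteBuffer.allocate(64 * 1024);
            while (reader.read(bytes) > 0) {
                bytes.flip();
                // Append only the bytes actually read in this pass.
                result.write(bytes.array(), 0, bytes.limit());
                bytes.clear();
            }
        }
        return result.toByteArray();
    }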

helper code:

   @Nullable
    Blob getBlob(String gcsUri) {
        //gcsUri is "gs://" + blob.getBucket() + "/" + blob.getName(),
        //example "gs://myapp.appspot.com/ocr_request_images/000c121b-357d-4ac0-a3f2-24e0f6d5cea185dffb40eee-850fab211438.jpg"

        String bucketName = parseGcsUriForBucketName(gcsUri);
        String fileName = parseGcsUriForFilename(gcsUri);

        if (bucketName != null && fileName != null) {
            return storage.get(BlobId.of(bucketName, fileName));
        } else {
            return null;
        }
    }

    @Nullable
    String parseGcsUriForFilename(String gcsUri) {
        String prefix = "gs://";
        if (!gcsUri.startsWith(prefix)) {
            return null;
        }
        // The object name is everything after the first "/" that follows the bucket name.
        int slashIndex = gcsUri.indexOf("/", prefix.length());
        if (slashIndex < 0 || slashIndex == gcsUri.length() - 1) {
            return null; // no object name present
        }
        return gcsUri.substring(slashIndex + 1);
    }

    @Nullable
    String parseGcsUriForBucketName(String gcsUri) {
        String prefix = "gs://";
        if (!gcsUri.startsWith(prefix)) {
            return null;
        }
        // The bucket name runs from the end of the prefix to the next "/" (or to the end of the URI).
        int startIndex = prefix.length();
        int endIndex = gcsUri.indexOf("/", startIndex);
        String bucketName = endIndex < 0 ? gcsUri.substring(startIndex) : gcsUri.substring(startIndex, endIndex);
        return bucketName.isEmpty() ? null : bucketName;
    }
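
As an alternative to index arithmetic, the gs:// URI can probably be split with java.net.URI. This is a sketch, not part of the answer above: the GcsUri class is a hypothetical helper, and it assumes the object name contains only characters that are legal in a URI (URI.create throws IllegalArgumentException otherwise, and percent-escapes in the path are decoded).

```java
import java.net.URI;

class GcsUri {
    final String bucket;
    final String objectName;

    private GcsUri(String bucket, String objectName) {
        this.bucket = bucket;
        this.objectName = objectName;
    }

    /** Splits "gs://bucket/path/to/object" into its bucket and object name. */
    static GcsUri parse(String gcsUri) {
        URI uri = URI.create(gcsUri);
        if (!"gs".equals(uri.getScheme())) {
            throw new IllegalArgumentException("Not a gs:// URI: " + gcsUri);
        }
        // getAuthority() is used instead of getHost() because getHost() returns null
        // for names that are not valid host names.
        String bucket = uri.getAuthority();
        String path = uri.getPath();
        if (bucket == null || path == null || path.length() < 2) {
            throw new IllegalArgumentException("Missing bucket or object name: " + gcsUri);
        }
        return new GcsUri(bucket, path.substring(1)); // drop the leading "/"
    }

    public static void main(String[] args) {
        GcsUri parsed = GcsUri.parse("gs://myapp.appspot.com/ocr_request_images/example.jpg");
        System.out.println(parsed.bucket + " | " + parsed.objectName);
    }
}
```

The two fields can then be passed to BlobId.of(bucket, objectName) as in getBlob.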
Yuliia Ashomok
1

Another (convenient) way to stream a file from Google Cloud Storage, with google-cloud-nio:

Path path = Paths.get(URI.create("gs://bucket/file.csv"));
InputStream in = Files.newInputStream(path);
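
Once the object is available as a java.nio.file.Path, the standard Files.copy(Path, OutputStream) should be able to stream it to a servlet's OutputStream without an explicit loop. The sketch below is an assumption-laden illustration: PathToStream is a hypothetical name, and a local temporary file stands in for the gs:// path so the example runs without google-cloud-nio installed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class PathToStream {
    /** Streams the file at source into out; returns the number of bytes copied. */
    static long send(Path source, OutputStream out) throws IOException {
        return Files.copy(source, out);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for Paths.get(URI.create("gs://bucket/file.csv")).
        Path source = Files.createTempFile("sample", ".csv");
        try {
            Files.write(source, "id,name\n1,example\n".getBytes(StandardCharsets.UTF_8));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            System.out.println(send(source, out) + " bytes copied");
        } finally {
            Files.deleteIfExists(source);
        }
    }
}
```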
TubesHerder
1

Anyone on Java 9 or above can use InputStream.transferTo to copy the input stream to the output stream:


    // the resource url is something like gs://youbucket/some/file/path.csv
    public InputStream getUriAsInputStream( Storage storage, String resourceUri) {
        String[] parts = resourceUri.split("/");
        BlobId blobId = BlobId.of(parts[2], String.join("/", Arrays.copyOfRange(parts, 3, parts.length)));
        Blob blob = storage.get(blobId);
        if (blob == null || !blob.exists()) {
            throw new IllegalArgumentException("Blob [" + resourceUri + "] does not exist");
        }
        ReadChannel reader = blob.reader();
        InputStream inputStream = Channels.newInputStream(reader);
        return inputStream;
    }

// use it with something like: 
@Override
public void write(OutputStream outputStream) throws IOException {
    // try-with-resources closes the output stream even if the copy fails
    try (OutputStream out = outputStream) {
        LOG.info(path);
        InputStream stream = new ByteArrayInputStream(GoogleJsonKey.JSON_KEY.getBytes(StandardCharsets.UTF_8));
        StorageOptions options = StorageOptions.newBuilder()
                .setProjectId(PROJECT_ID)
                .setCredentials(GoogleCredentials.fromStream(stream)).build();
        Storage storage = options.getService();

        // the input stream is declared here so that it is in scope and closed automatically
        try (InputStream in = getUriAsInputStream(storage, "gs://your-bucket/path/to/file.csv")) {
            in.transferTo(out);
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
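
InputStream.transferTo returns the number of bytes it copied, so a byte count like the one the question's CountingOutputStream appears intended for can be taken from the return value. This is a minimal sketch assuming Java 9 or above; TransferToDemo is a hypothetical name and the in-memory streams stand in for the GCS stream and the servlet stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class TransferToDemo {
    /** Copies in to out, closes in, and returns the number of bytes transferred. */
    static long stream(InputStream in, OutputStream out) throws IOException {
        try (InputStream source = in) {
            long copied = source.transferTo(out);
            out.flush();
            return copied;
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[100_000]);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        System.out.println(stream(in, out) + " bytes transferred");
    }
}
```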
simbo1905