
I have been trying to open an image stored in a GCP bucket from my Datalab notebook. When I use Image.open() it fails with "No such file or directory: 'images/00001.jpeg'".

My code is:

nama_bucket = storage.Bucket("sample_bucket")
for obj in nama_bucket.objects():
    Image.open(obj.key)

I just need to open the images stored in the bucket and view them. Thanks for the help!

Brussel

1 Answer


I was able to reproduce the issue and get the same error as you (No such file or directory).

I will describe the workaround I used to solve it. However, there are a few issues I can see in the code snippet you provided:

  • The class IPython.display.Image has no open() method.

  • You will need to wrap the Image constructor in a display() call.

With Storage APIs for Google Cloud Datalab, what resolved the issue for me was using the url parameter instead of the filename.

Here is the solution that worked for me:

import google.datalab.storage as storage
from IPython.display import Image

bucket_name = '<my-bucket-name>'
sample_bucket = storage.Bucket(bucket_name)

for obj in sample_bucket.objects():
    display(Image(url='https://storage.googleapis.com/{}/{}'.format(bucket_name, obj.key)))

Let me know if it helps!


EDIT 1:

As you mentioned, you're using PIL and would like your images to be handled by it; here's a way to achieve that (I have tested it and it worked well for me):

import google.datalab.storage as storage
from PIL import Image
import requests
from io import BytesIO

bucket_name = '<my-bucket-name>'
sample_bucket = storage.Bucket(bucket_name)

for obj in sample_bucket.objects():
    url='https://storage.googleapis.com/{}/{}'.format(bucket_name, obj.key)
    response = requests.get(url)
    img = Image.open(BytesIO(response.content))
    print("Filename: {}\nFormat: {}\nSize: {}\nMode: {}".format(obj.key, img.format, img.size, img.mode))
    display(img) 

Notice that this way you will not need to use IPython.display.Image at all.
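Since the end goal is training, note that a PIL image obtained this way can be converted straight to a NumPy array. A minimal sketch of the conversion (the image here is generated in memory purely for illustration, instead of being downloaded from the bucket):

```python
from PIL import Image
import numpy as np

# For illustration, create a small in-memory RGB image; the conversion
# works the same for an image opened from BytesIO as shown above.
img = Image.new('RGB', (4, 3), color=(255, 0, 0))

# PIL -> NumPy: the resulting array has shape (height, width, channels)
arr = np.asarray(img)
print(arr.shape)   # (3, 4, 3)
print(arr.dtype)   # uint8
```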


EDIT 2:

Indeed, the "cannot identify image file <_io.BytesIO object at 0x7f8f33bdbdb0>" error appears because you have a directory in your bucket. In order to solve this issue, it's important to understand how Google Cloud Storage sub-directories work.

Here's how I organized the files in my bucket to replicate your situation:

my-bucket/
    img/
        test-file-1.png
        test-file-2.png
        test-file-3.jpeg
    test-file-4.png

Even though gsutil creates the illusion of a hierarchical file tree by applying a variety of rules to make naming work the way users expect, in fact test-files 1-3 simply have '/' characters in their names; there is no actual 'img' directory.
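To make this concrete, here is a small self-contained sketch (using hard-coded keys that mirror the bucket layout above) showing that the "directory" is really just a key prefix:

```python
# Object keys as GCS actually stores them for the layout above --
# a flat namespace where '/' is just another character in the name.
keys = [
    'img/test-file-1.png',
    'img/test-file-2.png',
    'img/test-file-3.jpeg',
    'test-file-4.png',
]

# "Listing the img/ directory" amounts to filtering by key prefix.
in_img = [k for k in keys if k.startswith('img/')]
print(in_img)
# ['img/test-file-1.png', 'img/test-file-2.png', 'img/test-file-3.jpeg']
```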

You can still list all the images in your bucket. With the structure mentioned above this can be achieved, for example, by checking each file's extension:

import google.datalab.storage as storage
from PIL import Image
import requests
from io import BytesIO

bucket_name = '<my-bucket-name>'
sample_bucket = storage.Bucket(bucket_name)

for obj in sample_bucket.objects():
    # Check that the object is an image by its file extension
    if obj.key.lower().endswith(('.jpg', '.jpeg', '.png')):
        url='https://storage.googleapis.com/{}/{}'.format(bucket_name, obj.key)
        response = requests.get(url)
        img = Image.open(BytesIO(response.content))
        print("Filename: {}\nFormat: {}\nSize: {}\nMode: {}".format(obj.key, img.format, img.size, img.mode))
        display(img)

If you need to get only the images "stored in a particular sub-directory" of your bucket, you will also need to check the files by name:

import google.datalab.storage as storage
from PIL import Image
import requests
from io import BytesIO

bucket_name = '<my-bucket-name>'
folder = '<name-of-the-directory>'
sample_bucket = storage.Bucket(bucket_name)

for obj in sample_bucket.objects():
    # Check that the object is an image AND that its key contains the required sub-directory
    if obj.key.lower().endswith(('.jpg', '.jpeg', '.png')) and folder in obj.key:
        url='https://storage.googleapis.com/{}/{}'.format(bucket_name, obj.key)
        response = requests.get(url)
        img = Image.open(BytesIO(response.content))
        print("Filename: {}\nFormat: {}\nSize: {}\nMode: {}".format(obj.key, img.format, img.size, img.mode))
        display(img)
Deniss T.
  • Thank you for the help Denis T. I was able to display the image. But I am using the PIL library which can be converted to numpy array for training purposes. I'm new to this area and dont know how to convert an IPython.core.display.Image object to numpy array. Is there any way? or if there's a way to convert this image into PIL Image then I can proceed from there. – Brussel Dec 05 '19 at 15:50
  • I see. Let me try to test and find it out - I'll let you know once I have some info for you. – Deniss T. Dec 06 '19 at 15:19
  • Hello @Brussel, I have updated my answer! Please let me know if it works for you. – Deniss T. Dec 06 '19 at 15:44
  • Hi Dennis T, Sorry for the late response. I have tried your solution and it gave me the following response: cannot identify image file <_io.BytesIO object at 0x7f8f33bdbdb0> Is this because I'm storing my images in a directory? If so how can I access the directory first? Sorry for lot of follow up questions. I tried my best to get them, hope you know a better way. – Brussel Dec 09 '19 at 04:42
  • Hello @Brussel, I have updated my response. Please see the "Edit 2" section. – Deniss T. Dec 09 '19 at 09:27
  • I was still getting the same error. Just a quick doubt, Should this line url='https://storage.googleapis.com/{}/{}'.format(bucket_name, obj.key) be url='https://storage.googleapis.com/{}/{}/{}'.format(bucket_name, folder, obj.key) – Brussel Dec 09 '19 at 11:31
  • No, it should be url='https://storage.googleapis.com/{}/{}'.format(bucket_name, obj.key). The name of the sub-directory will already be included in the name of the object (e.g. "img/test-file-1.png", from my example) - this is how sub-directories work in GCS. – Deniss T. Dec 09 '19 at 11:37
  • Could you modify your question in order to elaborate a bit more on how your bucket is structured? – Deniss T. Dec 09 '19 at 11:37
  • My bucket has a single directory named images and inside that there are around 3000 images. – Brussel Dec 09 '19 at 11:50
  • Adding the **if statement** that will check the file's extension (that I offered in the "Edit 2" section) should allow you to list the images stored in a directory inside of your bucket. If you're saying it doesn't work for you it would be helpful to see a **code snippet** displaying how exactly you're trying to reach the objects. Please remember to **remove any sensitive information** from the details you will provide (and **replace** the name of your bucket with some mock data). – Deniss T. Dec 09 '19 at 15:25
  • Thanks for the help Dennis. Your solution actually worked out. Sorry for the delayed response. – Brussel Dec 30 '19 at 01:49