I'm downloading images from the Open Images Dataset V4, and after a while the download fails with an error.

The code is the following:

import os
from skimage import io

saved_dirs = ['/content/drive/My Drive/AI/Dataset/Open Images Dataset v4 (Bounding Boxes)/Person','/content/drive/My Drive/AI/Dataset/Open Images Dataset v4 (Bounding Boxes)/Mobile Phone','/content/drive/My Drive/AI/Dataset/Open Images Dataset v4 (Bounding Boxes)/Car']
classes = ['Person', 'Mobile phone', 'Car']

# Download images
for i in range(len(classes)):
    # Create the directory
    os.mkdir(saved_dirs[i])
    saved_dir = saved_dirs[i]
    for url in urls[i]:
        img = io.imread(url)
        saved_path = os.path.join(saved_dir, url[-20:])
        if img.shape[0] == 2:
            img = img[0]
        io.imsave(saved_path, img)

And output:

KeyError                                  Traceback (most recent call last)
<ipython-input-33-3a84148b069d> in <module>()
      9         if img.shape[0] == 2:
     10                img = img[0]
---> 11         io.imsave(saved_path, img)

2 frames
/usr/local/lib/python2.7/dist-packages/skimage/util/dtype.pyc in dtype_limits(image, clip_negative)
     55         warn('The default of `clip_negative` in `skimage.util.dtype_limits` '
     56              'will change to `False` in version 0.15.')
---> 57     imin, imax = dtype_range[image.dtype.type]
     58     if clip_negative:
     59         imin = 0

KeyError: <type 'numpy.object_'>

Earlier I added this check:

    if img.shape[0] == 2:
           img = img[0]

With it, more images are downloaded into the different folders, but the download still fails at some point in the same way.

I found this question, which describes the same problem: KeyError: class 'numpy.object_' while downloading image dataset using imread
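The check above tests the array shape, which would also trigger on an image that happens to be 2 pixels tall. Since the traceback reports `numpy.object_`, a more direct test is on the dtype. The following is a minimal sketch, assuming the failing images come back from `io.imread` as object arrays that wrap several frames; `unwrap_first_frame` is a hypothetical helper name, not part of scikit-image.

```python
import numpy as np


def unwrap_first_frame(img):
    """Return a plain numeric image array.

    Assumption: a problematic image is an object-dtype array whose
    elements are the individual frames, so the first element is kept.
    Any other array is returned unchanged.
    """
    if isinstance(img, np.ndarray) and img.dtype == object:
        # Keep only the first wrapped frame and make sure it is an ndarray.
        return np.asarray(img.flat[0])
    return img
```

In the loop it would be called as `img = unwrap_first_frame(io.imread(url))` before `io.imsave`.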

Sebastián
  • Could you show `urls`? – Tonechas Dec 06 '19 at 11:21
  • @Tonechas example of urls= 'https://requestor-proxy.figure-eight.com/figure_eight_datasets/open-images/train/92dcd5c3ab9cb5ad.jpg', 'https://requestor-proxy.figure-eight.com/figure_eight_datasets/open-images/train/2575003ac0e0de87.jpg', ... – Sebastián Dec 07 '19 at 01:08
  • @Tonechas It comes from an Excel file; I edited the question with more information – Sebastián Dec 07 '19 at 01:09
  • I've read the two sample images without problems. Could you share the Excel file with the URLs of all the images? – Tonechas Dec 07 '19 at 02:18
  • @Tonechas Yes, all the files are on Drive here: https://drive.google.com/drive/folders/1d_rgF1fN3zkMnYMi6tqMUx3deJsUqhTB?usp=sharing The full dataset is very large, so I filtered it by class into "subperson_img_url.csv", "subphone_img_url.csv" and "subcar_img_url.csv". The images listed in these files are downloaded into the corresponding folders "Person", "Car" and "Mobile phone" – Sebastián Dec 07 '19 at 02:51
  • It downloads some images fine, but at some point it crashes with the KeyError – Sebastián Dec 07 '19 at 03:00
  • Each folder should have 1000 images; I noticed that the "Car" folder has far fewer – Sebastián Dec 07 '19 at 03:10

1 Answer

I downloaded subcar_img_url.csv, subperson_img_url.csv and subphone_img_url.csv from the link you provided and saved them in my current working directory. Then I ran this code:

import pandas as pd
from skimage import io
import os

folder = 'path/of/your/current/working/directory'
for klass in ['car', 'person', 'phone']:
    # f-strings require Python 3.6 or later
    fn = f'sub{klass}_img_url.csv'
    print(fn)
    df = pd.read_csv(os.path.join(folder, fn))
    for i, url in enumerate(df.image_url):
        # Print the running index to show progress
        print(i, end='-')
        img = io.imread(url)
        # The last 20 characters of the URL are the image file name
        io.imsave(os.path.join(folder, url[-20:]), img)
    print('\n')

I was able to download all the images (3000) without problem. This is the output I got (no exception was thrown):

subcar_img_url.csv
0-1-2-3-4-5-6-7-8-9-10-11-12-13-14-15-16-17-18-19-20-21-22-23-24-25-26-27-28-29-30-31-
32-33-34-35-36-37-38-39-40-41-42-43-44-45-46-47-48-49-50-51-52-53-54-55-56-57-58-59-
...
975-976-977-978-979-980-981-982-983-984-985-986-987-988-989-990-991-992-993-994-995-
996-997-998-999-

subperson_img_url.csv
0-1-2-3-4-5-6-7-8-9-10-11-12-13-14-15-16-17-18-19-20-21-22-23-24-25-26-27-28-29-30-31-
32-33-34-35-36-37-38-39-40-41-42-43-44-45-46-47-48-49-50-51-52-53-54-55-56-57-58-59-
...
975-976-977-978-979-980-981-982-983-984-985-986-987-988-989-990-991-992-993-994-995-
996-997-998-999-

subphone_img_url.csv
0-1-2-3-4-5-6-7-8-9-10-11-12-13-14-15-16-17-18-19-20-21-22-23-24-25-26-27-28-29-30-31-
32-33-34-35-36-37-38-39-40-41-42-43-44-45-46-47-48-49-50-51-52-53-54-55-56-57-58-59-
...
975-976-977-978-979-980-981-982-983-984-985-986-987-988-989-990-991-992-993-994-995-
996-997-998-999-

I used scikit-image version 0.15.0.
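If the error still appears in another environment, it can help to find out which URLs trigger it instead of letting the first failure stop the loop. Below is a minimal sketch of that idea; `download_all` is a hypothetical helper, and the reader and writer are passed in as arguments so that `io.imread` and `io.imsave` can be supplied by the caller.

```python
import os


def download_all(urls, folder, read, save):
    """Try every URL and return the (url, error) pairs that failed.

    `read` maps a URL to an image and `save` writes an image to a path;
    both are supplied by the caller (hypothetical parameters).
    """
    failed = []
    for url in urls:
        # For the URLs in this question the base name is the same as url[-20:].
        target = os.path.join(folder, os.path.basename(url))
        try:
            save(target, read(url))
        except Exception as exc:
            # Record the failure and continue with the next image.
            failed.append((url, exc))
    return failed
```

With the code above it could be called as `failed = download_all(df.image_url, folder, io.imread, io.imsave)`, and the URLs in `failed` can then be inspected one by one.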

Tonechas