
I am trying to download all the images listed in the dataset.

The dataset is at https://www.kaggle.com/crowdflower/twitter-user-gender-classification (see the downloads section). It is a CSV file containing 20,000 rows and 26 columns.

I ran this script:

    import requests
    import pandas as pd
    import os
    import imageio
    from pandas import DataFrame
    df=pd.read_csv('E:/gender-classifier-DFE-791531.csv',encoding='latin1')
    print(df.shape)
    imgURL=df['profileimage']
    uniID=df['_unit_id']
    gender=df['gender']
    dict={'images':[0],'gender':''}
    global jk
    jk=DataFrame(dict)
    def get_images(image_url,ID,gender,i):
        print(i)
        response=requests.get(image_url,stream=True)
        if not response.ok:
            print(response)
            return
        k=imageio.imread(image_url)
        k=k.flatten()
        dict1={'ID':ID,'images':[k],'gender':gender}
        df=pd.DataFrame(dict1)
        global jk
        jk=pd.concat([jk,df],axis=0)
        jk.set_index('ID')


    for i in  range(187,len(imgURL)+1):
        get_images(imgURL[i],uniID[i],gender[i],i)
    jk.to_csv('C:\\Users\\prabhu\\Desktop\\jk.csv',sep=',')

But it failed with the following error after processing about 150 of the 20,000 rows:

Traceback (most recent call last):
  File "E:/image_extraction.py", line 29, in <module>
    get_images(imgURL[i],uniID[i],gender[i],i)
  File "E:/image_extraction.py", line 20, in get_images
    k=imageio.imread(image_url)
  File "C:\Users\prabhu\AppData\Local\Programs\Python\Python37\lib\site-packages\imageio\core\functions.py", line 221, in imread
    reader = read(uri, format, "i", **kwargs)
  File "C:\Users\prabhu\AppData\Local\Programs\Python\Python37\lib\site-packages\imageio\core\functions.py", line 143, in get_reader
    return format.get_reader(request)
  File "C:\Users\prabhu\AppData\Local\Programs\Python\Python37\lib\site-packages\imageio\core\format.py", line 174, in get_reader
    return self.Reader(self, request)
  File "C:\Users\prabhu\AppData\Local\Programs\Python\Python37\lib\site-packages\imageio\core\format.py", line 224, in __init__
    self._open(**self.request.kwargs.copy())
  File "C:\Users\prabhu\AppData\Local\Programs\Python\Python37\lib\site-packages\imageio\plugins\pillowmulti.py", line 57, in _open
    return PillowFormat.Reader._open(self)
  File "C:\Users\prabhu\AppData\Local\Programs\Python\Python37\lib\site-packages\imageio\plugins\pillow.py", line 132, in _open
    if hasattr(self._im, "n_frames"):
  File "C:\Users\prabhu\AppData\Local\Programs\Python\Python37\lib\site-packages\PIL\GifImagePlugin.py", line 96, in n_frames
    self.seek(self.tell() + 1)
  File "C:\Users\prabhu\AppData\Local\Programs\Python\Python37\lib\site-packages\PIL\GifImagePlugin.py", line 128, in seek
    self._seek(f)
  File "C:\Users\prabhu\AppData\Local\Programs\Python\Python37\lib\site-packages\PIL\GifImagePlugin.py", line 158, in _seek
    self.fp.seek(self.__offset)
  File "C:\Users\prabhu\AppData\Local\Programs\Python\Python37\lib\site-packages\imageio\core\request.py", line 513, in seek
    ori_seek(i, mode)
io.UnsupportedOperation: seek

How can I resolve this?

Harald K
Praba
  • Ignore the indentation. – Praba Oct 11 '18 at 01:41
  • Your code does not include `{kaggle datasets download -d crowdflower/twitter-user-gender-classification}`. Where are you using them? – jww Oct 11 '18 at 02:35
  • I downloaded the data file and loaded it with `df=pd.read_csv('E:/gender-classifier-DFE-791531.csv',encoding='latin1')`. – Praba Oct 11 '18 at 03:24
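
A possible direction, offered as a sketch and not as a confirmed fix: the traceback ends in `io.UnsupportedOperation: seek` inside `GifImagePlugin`, which suggests the GIF reader tried to seek on a stream that cannot seek. The snippet below reproduces that behaviour with the standard library only and shows that copying the data into `io.BytesIO` gives a seekable object. `NonSeekableStream` and `buffer_stream` are hypothetical names made up for this illustration; they are not part of `requests`, `imageio`, or Pillow.

```python
import io


class NonSeekableStream(io.RawIOBase):
    """Hypothetical stand-in for a network response body: readable, not seekable."""

    def __init__(self, payload: bytes):
        super().__init__()
        self._inner = io.BytesIO(payload)

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return False

    def readinto(self, buffer) -> int:
        # Copy the next chunk of the payload into the caller's buffer.
        chunk = self._inner.read(len(buffer))
        buffer[:len(chunk)] = chunk
        return len(chunk)


def buffer_stream(stream) -> io.BytesIO:
    """Read the whole stream into memory and return a seekable copy."""
    return io.BytesIO(stream.read())


if __name__ == "__main__":
    payload = b"GIF89a" + bytes(32)  # fake bytes, not a valid image

    try:
        NonSeekableStream(payload).seek(0)
    except io.UnsupportedOperation as exc:
        print("non-seekable stream:", repr(exc))

    buffered = buffer_stream(NonSeekableStream(payload))
    buffered.seek(6)  # works, because BytesIO supports seek
    print("buffered stream position:", buffered.tell())
```

In the script from the question, the analogous change would be to pass `io.BytesIO(response.content)` to `imageio.imread` in place of the URL, since `response` has already been fetched with `requests.get`. Whether that removes this particular error has not been tested here.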
