
I am trying to download images, whose URLs are listed in a CSV file, into specific folders (named Male and Female, for classification) that I created in C:/Desktop/Images. But when I run the code below, nothing is downloaded or saved into the folders according to the category in my CSV file. The CSV contents are in the format shown below, and the file has several thousand rows. I am trying to iterate over the rows and save each image to the folder for that gender.

Format:
        Male   profilename    https://pbs.twimg.com/profile_images/414342229096808449/fYvzqXN7_normal.png

**Code:**
import urllib.request
import urllib.error
import sys

filename = "images"

with open("{0}.csv".format(filename), 'r') as csvfile:
    i = 0
    for line in csvfile:
        splitted_line = line.split(',')
        try:
            if ((splitted_line[2] != "\n") and (splitted_line[2] != '') and (splitted_line[0] == male)):
                fullfilename_1 = os.path.join('C:/Desktop/Images/Male', splitted_line[1])
                urllib.request.urlretrieve(splitted_line[2], fullfilename_1 + ".png")
                print ("Image saved for {0} in {1} ".format(splitted_line[1],'C:/Desktop/Images/Male'))
                i += 1
        except:
                print ("No result for {0}".format(splitted_line[1]))
        try:
            if ((splitted_line[2] != "\n") and (splitted_line[2] != '') and (splitted_line[0] == female)):
                fullfilename_1 = os.path.join('C:/Desktop/Images/Female', splitted_line[1])
                urllib.request.urlretrieve(splitted_line[2], fullfilename_1 + ".png")
                print ("Image saved for {0} in {1} ".format(splitted_line[1],'C:/Desktop/Images/Female'))
                i += 1
        except:
                print ("No result for {0}".format(splitted_line[1]))

How can I download each image and save it to the matching folder? Is there a problem with how I specified the path? Any help would be appreciated!
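As posted, the question's code compares against `male` and `female` as bare names rather than the strings "male"/"female", and it calls `os.path.join` without `import os`. Each `if` therefore raises `NameError` (or `IndexError` for rows with fewer than three fields), which the bare `except:` catches and prints as "No result", so nothing is downloaded. A small reproduction, using a sample row in the comma-separated layout that the code assumes (the helper names `original_check` and `reporting_check` are hypothetical):

```python
# Reproduces the silent failure: 'male' is never defined in the question's
# code, so the comparison raises NameError and a bare "except:" hides it.


def original_check(line):
    parts = line.split(",")
    try:
        if parts[0] == male:  # noqa: F821 - undefined name, as in the question
            return "would download"
    except:  # noqa: E722 - bare except, as in the question
        return "No result"
    return "skipped"


def reporting_check(line):
    parts = line.split(",")
    try:
        if parts[0] == male:  # noqa: F821 - still undefined
            return "would download"
    except Exception as err:  # report the real error instead of hiding it
        return f"{type(err).__name__}: {err}"
    return "skipped"


row = "male,sheezy0,https://pbs.twimg.com/profile_images/414342229096808449/fYvzqXN7_normal.png\n"
print(original_check(row))   # No result
print(reporting_check(row))  # NameError: name 'male' is not defined
```

Comparing against string literals (for example `parts[0].strip().lower() == "male"`), adding `import os`, and catching specific exceptions such as `OSError` instead of using a bare `except:` would make these errors visible.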

1 Answer


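The code below opens "images.txt", so either rename "images.csv" to "images.txt" or change the filename passed to open() (the extension itself makes no difference to open()). Then run this code: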

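import urllib.request

# Read all rows; the with block closes the file afterwards
with open("images.txt") as data:
    lines = data.readlines()

for row in lines:
    # strip() removes the trailing newline, which would otherwise end up in the URL
    res = row.strip().split(",")
    if len(res) != 3:
        print("Improper row.")
        continue
    gender, name, url = res
    # The extension is on the URL (third column), not on the profile name
    if not url.endswith((".jpg", ".jpeg", ".png")):
        print("Image not found.")
        continue
    try:
        location = f"C:/Desktop/Images/{gender}/"
        filename = url.split("/")[-1]
        urllib.request.urlretrieve(url, location + filename)
        print(f"Saved {filename} to {location[:-1]}.")
    except (OSError, ValueError) as err:
        # Report the actual problem (HTTP error, missing folder, bad URL, ...)
        print(f"An error occurred: {err}")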
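Building the output path with `os.path.join` and creating the target folder before downloading removes two common failure points. A minimal sketch, assuming each row has already been split into gender, profile name and URL; the functions `destination` and `save_image` are hypothetical helpers, and the ".png" fallback for extension-less URLs is an assumption:

```python
import os
import posixpath
import urllib.parse
import urllib.request


def destination(base_dir, gender, name, url):
    """Return <base_dir>/<Gender>/<name><ext>, with the extension taken from the URL."""
    # "male" -> "Male", matching the folder names from the question
    folder = os.path.join(base_dir, gender.strip().capitalize())
    # URL paths always use "/", so posixpath is used regardless of the OS;
    # ".png" is an assumed fallback for URLs without an extension
    ext = posixpath.splitext(urllib.parse.urlsplit(url.strip()).path)[1] or ".png"
    return os.path.join(folder, name.strip() + ext)


def save_image(base_dir, gender, name, url):
    """Download one image; return None on success, otherwise an error message."""
    path = destination(base_dir, gender, name, url)
    # Creates the folder if it is missing (a wrong base_dir gets created too)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    try:
        urllib.request.urlretrieve(url.strip(), path)
    except (OSError, ValueError) as err:  # URLError/HTTPError subclass OSError
        return f"{name.strip()}: {err}"
    return None
```

It would be called once per row, e.g. `save_image("C:/Desktop/Images", gender, name, url)`. Naming files after the profile, as the question's code does, avoids clashes between URLs that could share a basename. Because `os.makedirs` creates whatever base path it is given, it is worth checking that C:/Desktop/Images is the folder you actually created; on Windows the Desktop usually lives under the user profile (for example C:/Users/<name>/Desktop).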
  • Thanks for the reply, Clinton. When I try this, I get **"IndexError: list index out of range"**. I tried changing it to range(len(lines)), but that did not seem to work. Do you have any suggestions? – Ravi Vemuri May 19 '20 at 15:29
  • I think that might be because some of the lines have fewer than 3 items. I added an if statement to fix this. – Clinton Graham May 19 '20 at 16:18
  • I think it works now, but the output now displays "Improper row." and nothing is saved. Could the condition be changed to something like > 2? Never mind, I tried that, but changing the number doesn't seem to be the solution. – Ravi Vemuri May 19 '20 at 17:12
  • Can you show me what the rest of the text file looks like? – Clinton Graham May 19 '20 at 17:50
  • This is what the text file looks like: Gender Name_of_id url, in that order. Sorry, I wasn't able to paste it properly: male sheezy0 https://pbs.twimg.com/profile_images/414342229096808449/fYvzqXN7_normal.png male DavdBurnett https://pbs.twimg.com/profile_images/539604221532700673/WW16tBbU_normal.jpeg – Ravi Vemuri May 19 '20 at 18:40
  • This might take a little while to sort out; respond to this so I can give you my contact info. – Clinton Graham May 19 '20 at 19:48
  • Sure. Please let me know – Ravi Vemuri May 19 '20 at 21:45
  • Please let me know if you can suggest something regarding this, @ClintonGraham. – Ravi Vemuri May 22 '20 at 16:27
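The "Improper row." output and the earlier `IndexError` are consistent with the sample pasted in the comments: if the rows are separated by spaces or tabs rather than commas, `split(",")` returns a single field. Because the comment display collapses line breaks, the real delimiter should still be checked, for example by printing `repr()` of a few lines. A minimal sketch of a more tolerant parser, assuming (hypothetically) that neither the profile name nor the URL contains commas or whitespace; `parse_row` is a hypothetical helper:

```python
import re

# Assumed layout: gender, profile name, URL, separated by commas, tabs or spaces.
FIELD_SEP = re.compile(r"[,\s]+")


def parse_row(line):
    """Return (gender, name, url), or None for blank, header or malformed lines."""
    fields = FIELD_SEP.split(line.strip())
    # Skip lines without exactly three fields, or whose last field is not a URL
    if len(fields) != 3 or not fields[2].startswith(("http://", "https://")):
        return None
    gender, name, url = fields
    return gender.lower(), name, url


sample = "male sheezy0 https://pbs.twimg.com/profile_images/414342229096808449/fYvzqXN7_normal.png"
print(len(sample.split(",")))  # 1, hence "Improper row." (or IndexError on res[2])
print(parse_row(sample))
```

Rows for which `parse_row` returns None can be counted or printed with `repr()` to see what the file actually contains, instead of only reporting "Improper row."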