
I have a CSV file with 6 columns and many rows. I would like to download all the PNG or JPG images from the 'link' column into a folder with the same name as my CSV file. Then I would like to rename each image with the content of its 'title' column.

For example, url1.png should be renamed to name1.png, and so on for every file until the last row.

I started with this:

import csv
import urllib.request  # needed for urlretrieve()

with open('name.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='|')
    for row in reader:
        fileurl = row[0]
        filename = row[1]
        urllib.request.urlretrieve(fileurl, "name" + filename)

Example rows: (screenshot not available)

I'm still learning. Any help or suggestions on how to do this?

Many thanks.

bigbox
  • bigbox, I have removed your unnecessary `batch-file` tag, as your question code is clearly using the Python scripting language. – Compo Mar 04 '22 at 16:57
  • Can you provide some example rows in the csv? – Aditya Mar 05 '22 at 06:35
  • Hi Aditya, I just edited my first post with an example row and tried to simplify my question. I hope you can help. Thanks – bigbox Mar 05 '22 at 11:27
  • Please show a couple of rows from the actual input CSV format (e.g. with the commas and quotechars) so we can copy/paste and test your script. Also show what your expected output would be for the example. Thanks. – Martin Evans Mar 05 '22 at 17:42
  • Hi Martin, I edited my post. Is it clearer now? – bigbox Mar 05 '22 at 18:09

1 Answer

If I understand you correctly, you would like to download the file in the link column using the title column to form the filename.

This can be done as follows:

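import urllib.request
import csv
import os

with open('name.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)  # uses the header row, so columns are accessed by name

    for row in reader:
        # Keep the extension of the linked file, e.g. '.png' or '.jpg'
        _, ext = os.path.splitext(row['link'])
        # Build the new name from the title; '/' would be treated as a folder separator
        title_filename = f"{row['title']}{ext}".replace('/', '-')
        urllib.request.urlretrieve(row['link'], title_filename)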

You can use os.path.splitext() to split the extension off the filename. This can then be combined with the entry from the title column to form the new filename.

For example:

https://url.com/folder/url1.png would be saved as name1.png (if its title is name1)
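The question also asks for the images to go into a folder with the same name as the CSV file, which the code above does not do. A minimal sketch of one way to add that, assuming the same `link` and `title` column names. It also takes the extension from the URL's path via `urllib.parse.urlsplit()`, in case some links carry a query string (e.g. a hypothetical `?size=large`, for which `os.path.splitext()` on the whole URL would return `.png?size=large`). `target_folder_for`, `extension_from_url` and `download_images` are hypothetical helper names:

```python
import csv
import os
import urllib.request
from urllib.parse import urlsplit


def target_folder_for(csv_path):
    """Folder named after the CSV file, next to it: 'name.csv' -> 'name'."""
    base = os.path.splitext(os.path.basename(csv_path))[0]
    return os.path.join(os.path.dirname(csv_path), base)


def extension_from_url(url):
    """Extension of the URL's path only, so any query string or fragment is ignored."""
    return os.path.splitext(urlsplit(url).path)[1]


def download_images(csv_path):
    """Download every 'link' into the folder named after the CSV, renamed after 'title'."""
    folder = target_folder_for(csv_path)
    os.makedirs(folder, exist_ok=True)  # create the folder on the first run
    with open(csv_path, newline='') as csvfile:
        for row in csv.DictReader(csvfile):
            # Replace '/' so a title cannot point into a non-existent subfolder
            filename = row['title'].replace('/', '-') + extension_from_url(row['link'])
            urllib.request.urlretrieve(row['link'], os.path.join(folder, filename))
```

Calling `download_images('name.csv')` would then save https://url.com/folder/url1.png as `name/name1.png` if its title is name1.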


To deal with multiple identical title entries, you could use Python's Counter() to keep track of how many times each title has appeared. For example:

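from collections import Counter
import urllib.request
import csv
import os

with open('name.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    title_counts = Counter()  # how many times each title has been seen so far

    for row in reader:
        _, ext = os.path.splitext(row['link'])
        # Replace '/' so the title cannot point into a non-existent subfolder
        title = row['title'].replace('/', '-')
        title_counts[title] += 1
        # Append the running count, e.g. name1_1.png, name1_2.png, ...
        title_filename = f"{title}_{title_counts[title]}{ext}"
        urllib.request.urlretrieve(row['link'], title_filename)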
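
The Counter() version above appends a number to every file, even when a title occurs only once. If you would rather keep the first file with a given title unnumbered and only add `_2`, `_3`, ... to later duplicates, a small variation could look like this. It is a sketch only; `unique_filenames` is a hypothetical helper, and the rows below are made-up examples standing in for what `csv.DictReader` would return:

```python
import os
from collections import Counter


def unique_filenames(rows):
    """Yield (link, filename) pairs. The first row with a given title keeps the
    plain title; later rows with the same title get _2, _3, ... appended."""
    seen = Counter()
    for row in rows:
        _, ext = os.path.splitext(row['link'])
        title = row['title'].replace('/', '-')
        seen[title] += 1
        if seen[title] == 1:
            yield row['link'], f"{title}{ext}"
        else:
            yield row['link'], f"{title}_{seen[title]}{ext}"


# Hypothetical example rows (in the real script these come from csv.DictReader)
rows = [
    {'link': 'https://url.com/folder/url1.png', 'title': 'name1'},
    {'link': 'https://url.com/folder/url2.png', 'title': 'name1'},
    {'link': 'https://url.com/folder/url3.jpg', 'title': 'name2'},
]
for link, filename in unique_filenames(rows):
    print(link, '->', filename)
# https://url.com/folder/url1.png -> name1.png
# https://url.com/folder/url2.png -> name1_2.png
# https://url.com/folder/url3.jpg -> name2.jpg
```

In the download loop this would replace the filename logic, e.g. `for link, filename in unique_filenames(reader): urllib.request.urlretrieve(link, filename)`. One limitation: a title that already ends in, say, `_2` could still collide with a generated name.
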
Martin Evans
  • Thanks Martin! I just posted a question about your solution below. Can you see what is happening here? – bigbox Mar 07 '22 at 13:06
  • What do you see if you add `print(f"'{title_filename}'")` before the `urlretrieve()` call? – Martin Evans Mar 07 '22 at 14:27
  • I have the same error unfortunately. – bigbox Mar 07 '22 at 14:42
  • You should see the name of the file to be created before the error occurs. Does it contain extra spaces or a forward slash character? Does any value in your CSV `title` column contain `/` characters? If so, it will try to create the file in a subfolder, which probably does not exist. – Martin Evans Mar 07 '22 at 14:46
  • I just removed the '/' from "Image from '/' Sam.jpeg" in my CSV file and now it works. Is there any way to keep this / without getting an error during the download? Otherwise I will remove all of them by hand. – bigbox Mar 07 '22 at 14:48
  • If `title` contains `/` characters, you could remove them with `.replace('/', '-')` – Martin Evans Mar 07 '22 at 14:51
  • It works perfectly now! It seems easy when I see your final code, but it's quite hard when you are just starting to learn Python. You helped me a lot, many thanks Martin! – bigbox Mar 07 '22 at 15:03
  • Oh! Something I didn't notice earlier: some values in my 'title' column are the same for several images, so during the download each new file overwrites the older one :/ Is there a way to add 1, 2, 3 at the end of the filename if it already exists? – bigbox Mar 07 '22 at 15:27
  • I tried something with `df['link'] = df['link'].str.cat(map(str, df.index), sep='_') map(str, range(1, df.shape[0] + 1))` but it doesn't work. – bigbox Mar 07 '22 at 16:00
  • I have added a possible workaround for you – Martin Evans Mar 07 '22 at 19:59
  • Yes! Perfect! Thanks again for your precious help. – bigbox Mar 07 '22 at 22:06
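
Building on the `.replace('/', '-')` suggestion in the comments: if some titles might also contain other characters that are not valid in filenames (Windows, for example, rejects `\ / : * ? " < > |`), a regular expression can replace them all at once. This is a minimal sketch; `safe_filename` is a hypothetical helper name, the title below is a made-up example, and `-` as the replacement is an arbitrary choice:

```python
import re

# Characters Windows does not allow in filenames; '/' is also the folder
# separator on Linux and macOS.
INVALID_FILENAME_CHARS = re.compile(r'[\\/:*?"<>|]')


def safe_filename(title):
    """Replace characters that cannot appear in a filename with '-' and
    strip leading/trailing spaces."""
    return INVALID_FILENAME_CHARS.sub('-', title).strip()


print(safe_filename('Image from / Sam'))  # Image from - Sam
```

It could be used in place of the `.replace('/', '-')` call, e.g. `title = safe_filename(row['title'])`.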