I am trying to write a program that downloads all the xkcd comic images and saves them in a directory, with each image named title.png, where title is the title of the comic. Here's the code for it:

#Downloads all the xkcd comics

import requests, bs4, os

site = requests.get('https://www.xkcd.com')

def downloadImage(site):
    soup = bs4.BeautifulSoup(site.text)
    img_tag = soup.select('div[id="comic"] img')
    img_title = img_tag[0].get('alt')
    img_file = open(img_title+'.png', 'wb')
    print("Downloading %s..." %img_title)
    img_res = requests.get("https:" +  img_tag[0].get('src'))
    for chunk in img_res.iter_content(100000):
        img_file.write(chunk)
    print("Saved %s in " %img_title, os.getcwd())


def downloadPrevious(site):
    soup = bs4.BeautifulSoup(site.text)
    prev_tag_list = soup.select("ul[class='comicNav'] li > a")
    prev_tag = None
    for each in prev_tag_list:
        if(each.get('rel')==['prev']):
            prev_tag = each
            break
    if(prev_tag.get('href') == '#'):
        return True, site  # the caller unpacks two values
    prev_site = requests.get('https://xkcd.com' + prev_tag.get('href'))
    downloadImage(prev_site)
    return False, prev_site

def download_XKCD_Comics(site):
    # Create the target directory if needed, then always switch into it.
    os.makedirs('E:\\XKCD Comics', exist_ok=True)
    os.chdir('E:\\XKCD Comics')

    done = False
    downloadImage(site)
    while(not done):
        done, site = downloadPrevious(site)
    return

download_XKCD_Comics(site)

The output of the code:

==== RESTART: E:\Computer_Science_Programs\Python\Get all XKCD Comics.py ====
Downloading Data Pipeline...
Saved Data Pipeline in  E:\XKCD Comics
Downloading Incoming Calls...
Saved Incoming Calls in  E:\XKCD Comics
Downloading Stanislav Petrov Day...
Saved Stanislav Petrov Day in  E:\XKCD Comics
Downloading Bad Opinions...
Saved Bad Opinions in  E:\XKCD Comics
Traceback (most recent call last):
  File "E:\Computer_Science_Programs\Python\Get all XKCD Comics.py", line 45, in <module>
    download_XKCD_Comics(site)
  File "E:\Computer_Science_Programs\Python\Get all XKCD Comics.py", line 42, in download_XKCD_Comics
    done, site = downloadPrevious(site)
  File "E:\Computer_Science_Programs\Python\Get all XKCD Comics.py", line 30, in downloadPrevious
    downloadImage(prev_site)
  File "E:\Computer_Science_Programs\Python\Get all XKCD Comics.py", line 11, in downloadImage
    img_file = open(img_title+'.png', 'wb')
FileNotFoundError: [Errno 2] No such file or directory: '6/6 Time.png'
>>> 

I don't understand the problem. None of the other files existed, but the error was raised only with this file name. Please somebody help me with this one!

Udasi Tharani
  • Are there other files with similar naming conventions (6/6 Time.png)? That in itself seems like it could stir trouble at some point. – d_kennetz Oct 05 '18 at 02:40
  • `6/6 Time.png` refers to a file named `6 Time.png` in the directory `6`. This directory doesn't exist, so you get that error. You need to filter out invalid characters before trying to use random strings as filenames. – kindall Oct 05 '18 at 02:44
  • It appears you don't have a folder named 6. – KC. Oct 05 '18 at 03:06
  • Thanks a lot! Taking care of just the naming convention got my problem solved:) – Udasi Tharani Oct 05 '18 at 03:53

2 Answers

I just encountered an issue where I was getting FileNotFoundError: [Errno 2] No such file or directory: when opening a file in wb mode, which confused me because I thought using open with wb should create the file if it doesn't exist. Turns out the issue was that the file I was trying to create was in a directory that doesn't exist. Easy fix:

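import os

# MNIST_DATA_FILENAME and b (the bytes to write) are defined elsewhere.
# open(..., "wb") creates the file but not its missing parent directories.
MNIST_DATA_DIRNAME = os.path.dirname(MNIST_DATA_FILENAME)
if not os.path.isdir(MNIST_DATA_DIRNAME):
    os.makedirs(MNIST_DATA_DIRNAME)

with open(MNIST_DATA_FILENAME, "wb") as f:
    f.write(b)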
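
The same check can be folded into a small helper. This is a minimal sketch rather than the only way to do it: `write_bytes_creating_dirs` is a hypothetical name, and it relies on `os.makedirs(..., exist_ok=True)` so that an already existing directory is not treated as an error.

```python
import os
import tempfile


def write_bytes_creating_dirs(filename, data):
    """Write data to filename, creating missing parent directories first.

    Hypothetical helper for illustration; not part of any library.
    """
    dirname = os.path.dirname(filename)
    # dirname is '' for a bare file name, and os.makedirs('') would raise.
    if dirname:
        os.makedirs(dirname, exist_ok=True)
    with open(filename, "wb") as f:
        f.write(data)


# Usage: the nested directories do not exist yet inside the temp dir.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "data", "mnist", "train.bin")
    write_bytes_creating_dirs(target, b"\x00\x01")
    with open(target, "rb") as f:
        result = f.read()
```

For the file name in the question, creating the directory `6` would probably silence the error on Windows, but the comic would end up as `6 Time.png` inside a folder named `6`, so cleaning up the name is likely the better fix there.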
Jake Levi

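/ is an invalid character for Windows filenames: it is treated as a path separator, so 6/6 Time.png is read as a file 6 Time.png inside a directory 6.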
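
To see how such a name is interpreted, the standard-library `ntpath` module (the Windows flavour of `os.path`, importable on any platform) can split it. This only illustrates the path parsing, under the assumption that the title is used unchanged as the file name; it is not a picture of what `open` does internally.

```python
import ntpath  # Windows implementation of os.path, usable on any OS

# The comic title from the traceback, used as-is for the file name.
name = '6/6 Time.png'

# ntpath treats both '/' and '\\' as path separators.
directory, filename = ntpath.split(name)
print(directory)  # 6
print(filename)   # 6 Time.png
```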

There are lots of ways to get a valid file name. One example is the one Django uses:

import re

def get_valid_filename(s):
    s = str(s).strip().replace(' ', '_')
    # Drop every character that is not a word character, '-' or '.'.
    return re.sub(r'(?u)[^-\w.]', '', s)

It strips leading and trailing whitespace, replaces spaces with underscores, then removes every character that is not a letter, a digit, _, -, or a dot.
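
Applied to the titles from the question, it might be used as below. This is a sketch: `image_filename` is a hypothetical helper (it is not part of Django or of the question's code), and the fallback name `untitled` is an assumption for titles in which no valid character survives.

```python
import re


def get_valid_filename(s):
    # Same logic as the function quoted above.
    s = str(s).strip().replace(' ', '_')
    return re.sub(r'(?u)[^-\w.]', '', s)


def image_filename(title, extension='.png'):
    """Hypothetical helper: build a safe file name from a comic title."""
    # Fall back to a placeholder if nothing survives the filtering.
    stem = get_valid_filename(title) or 'untitled'
    return stem + extension


print(image_filename('Data Pipeline'))  # Data_Pipeline.png
print(image_filename('6/6 Time'))       # 66_Time.png
```

In the question's downloadImage, `open(image_filename(img_title), 'wb')` could then stand in for `open(img_title+'.png', 'wb')`. Because characters are dropped rather than replaced, two different titles can map to the same file name, so a real downloader may want to add the comic number as well.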

Loocid