I'm trying to image scrape this website but it seems that the site I'm scraping doesn't respond by actually outputting images

Question

I'm new to web scraping, so I'm not sure what to do here. I am trying to extract the images from the site at this URL:

Here are the loops that got the closest to working:

For loop with parsing function

import requests
import os
from tqdm import tqdm
from bs4 import BeautifulSoup as bs
from urllib.parse import urljoin, urlparse

url = "https://www.legacysurvey.org/viewer/data-for-radec/?ra=55.0502&dec=-18.5790&layer=ls-dr8&ralo=55.0337&rahi=55.0655&declo=-18.5892&dechi=-18.5714"
def is_valid(url):
    """
    Checks whether `url` is a valid URL.
    """
    parsed = urlparse(url)
    return bool(parsed.netloc) and bool(parsed.scheme)

def get_all_images(url):
    """
    Returns all image URLs on a single `url`
    """
    soup = bs(requests.get(url).content, "html.parser")
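    urls = []
    for img in tqdm(soup.find_all("img"), "Extracting images"):
        img_url = img.attrs.get("src")
        if not img_url:
            # if img does not contain src attribute, just skip
            continue
        # make the URL absolute by joining it with the page URL
        img_url = urljoin(url, img_url)
        if is_valid(img_url):
            urls.append(img_url)
    return urls

# print every image URL found, then the current working directory
print(get_all_images(url))
print(os.getcwd())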

While loop - image scraping

import requests
from bs4 import BeautifulSoup

# link to first page - without `page=`
url = 'https://www.legacysurvey.org/viewer/data-for-radec/?ra=55.0502&dec=-18.5799&layer=ls-dr8&ralo=55.0337&rahi=55.0655&declo=-18.5892&dechi=-18.5714'

# only for information, not used in url
page = 0 

while True:

    print('---', page, '---')

    r = requests.get(url)

    soup = BeautifulSoup(r.content, "html.parser")

    # Print the source URL of every <img> tag (images use `src`, not `href`)
    for link in soup.find_all("img"):
        print("<img src='%s'>" % link.get("src"))

    # Fetch and print general data from title class
    general_data = soup.find_all('div', {'class' : 'title'})

    for item in general_data:
        print(item.contents[0].text)
        print(item.contents[1].text.replace('.',''))
        print(item.contents[2].text)

    # link to next page

    next_page = soup.find('a', {'class': 'next'})

    if next_page:
        # resolve the link against the current URL in case it is relative
        url = requests.compat.urljoin(url, next_page.get('href'))
        page += 1
    else:
        break # exit `while True`

I tried to adapt both of these to download the image links they find, but neither one produces any output, whatever I try. Any help is greatly appreciated!

Please consider posting your code as text and not as an image. As it stands, no one can run your code without first extracting it from the image. — Tofu, Feb 04 '21 at 19:33
That webpage does not contain any `<img>` tags, it contains a link to one JPG file which could be obtained. What are you trying to get? The single JPG image or the `.fits` images contained inside the `.gz` entries in the table? — Martin Evans, Feb 05 '21 at 10:21
Oh, thank you! Essentially I am looking for a big list of images for galaxies in that cluster. I may need to use a different link source if there is only one image listed in the data. Thank you so much for your help! — Autumn, Feb 05 '21 at 18:16
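
The comments above suggest that the files on this page are reached through links rather than `<img>` tags, which would explain why `soup.find_all("img")` finds nothing. A minimal sketch of collecting such links follows. It uses Python's standard-library `html.parser` in place of BeautifulSoup so that it runs without extra installs. The names `LinkCollector` and `find_file_links` and the default list of extensions are hypothetical, chosen only for illustration; the real page's markup has not been checked here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag and the src of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.links.append(attrs["src"])


def find_file_links(html, base_url, extensions=(".jpg", ".jpeg", ".fits", ".gz")):
    """Return absolute URLs from `html` whose path ends in one of `extensions`.

    `extensions` is an assumed default; adjust it to the files you want.
    """
    parser = LinkCollector()
    parser.feed(html)
    found = []
    for link in parser.links:
        # Relative links are resolved against the page they came from.
        absolute = urljoin(base_url, link)
        # Only the path is checked, so a query string does not hide the extension.
        path = urlparse(absolute).path.lower()
        if path.endswith(tuple(extensions)):
            found.append(absolute)
    return found
```

In practice the `html` argument could be `requests.get(url).text` and `base_url` the same `url`. Links whose path does not end in a listed extension (for example an image served from a URL that carries only query parameters) would be missed by this filter, so printing `parser.links` unfiltered first is a reasonable way to see what the page really offers.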
