
I'm trying to download all the Pokémon images available on the official website, because I want high-quality images. Following is the code that I wrote.

from bs4 import BeautifulSoup as bs4
import requests
request = requests.get('https://www.pokemon.com/us/pokedex/')
soup = bs4(request.text, 'html.parser')
print(soup.findAll('div',{'class':'container       pokedex'}))

Output is

[]

Is there something that I'm doing wrong? Also, is it legal to scrape from the official website? Is there any tag or something that tells this? Thanks.

P.S.: I'm new to BeautifulSoup and HTML.

Lawhatre

2 Answers


The images are loaded dynamically, so you have to use Selenium to scrape them. Here is the full code to do it:

from selenium import webdriver
from selenium.webdriver.common.by import By
import time
import requests

driver = webdriver.Chrome()

driver.get('https://www.pokemon.com/us/pokedex/')

time.sleep(4)  # wait for the grid to render

# the last three <li> elements are teaser images below the button, so skip them
li_tags = driver.find_elements(By.CLASS_NAME, 'animating')[:-3]

for li in li_tags:
    img_link = li.find_element(By.XPATH, './/img').get_attribute('src')
    name = li.find_element(By.XPATH, './/h5').text

    r = requests.get(img_link)

    with open(f"D:\\{name}.png", "wb") as f:
        f.write(r.content)

driver.close()

Output:

12 Pokémon images. (The first two were embedded as screenshots in the original answer.)

Plus, I noticed that there was a Load More button at the bottom of the page. When clicked, it loads more images, and we have to keep scrolling down after clicking it to load the rest. If I'm not wrong, there is a total of 893 images on the website. In order to scrape all 893 images, you can use this code:

from selenium import webdriver
from selenium.webdriver.common.by import By
import time
import requests

driver = webdriver.Chrome()

driver.get('https://www.pokemon.com/us/pokedex/')

time.sleep(3)

# click the "Load More Pokemon" button via JavaScript
load_more = driver.find_element(By.XPATH, '//*[@id="loadMore"]')
driver.execute_script("arguments[0].click();", load_more)

# keep scrolling until the page height stops growing
lenOfPage = driver.execute_script(
    "window.scrollTo(0, document.body.scrollHeight);"
    "var lenOfPage=document.body.scrollHeight;return lenOfPage;")
match = False
while not match:
    lastCount = lenOfPage
    time.sleep(1.5)
    lenOfPage = driver.execute_script(
        "window.scrollTo(0, document.body.scrollHeight);"
        "var lenOfPage=document.body.scrollHeight;return lenOfPage;")
    if lastCount == lenOfPage:
        match = True

# the last three <li> elements are teaser images below the button, so skip them
li_tags = driver.find_elements(By.CLASS_NAME, 'animating')[:-3]

for li in li_tags:
    img_link = li.find_element(By.XPATH, './/img').get_attribute('src')
    name = li.find_element(By.XPATH, './/h5').text

    r = requests.get(img_link)

    with open(f"D:\\{name}.png", "wb") as f:
        f.write(r.content)

driver.close()
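One caveat with saving via `open(f"D:\\{name}.png", "wb")`: a few Pokémon names (e.g. "Type: Null") contain characters such as `:` that Windows forbids in filenames, so the write can fail partway through the loop. A small hypothetical helper (not part of the original answer) that replaces the standard Windows-reserved characters before building the path:

```python
import re

def safe_filename(name: str) -> str:
    """Replace characters Windows forbids in filenames with underscores."""
    return re.sub(r'[<>:"/\\|?*]', '_', name)

# usage in the loop above:
#   with open(f"D:\\{safe_filename(name)}.png", "wb") as f: ...
print(safe_filename("Type: Null"))  # Type_ Null
```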
Sushil
  • Hi Sushil. When I'm running the code on my machine, it downloads all three images below the "Load More Pokemon" button. Unable to reproduce your results. – Lawhatre Oct 24 '20 at 06:37
  • You don't want the last three images? – Sushil Oct 24 '20 at 06:38
  • I have updated my code such that it does not download those 3 images. Check it out. Plus, the edited code also saves the pokemon images with their name instead of a number. – Sushil Oct 24 '20 at 06:45
  • Yeah I have changed my code so that it only downloads the first 893 images. Check out my edit. – Sushil Oct 24 '20 at 06:49
  • @Sushil Your answer is exactly what I wanted. What is your opinion on the remaining two questions? – Lawhatre Oct 24 '20 at 06:57
  • I don't think that scraping images from their website is illegal as long as you use them for your personal needs. If you use them for commercial purposes, then it is definitely illegal, as you are using their images without their permission. – Sushil Oct 24 '20 at 07:17
  • That's neat. Check out my answer to see how you could've done it without Selenium @Sushil – help-ukraine-now Oct 24 '20 at 08:52
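On the question of whether "there is any tag or something that tells this": most sites publish a robots.txt file stating which paths crawlers may fetch, and Python's standard-library `urllib.robotparser` can read it. A sketch parsing a sample robots.txt offline (the rules below are illustrative only; fetch https://www.pokemon.com/robots.txt to see the site's actual policy):

```python
from urllib import robotparser

# Illustrative robots.txt body, NOT the real pokemon.com rules
sample = """
User-agent: *
Disallow: /private/
Allow: /
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(sample)

print(rp.can_fetch("*", "https://www.pokemon.com/us/pokedex/"))   # True for this sample
print(rp.can_fetch("*", "https://www.pokemon.com/private/page"))  # False for this sample
```

Note that robots.txt expresses the site's crawling preferences, not the legal status of reusing the images; the terms of use govern that.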

This could've been done much more easily had you inspected the network tab first:

import time
import requests


endpoint = "https://www.pokemon.com/us/api/pokedex/kalos"
# contains all metadata
data = requests.get(endpoint).json()

# collect keys needed to save the picture
items = [{"name": item["name"], "link": item["ThumbnailImage"]} for item in data]

# remove duplicates
d = [dict(t) for t in {tuple(d.items()) for d in items}]
assert len(d) == 893


for pokemon in d:
    response = requests.get(pokemon["link"])
    time.sleep(1)
    with open(f"{pokemon['name']}.png", "wb") as f:
        f.write(response.content)
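The deduplication one-liner above works by turning each dict into a hashable tuple of its items, collecting those tuples in a set (which drops duplicates), and then converting back to dicts. A toy illustration with made-up entries:

```python
items = [
    {"name": "Bulbasaur", "link": "a.png"},
    {"name": "Bulbasaur", "link": "a.png"},  # exact duplicate
    {"name": "Ivysaur", "link": "b.png"},
]

# dicts aren't hashable, but tuples of their items are
d = [dict(t) for t in {tuple(d.items()) for d in items}]

print(len(d))  # 2
```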
help-ukraine-now