How to scrape a page that is dynamically loaded?

Question

So here's my problem. I wrote a program that is perfectly able to get all of the information I want on the first page that I load. But when I click on the nextPage button it runs a script that loads the next bunch of products without actually moving to another page.

So when I run the next iteration of the loop, I just get the same content as on the first page, even though the browser I'm driving shows different products.

This is the code I run:

from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time

# Assumed browser; the snippet as posted did not show how the driver was created
driver = webdriver.Chrome()
driver.get("https://www.my-website.com/search/results-34y1i")
soup = BeautifulSoup(driver.page_source, 'html.parser')  # parsed once, before the loop
time.sleep(2)

# ... code to find total number of pages ...
currentPage = 0
button_NextPage = driver.find_element(By.ID, 'nextButton')

while currentPage != totalPages:
    # ... code to find the products ...
    currentPage += 1
    # Look the button up again on every pass, then click it
    button_NextPage = driver.find_element(By.ID, 'nextButton')
    button_NextPage.click()
    time.sleep(5)

Is there any way for me to scrape exactly what's loaded on my browser?

Does this answer your question? [How to scrape dynamic content from a website?](https://stackoverflow.com/questions/55709463/how-to-scrape-dynamic-content-from-a-website) — crock, Jun 28 '22 at 23:11

SaC-SeBaS · Accepted Answer · 2022-06-28T23:37:39.423

2

The problem seems to be that you only ever fetch page 1, as this line shows:

driver.get("https://www.tcgplayer.com/search/magic/commander-streets-of-new-capenna?productLineName=magic&setName=commander-streets-of-new-capenna&page=1&view=grid")

As you can see, the URL has a query parameter called page that determines which page of results you fetch. So each time your loop moves on to a new page, fetch the new HTML with the driver by changing the page query parameter. In your loop it would look something like this:

driver.get("https://www.tcgplayer.com/search/magic/commander-streets-of-new-capenna?productLineName=magic&setName=commander-streets-of-new-capenna&page={page}&view=grid".format(page = currentPage))

Once you have fetched the new HTML, you'll be able to access the elements on each of the different pages as you need.
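Here is a minimal sketch of that loop. It assumes the site really does honour the `page` query parameter the way the URL above suggests. The helper names `build_page_url` and `iter_page_sources` are made up for this example, and `driver` is whatever Selenium WebDriver instance you already have. The important part is that `driver.page_source` is read again after every load, because a `BeautifulSoup` object built from an earlier copy never updates itself.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_page_url(url, page):
    """Return `url` with its `page` query parameter set to `page`.

    Hypothetical helper: assumes the site selects the results page
    through a `page` query parameter.
    """
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["page"] = str(page)
    return urlunsplit(parts._replace(query=urlencode(query)))


def iter_page_sources(driver, url, total_pages):
    """Load pages 1..total_pages and yield the HTML of each one.

    `driver` only needs a `get()` method and a `page_source` attribute,
    which a Selenium WebDriver provides.
    """
    for page in range(1, total_pages + 1):
        driver.get(build_page_url(url, page))
        # Read page_source after each load so the HTML is fresh.
        yield driver.page_source
```

You would then build a new soup for each page, e.g. `for html in iter_page_sources(driver, url, totalPages): soup = BeautifulSoup(html, 'html.parser')`. Since the products are rendered by a script, you may still need a wait before reading `page_source`. If you would rather keep clicking the next button instead, the same rule applies: re-read `driver.page_source` and re-create the soup after each click.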

edited Jun 28 '22 at 23:37

answered Jun 28 '22 at 23:12


Doesn't seem like this is a loading issue. I tried setting longer wait times to see if anything changed, but I always got the same result. – small_potato Jun 28 '22 at 23:18
Can you provide the exact URL that you're using so I can test it myself? – SaC-SeBaS Jun 28 '22 at 23:21
Of course: https://www.tcgplayer.com/search/magic/commander-streets-of-new-capenna?productLineName=magic&setName=commander-streets-of-new-capenna&page=1&view=grid – small_potato Jun 28 '22 at 23:23
1

The page number is passed as a query parameter, `&page=2`, in the [url](https://www.tcgplayer.com/search/magic/commander-streets-of-new-capenna?productLineName=magic&setName=commander-streets-of-new-capenna&page=2&view=grid), so each time you loop to a new page you have to change the `page` query parameter to the next number, for example: `driver.get("https://www.tcgplayer.com/search/magic/commander-streets-of-new-capenna?productLineName=magic&setName=commander-streets-of-new-capenna&page={page}&view=grid".format(page = page_number))`. – SaC-SeBaS Jun 28 '22 at 23:32
1

Check my updated answer; it is more accurate than the previous one. – SaC-SeBaS Jun 28 '22 at 23:38
