
So here's my problem. I wrote a program that gets all of the information I want from the first page I load. But when I click the nextPage button, it runs a script that loads the next batch of products without actually navigating to another page.

So when the next loop iteration runs, I just get the same content as the first one, even though the content shown in the browser I'm automating is different.

This is the code I run:

from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time

driver = webdriver.Chrome()  # assumed browser; the original snippet did not show how the driver was created
driver.get("https://www.my-website.com/search/results-34y1i")
time.sleep(2)  # give the page time to load before reading its source
soup = BeautifulSoup(driver.page_source, 'html.parser')

# ... code to find the total number of pages (sets totalPages)
currentPage = 0

while currentPage != totalPages:
    # ... code to find the products
    currentPage += 1
    button_NextPage = driver.find_element(By.ID, 'nextButton')
    button_NextPage.click()
    time.sleep(5)  # wait for the next batch of products to load

Is there any way for me to scrape exactly what's loaded on my browser?

  • Does this answer your question? [How to scrape dynamic content from a website?](https://stackoverflow.com/questions/55709463/how-to-scrape-dynamic-content-from-a-website) – crock Jun 28 '22 at 23:11

1 Answer


The issue seems to be that you're only fetching page 1, as shown in the following line:

driver.get("https://www.tcgplayer.com/search/magic/commander-streets-of-new-capenna?productLineName=magic&setName=commander-streets-of-new-capenna&page=1&view=grid")

But as you can see, there's a query parameter called page in the URL that determines which page of HTML you are fetching. So every time you loop to a new page, you'll have to fetch the new HTML content with the driver by changing the page query parameter. For example, in your loop it would be something like this:

driver.get("https://www.tcgplayer.com/search/magic/commander-streets-of-new-capenna?productLineName=magic&setName=commander-streets-of-new-capenna&page={page}&view=grid".format(page = currentPage))

After you fetch the new HTML, you'll be able to access the new elements present on the different pages, as you require.
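To make the loop concrete, here is a minimal sketch of the same idea, assuming the site keeps using a `page` query parameter and that pages are numbered from 1. The helper names `page_url` and `scrape_pages`, the `parse` callback, and the `delay` value are hypothetical and only there for illustration; the important part is that `driver.page_source` is read again after every `driver.get`, so each page is parsed from fresh HTML.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_url(base_url, page):
    """Return base_url with its `page` query parameter set to `page`."""
    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query))
    query["page"] = str(page)
    return urlunsplit(parts._replace(query=urlencode(query)))


def scrape_pages(driver, base_url, total_pages, parse, delay=2.0):
    """Load pages 1..total_pages and call parse() on each page's fresh HTML.

    `driver` is a Selenium WebDriver (anything with get() and page_source),
    `parse` is a hypothetical callback that extracts the products from HTML,
    and `delay` is an assumed number of seconds to let the page render.
    """
    results = []
    for page in range(1, total_pages + 1):
        driver.get(page_url(base_url, page))
        time.sleep(delay)
        # Read page_source again on every iteration; reusing the HTML
        # captured for page 1 is what returns the same products each time.
        results.append(parse(driver.page_source))
    return results
```

With a real driver, `parse` could be something like `lambda html: BeautifulSoup(html, 'html.parser')` followed by your own product lookup, and `total_pages` would come from your existing page-count code.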

SaC-SeBaS
  • Doesn't seem like this is a loading issue. I tried setting up bigger wait times to see if anything changed but it always got me the same result. – small_potato Jun 28 '22 at 23:18
  • Can you provide the exact URL that you're using so I can test it myself? – SaC-SeBaS Jun 28 '22 at 23:21
  • Of course: https://www.tcgplayer.com/search/magic/commander-streets-of-new-capenna?productLineName=magic&setName=commander-streets-of-new-capenna&page=1&view=grid – small_potato Jun 28 '22 at 23:23
  • The page number is passed as a query parameter, which you can see as `&page=2` in the [url](https://www.tcgplayer.com/search/magic/commander-streets-of-new-capenna?productLineName=magic&setName=commander-streets-of-new-capenna&page=2&view=grid), so every time you loop to a new page, change the `page` query parameter to the next number, for example: `driver.get("https://www.tcgplayer.com/search/magic/commander-streets-of-new-capenna?productLineName=magic&setName=commander-streets-of-new-capenna&page={page}&view=grid".format(page = page_number))`. – SaC-SeBaS Jun 28 '22 at 23:32
  • Check my updated answer; it is more accurate than the previous one. – SaC-SeBaS Jun 28 '22 at 23:38