
EDIT: I ran this with headless=False to see what's happening. Reddit shows the "Reddit.com wants to show notifications" prompt, and that looks like what's causing the hang-up. Does anyone know how to get around that?

I'm working on my capstone project with some classmates. In short, we have to scrape and compare data from Reddit, Google, and some other sites. We'll most likely use PRAW for Reddit, but I want to use Pyppeteer for the rest of the work. I followed an online tutorial that demonstrated scraping data from Wikipedia. It works great on Wikipedia, but when I tried to build a proof of concept using a subreddit, I got this timeout error: pyppeteer.errors.TimeoutError: Navigation Timeout Exceeded: 80000 ms exceeded.

from pyppeteer import launch
from bs4 import BeautifulSoup
import asyncio


async def main(keyword):
    # Create a browser instance and goto Reddit
    browser = await launch()
    page = await browser.newPage()
    await page.goto('https://www.reddit.com/r/GameDeals/')

    # Type in search keyword and press enter
    await page.type('[id=header-search-bar]', keyword)

    await page.screenshot({'path': 'main_page.png'})

    # Press enter and wait for results to load
    await page.keyboard.press('Enter')
    await page.waitForNavigation(timeout=80000) #issue is this line

    await page.screenshot({'path': 'results.png'})

    await browser.close()

asyncio.get_event_loop().run_until_complete(main("steam"))

As you can see, I raised the timeout to 80000 ms, which apparently is still not enough. It's definitely not my internet connection: I'm able to navigate to and search Reddit just fine. I'm on macOS using VS Code, and Pyppeteer is using a headless instance of Chromium. This is my first time doing any kind of web scraping, so if anyone has any troubleshooting suggestions I'd greatly appreciate it!

fufiy

1 Answer


I think the page is not actually navigating. It is probably using a front-end framework (e.g. React or Angular) to render the results, so a full navigation may never happen. Instead of using page.waitForNavigation, use page.waitForSelector or page.waitForXPath.
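As a rough sketch of that suggestion, the script from the question could wait for an element instead of a navigation. The function name `search_subreddit` and the `result_selector` parameter are illustrative assumptions, not part of the original post: you would need to inspect the rendered search results page to find a selector that only appears once the results have loaded. The import of `launch` sits inside the function only so that the module can be loaded without starting a browser.

```python
import asyncio


async def search_subreddit(keyword, result_selector, timeout_ms=30000):
    """Search r/GameDeals and wait for an element rather than a navigation.

    result_selector is a hypothetical placeholder: pass a CSS selector that
    you have confirmed appears only on the search results view.
    """
    # Imported here so that importing this module does not require a browser.
    from pyppeteer import launch

    browser = await launch()
    try:
        page = await browser.newPage()
        await page.goto('https://www.reddit.com/r/GameDeals/')

        # Type the keyword into the search bar (selector taken from the question).
        await page.type('[id=header-search-bar]', keyword)
        await page.keyboard.press('Enter')

        # Wait for the results to be rendered instead of for a page load.
        await page.waitForSelector(result_selector, timeout=timeout_ms)

        await page.screenshot({'path': 'results.png'})
    finally:
        # Close the browser even if the wait times out.
        await browser.close()
```

It could then be run with something like `asyncio.run(search_subreddit("steam", "<your selector>"))`. If the wait still times out, the selector is probably wrong or the results never rendered, which a screenshot taken just before the wait can help confirm.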

Nuno André
Faizan AlHassan
  • Hey Faizan - I'm not sure what I'd need to put as a param for either of those. I've tried testing both but nothing has worked so far. From reading the documentation, waitForNavigation() makes the most sense because it's waiting for the page to reload - hitting 'enter' in the search bar causes the page to reload. – fufiy Oct 15 '21 at 02:05
  • As I understand it, the page is not reloading; it is just re-rendering its content using a front-end library. Usage example: `await page.waitForXPath("//h2[.='New to Foot Locker or Guest']", timeout=80000)`. [Page.waitForXPath](https://pyppeteer.github.io/pyppeteer/reference.html#pyppeteer.page.Page.waitForXPath) needs an XPath that you expect to appear after clicking the search button; [page.waitForSelector](https://pyppeteer.github.io/pyppeteer/reference.html#pyppeteer.page.Page.waitForSelector) needs a selector that you expect to appear after clicking the search button. – Faizan AlHassan Oct 15 '21 at 06:21
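Regarding the notification prompt mentioned in the EDIT at the top of the question, one thing that may help is passing Chromium's `--disable-notifications` switch when launching the browser. This is only a sketch under that assumption: the helper names `build_launch_options` and `open_subreddit` are made up for illustration, and whether the switch fully suppresses Reddit's prompt has not been verified here.

```python
import asyncio


def build_launch_options(headless=True):
    """Return pyppeteer launch options with notification prompts disabled.

    Assumption: the Chromium switch --disable-notifications is enough to
    stop the "wants to show notifications" prompt from appearing.
    """
    return {
        'headless': headless,
        'args': ['--disable-notifications'],
    }


async def open_subreddit(url='https://www.reddit.com/r/GameDeals/'):
    # Imported here so that importing this module does not require a browser.
    from pyppeteer import launch

    # headless=False makes it possible to watch whether the prompt still shows.
    browser = await launch(build_launch_options(headless=False))
    try:
        page = await browser.newPage()
        await page.goto(url)
        await page.screenshot({'path': 'main_page.png'})
    finally:
        await browser.close()
```

Running `asyncio.run(open_subreddit())` with the visible browser should show whether the prompt is gone before going back to headless mode.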