EDIT: I ran this with headless=False to see what's happening. Reddit shows a "Reddit.com wants to show notifications" prompt, and that looks like what's causing the hang-up. Does anyone know how to get around it?
I'm working on my capstone project with some classmates. In short, we have to scrape and compare data from Reddit, Google, and some other sites. We'll most likely use PRAW for Reddit, but I want to use Pyppeteer for the rest of the work. I followed an online tutorial that demonstrated scraping data from Wikipedia. It works great on Wikipedia, but when I tried to make a proof of concept using a subreddit, I got this timeout error: pyppeteer.errors.TimeoutError: Navigation Timeout Exceeded: 80000 ms exceeded.
from pyppeteer import launch
from bs4 import BeautifulSoup
import asyncio

async def main(keyword):
    # Create a browser instance and goto Reddit
    browser = await launch()
    page = await browser.newPage()
    await page.goto('https://www.reddit.com/r/GameDeals/')
    # Type in search keyword and press enter
    await page.type('[id=header-search-bar]', keyword)
    await page.screenshot({'path': 'main_page.png'})
    # Press enter and wait for results to load
    await page.keyboard.press('Enter')
    await page.waitForNavigation(timeout=80000) #issue is this line
    await page.screenshot({'path': 'results.png'})
    await browser.close()

asyncio.get_event_loop().run_until_complete(main("steam"))
As you can see, I cranked the timeout up to 80000 ms, which apparently still isn't enough. It's definitely not my internet connection: I can navigate to and search Reddit just fine in a regular browser. I'm on macOS using VS Code, and Pyppeteer is driving a headless instance of Chromium. This is my first time doing any kind of website scraping, so if anyone has troubleshooting suggestions I'd greatly appreciate it!
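
For anyone hitting the same thing, here is a rough sketch of two changes that might be worth trying. I haven't confirmed either of them against Reddit. The first assumes the notification prompt really is the blocker and passes Chromium's --disable-notifications switch at launch. The second assumes the navigation (if one happens at all) could start before waitForNavigation begins listening, so it starts the wait and the Enter key press together with asyncio.gather. The names build_launch_options and search_subreddit are just placeholders I made up for this example.

```python
import asyncio


def build_launch_options(headless=True):
    """Build the options dict passed to pyppeteer's launch().

    Assumption: Chromium's --disable-notifications switch suppresses
    the "wants to show notifications" prompt.
    """
    return {"headless": headless, "args": ["--disable-notifications"]}


async def search_subreddit(keyword, timeout_ms=80000):
    # Imported here so build_launch_options() can be used or inspected
    # even when pyppeteer is not installed.
    from pyppeteer import launch

    browser = await launch(build_launch_options())
    try:
        page = await browser.newPage()
        await page.goto('https://www.reddit.com/r/GameDeals/')
        # Same selector as in the original script.
        await page.type('[id=header-search-bar]', keyword)
        # Start waiting for the navigation and press Enter together,
        # so a fast navigation is not missed by a late listener.
        await asyncio.gather(
            page.waitForNavigation(timeout=timeout_ms),
            page.keyboard.press('Enter'),
        )
        await page.screenshot({'path': 'results.png'})
    finally:
        # Close the browser even if the wait times out.
        await browser.close()


# To try it: asyncio.run(search_subreddit("steam"))
```

If the search updates the page without a full navigation, the wait would presumably still time out. In that case, waiting for a results element with page.waitForSelector might be the better fit, though I don't know which selector Reddit uses for that.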