
Hey, I have code in Python Playwright for getting the page source:

import json
import sys
import urllib.parse

from playwright.sync_api import sync_playwright

server_proxy = urllib.parse.unquote(sys.argv[1])
link = urllib.parse.unquote(sys.argv[2])

with sync_playwright() as p:
    #browser = p.chromium.launch(headless=False)
    browser = p.chromium.launch(proxy={"server": server_proxy, "username": "xxx", "password": "xxx"})
    context = browser.new_context(user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36')
    page = context.new_page()
    with open('cookies_tessco.json') as cookie_file:
        cookies = json.load(cookie_file)
    context.add_cookies(cookies)
    page.goto(link)
    try:
        page.wait_for_timeout(10000)
        print(page.content())
    except Exception as e:
        print("Error in playwright script: " + str(e))
    finally:
        page.close()
        context.close()
        browser.close()

This works okay, but sometimes I receive this error:

Traceback (most recent call last):
  File "page_tessco.py", line 17, in <module>
    page.goto(link)
  File "/usr/local/lib/python3.9/site-packages/playwright/sync_api/_generated.py", line 5774, in goto
    self._sync(
  File "/usr/local/lib/python3.9/site-packages/playwright/_impl/_sync_base.py", line 103, in _sync
    return task.result()
  File "/usr/local/lib/python3.9/site-packages/playwright/_impl/_page.py", line 464, in goto
    return await self._main_frame.goto(**locals_to_params(locals()))
  File "/usr/local/lib/python3.9/site-packages/playwright/_impl/_frame.py", line 117, in goto
    await self._channel.send("goto", locals_to_params(locals()))
  File "/usr/local/lib/python3.9/site-packages/playwright/_impl/_connection.py", line 36, in send
    return await self.inner_send(method, params, False)
  File "/usr/local/lib/python3.9/site-packages/playwright/_impl/_connection.py", line 47, in inner_send
    result = await callback.future
playwright._impl._api_types.TimeoutError: Timeout 30000ms exceeded.
=========================== logs ===========================
navigating to "https://www.tessco.com/product/207882", waiting until "load"

I tried to add

page.wait_for_timeout(10000)

but these errors still appear sometimes. Any help? I'm also confused about why this error only appears sometimes. What causes it? If someone has experience, please share.

Vivek S.

2 Answers


https://www.tessco.com/product/207882 loads quite slowly. Try extending the default timeout of 30000ms by adding a timeout to page.goto(link):

page.goto(link, timeout=0)

Setting timeout to 0 disables the timeout entirely. Documentation

Alternatively, you can disable timeout with the following:

page.set_default_timeout(0)
page.goto(link)
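As a quick illustration of how these two knobs interact: a timeout passed directly to page.goto() overrides whatever default is in effect, and 0 disables the limit. The MockPage class below is a hypothetical stand-in used only to demonstrate that precedence, not part of Playwright:

```python
# Minimal mock illustrating Playwright's timeout precedence:
# a per-call timeout wins over the page/context default; 0 disables it.
class MockPage:
    def __init__(self):
        self.default_timeout_ms = 30_000  # Playwright's built-in default

    def set_default_timeout(self, ms):
        self.default_timeout_ms = ms

    def effective_timeout(self, timeout=None):
        # goto(..., timeout=...) takes precedence over the default when given
        return self.default_timeout_ms if timeout is None else timeout

page = MockPage()
assert page.effective_timeout() == 30_000       # built-in default applies
assert page.effective_timeout(timeout=0) == 0   # per-call: timeout disabled
page.set_default_timeout(120_000)
assert page.effective_timeout() == 120_000      # new default applies
```

In the real API, set_default_timeout(0) and goto(link, timeout=0) therefore have the same effect on navigation: the 30000ms limit is lifted.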
Habetuz
  • I did it; however, I still sometimes receive the same error: playwright._impl._api_types.TimeoutError: Timeout 30000ms exceeded. –  Jul 07 '21 at 06:54
  • @HHHHHHT I tried your code but could not reproduce the error. I removed the `proxy` parameter from the `p.chromium.launch()` statement. Maybe try that. – Habetuz Jul 07 '21 at 20:47
  • Yeah, it appears sometimes, not always. I cannot, because the website blocks my server IP; that's why I need to use proxies. –  Jul 09 '21 at 08:57
  • Maybe try a high number like `100000` instead of `0`. – Habetuz Jul 10 '21 at 09:18
  • Thanks, this one worked for me: `page.goto(link, timeout = 0)` – Malki Mohamed Mar 25 '22 at 10:39
  • The `30000ms exceeded` is the default value for the *overall* test timeout, not the navigate timeout. I'm not sure where this is configured in Python, though. – freedomn-m Nov 22 '22 at 17:40
  • `page.goto(link, timeout = 0)` can hang forever, stifling errors. It's OK to set it to a few minutes or even an hour or so, but blocking forever is overkill and never really necessary. If there's something abnormal, I'd want a report of that situation so it can be dealt with. – ggorlen Apr 04 '23 at 16:12

Another alternative (for instances in which you sometimes experience timeouts) is to keep retrying the page load in a while loop that only breaks out once its try block succeeds. The key here (and something I learned) is that the continue statement in the except block doesn't propagate the exception; it just retries the code within the while loop:

from time import sleep

while True:
    try:
        page.goto(link)
    except Exception:
        sleep(5)  # sleep for some amount of seconds
        continue
    break

The sleep is optional here, but it does give your network time to recover if it's a networking issue. Also, if you want a maximum number of retries (instead of infinite), you can always do:

retries = 1
max_retries = 10
while retries <= max_retries:
    try:
        page.goto(link)
    except Exception:
        sleep(5)  # sleep for some amount of seconds
        retries += 1
        continue
    break
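The bounded loop above can also be factored into a small reusable helper, so the same retry logic works for any flaky call. The `retry` function and its parameter names below are illustrative, not part of Playwright:

```python
from time import sleep

def retry(action, max_retries=10, delay=5):
    """Call action() until it succeeds; re-raise the last
    exception once max_retries attempts have been used."""
    for attempt in range(1, max_retries + 1):
        try:
            return action()
        except Exception:
            if attempt == max_retries:
                raise  # out of attempts: surface the real error
            sleep(delay)  # back off before the next attempt

# Usage with the answer's Playwright call (page and link set up as above):
# retry(lambda: page.goto(link), max_retries=10, delay=5)
```

Re-raising on the final attempt (rather than swallowing the error) means a genuinely dead proxy or blocked IP still produces a traceback you can act on.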

deesolie