
I am very new to Python and I am trying to web-scrape some basic data from a client's website. I have used this exact same method with other websites and received the expected results. This is what I have so far:

from urllib.request import urlopen
from bs4 import BeautifulSoup

main_url = 'https://www.grainger.com/category/pipe-hose-tube-fittings/hose-products/hose-fittings-couplings/cam-groove-fittings-gaskets/metal-cam-groove-fittings/stainless-steel-cam-groove-fittings'

uClient = urlopen(main_url)
main_html = uClient.read()
uClient.close()

Even this simple call to read a website is causing what appears to be a timeout error. As I said, I have used this exact same code successfully on other websites. The error is:

Traceback (most recent call last):
  File "Pricing_Tool.py", line 6, in <module>
    uClient = uReq(main_url)
  File "C:\Users\Brian Knoll\anaconda3\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\Brian Knoll\anaconda3\lib\urllib\request.py", line 525, in open
    response = self._open(req, data)
  File "C:\Users\Brian Knoll\anaconda3\lib\urllib\request.py", line 543, in _open
    '_open', req)
  File "C:\Users\Brian Knoll\anaconda3\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "C:\Users\Brian Knoll\anaconda3\lib\urllib\request.py", line 1362, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "C:\Users\Brian Knoll\anaconda3\lib\urllib\request.py", line 1322, in do_open
    r = h.getresponse()
  File "C:\Users\Brian Knoll\anaconda3\lib\http\client.py", line 1344, in getresponse
    response.begin()
  File "C:\Users\Brian Knoll\anaconda3\lib\http\client.py", line 306, in begin
    version, status, reason = self._read_status()
  File "C:\Users\Brian Knoll\anaconda3\lib\http\client.py", line 267, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "C:\Users\Brian Knoll\anaconda3\lib\socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "C:\Users\Brian Knoll\anaconda3\lib\ssl.py", line 1071, in recv_into
    return self.read(nbytes, buffer)
  File "C:\Users\Brian Knoll\anaconda3\lib\ssl.py", line 929, in read
    return self._sslobj.read(len, buffer)
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

Is it possible that this website is just too large to process? Any help would be GREATLY appreciated. Thanks!

bknoll16

1 Answer


Usually websites return a response when you send a request via requests. But some websites require specific headers such as User-Agent or Cookie, and this is one of them. You have to send a User-Agent header so that the website sees the request as coming from a browser. The following code should return response code 200.

import requests
headers = {"User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"}
res = requests.get("https://www.grainger.com/category/pipe-hose-tube-fittings/hose-products/hose-fittings-couplings/cam-groove-fittings-gaskets/metal-cam-groove-fittings/stainless-steel-cam-groove-fittings", headers=headers)
print(res.status_code)
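If you'd rather stay with urllib.request from your original snippet, the same trick works there too: attach the header via a Request object. A minimal sketch (the actual network call is left commented out, since it depends on the site responding):

```python
from urllib.request import Request, urlopen

url = ("https://www.grainger.com/category/pipe-hose-tube-fittings/"
       "hose-products/hose-fittings-couplings/cam-groove-fittings-gaskets/"
       "metal-cam-groove-fittings/stainless-steel-cam-groove-fittings")
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) "
                         "AppleWebKit/537.36 (KHTML, like Gecko) "
                         "Chrome/83.0.4103.97 Safari/537.36"}

# Build a request that carries the browser-like User-Agent header
req = Request(url, headers=headers)
# html = urlopen(req, timeout=30).read()  # uncomment to actually fetch
```

Passing an explicit timeout to urlopen is also a good habit here, so a non-responsive server fails fast instead of hanging.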

Update:

from bs4 import BeautifulSoup
soup = BeautifulSoup(res.text, "lxml")
print(soup.find_all("a"))

This prints all the anchor tags on the page.
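From those anchor tags you will usually want just the link URLs. A small self-contained sketch of that step (using an inline HTML snippet and the stdlib html.parser instead of lxml, so it runs without a network call):

```python
from bs4 import BeautifulSoup

html = '<a href="/a">A</a><a>no href</a><a href="/b">B</a>'
soup = BeautifulSoup(html, "html.parser")

# Keep only anchors that actually carry an href attribute
links = [a.get("href") for a in soup.find_all("a") if a.get("href")]
print(links)  # ['/a', '/b']
```

The same list comprehension works unchanged on the soup built from res.text above.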

bigbounty