2

I am trying to scrape https://www.carsireland.ie/search#q?%20scraper%20python=&toggle%5Bpoa%5D=false&page=1. I had built a scraper for this site, but it has since been completely overhauled. The new site has a different format and sits behind Cloudflare. The code below returns a 403 error, and the response references this page:

  • "https://www.cloudflare.com/5xx-error-landing"

The code which I have built so far is as follows:

from requests_html import HTMLSession

session = HTMLSession()

header = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36" ,
    'referer':'https://www.google.com/'
}

# url of search page
url = 'https://www.carsireland.ie/search#q?sortBy=vehicles_prod%2Fsort%2Fpoa%3Aasc%2Cupdated%3Adesc&page=1'

# create a session with the url
r = session.get(url, headers=header)

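# render the page's JavaScript; render() updates r.html in place and returns None
r.html.render(sleep=1, timeout=20)

# Check the response status and the rendered HTML
print(r.status_code)
print(r.html.html)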

I would really appreciate any help in resolving the Cloudflare issues I am having.

MrSwan
  • `403` corresponds with [Forbidden](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403) status. Can you elaborate as to how you know that this isn't intentional behavior by the target site's Cloudflare configuration to prevent this type of automated scraping...? – esqew Dec 27 '21 at 02:51

1 Answer

-1

This problem can be fixed by simply changing the referer property in the header to the link you are going to scrape.
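As a minimal sketch of that change, the snippet below builds the headers so that the referer is the target URL itself. The helper names `build_headers` and `fetch` are hypothetical, and dropping the `#...` fragment from the referer is an assumption made here because browsers do not send fragments in that header. Whether this is enough to get past Cloudflare depends on how the site is configured, so it may still return 403.

```python
from urllib.parse import urldefrag

USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
)


def build_headers(target_url):
    """Return headers whose referer is the page being scraped (hypothetical helper)."""
    # Assumption: strip the '#...' fragment, since it is never sent to the server.
    referer = urldefrag(target_url).url
    return {"user-agent": USER_AGENT, "referer": referer}


def fetch(target_url):
    """Request the page with the adjusted referer (hypothetical helper)."""
    # Imported lazily so build_headers can be used without requests_html installed.
    from requests_html import HTMLSession

    session = HTMLSession()
    return session.get(target_url, headers=build_headers(target_url))
```

To try it, call `fetch(url)` with the search URL from the question and check `status_code` on the result before attempting to render the page.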

Nachat Ayoub