I am trying to scrape https://www.carsireland.ie/search#q?%20scraper%20python=&toggle%5Bpoa%5D=false&page=1 (I had built a scraper but then they did a total overhaul of their website). The new website has a new format and has Cloudflare to provide the usual security. I have the following code which returns a 403 error, particularly referencing this error:
- "https://www.cloudflare.com/5xx-error-landing"
The code which I have built so far is as follows:
from requests_html import HTMLSession
session = HTMLSession()
header = {
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36" ,
'referer':'https://www.google.com/'
}
# url of search page
url = 'https://www.carsireland.ie/search#q?sortBy=vehicles_prod%2Fsort%2Fpoa%3Aasc%2Cupdated%3Adesc&page=1'
# create a session with the url
r = session.get(url, headers=header)
# render the url
data = r.html.render(sleep=1, timeout=20)
# Check the response
print(r.text)
I would really appriciate any help which could be provided to correct the CloudFlare issues which I am having.