It seems that the request headers are being scrutinized quite critically. I played around with them a bit, and this, for example, was a successful request at the time of writing this answer:
```python
import requests

url_path = r'https://www.ticketweb.com/search?q='

# A minimal set of headers that got past the server's checks
# at the time of writing.
HEADERS = {
    "Accept-Language": "en-US,en",
    "Accept": "*/*;q=0.9",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36",
}

response = requests.get(url_path, headers=HEADERS)
response.raise_for_status()  # raise if the server answered with an error status
print(response.text)
```
Here is a good explanation of the `q` parameter in request headers. tl;dr (as far as I understood it): it is a quality weight between 0 and 1 that tells the server how strongly you, as the requester, prefer each listed alternative, so the listed values are not handled strictly.
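As an illustration (these header values are just an example, not something this particular site requires): a weight of 1.0 is the default and marks the most-preferred option, lower weights mark fallbacks.

```python
# Illustrative only: q weights rank the listed alternatives.
EXAMPLE_HEADERS = {
    # "British English preferred, any English is fine (0.8),
    #  German only as a last resort (0.3)."
    "Accept-Language": "en-GB,en;q=0.8,de;q=0.3",
    # "Anything is acceptable, with slightly relaxed preference."
    "Accept": "*/*;q=0.9",
}
```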
I arrived at this solution by copying the complete request header from a Firefox request and minimizing it as far as I could, and by playing a bit with the `q` parameter as already mentioned.
EDIT: In the meantime, this request has stopped working.
Important note
If you read the terms of use on the page, you will see something like this:
[...] you agree that you will not:
- Use any robot, spider [...]
- Use any automated software or computer system to search for [...]
So it is very likely that the site owners analyze certain criteria to decide whether a request comes from a browser or from a machine. If they conclude that a computer program is accessing the site, they can block or manipulate the response (e.g. return an empty result, or an arbitrary status code like 506 or even 418 if they want).
That means: web scraping can fail at any time, especially if the site owners don't want you to download their content automatically, because they can always come up with new measures to prevent automated access.
If you are allowed to download the content, you will have to do more work, e.g. use the Selenium web driver, handle cookies, humanize the request timing, perhaps not always use the same IP address for automated access, use the site's caches, etc. (a sketch of the cookie/timing part follows below).
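A minimal sketch of the cookie and timing part with plain `requests` (the URLs and the delay range here are arbitrary assumptions for illustration; it reuses the `HEADERS` dict from the first snippet):

```python
import random
import time

import requests

session = requests.Session()  # a Session carries cookies across requests
session.headers.update(HEADERS)  # HEADERS from the first snippet above

urls = [
    "https://www.ticketweb.com/search?q=taylor+swift",
    "https://www.ticketweb.com/search?q=coldplay",
]

for url in urls:
    response = session.get(url)
    response.raise_for_status()
    # "Humanize" the timing: wait a random, human-ish interval.
    time.sleep(random.uniform(2.0, 8.0))
```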
Even so, this is hard to do with the `requests` library alone or with just `curl`. So instead of faking a human request, why not use a browser and let it make the request for you?
Here is an example of how to make the request via Selenium's browser. This should work for the URL https://www.ticketweb.com/search?q=taylor+swift and `driver.find_element(by=By.TAG_NAME, value="body")`. The browser can also be run headless by adding `--headless` to the browser options, so there is no need to see the browser UI during the process.
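A sketch of what that can look like with Chrome (assuming a recent Selenium 4 installation, which can resolve a matching Chrome driver on its own; printing the `body` text is just one way to inspect the result):

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

options = Options()
options.add_argument("--headless")  # run without a visible browser UI

driver = webdriver.Chrome(options=options)
try:
    driver.get("https://www.ticketweb.com/search?q=taylor+swift")
    # Grab the whole page body, e.g. to check that results arrived.
    body = driver.find_element(by=By.TAG_NAME, value="body")
    print(body.text)
finally:
    driver.quit()  # always release the browser process
```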
But again: web scraping can fail at any time, and please read the terms of use carefully to see whether you are allowed to read the page automatically at all.
BTW: `utf-8` is not listed as an `Accept-Encoding` value here. But it seems that you don't need it anyway.
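That is plausible, since `requests` sends an `Accept-Encoding` header of its own by default (and handles the decompression for you), as a quick check shows:

```python
import requests

session = requests.Session()
# requests fills in its own default, typically 'gzip, deflate'
# (plus 'br' if Brotli support is installed).
print(session.headers["Accept-Encoding"])
```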