I am trying to crawl the website https://sec.report/, which seems to be protected by some kind of server-side bot protection. (I need the data for my master's thesis.)
I have a list of company names for which I would like to get certain identifiers (CIK) from the above website:
Landauer Inc --> 0000825410
Starwood Waypoint Homes --> 0001579471
Supreme Industries Inc --> 0000350846
[and 2,000 more ...]
Example: Searching for the first entry in the list above (Landauer Inc), I can get the CIK in a browser using the following link: https://sec.report/CIK/Search/Landauer%20Inc. The generic link is https://sec.report/CIK/Search/{company_name}, with the company name URL-encoded.
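For reference, here is a minimal sketch of how the search URLs can be built from the list of names. It assumes the company name only needs percent-encoding, as in the Landauer example; `build_search_url` and `BASE_URL` are names I made up for illustration.

```python
from urllib.parse import quote

# Prefix of the generic search link given above.
BASE_URL = "https://sec.report/CIK/Search/"


def build_search_url(company_name: str) -> str:
    """Return the search URL for one company name (hypothetical helper)."""
    # safe="" also percent-encodes characters such as "/" and "&",
    # which would otherwise change the meaning of the URL path.
    return BASE_URL + quote(company_name.strip(), safe="")


companies = ["Landauer Inc", "Starwood Waypoint Homes", "Supreme Industries Inc"]
for name in companies:
    print(build_search_url(name))
```

The first printed URL matches the example link above for Landauer Inc.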
Problem: When I send a simple request (Python) to the above URL, I get an HTTP 200 response, but the body is only an interstitial page saying "Please wait up to 5 seconds...". Please see the response here:
[Screenshot: loading page shown when the request is sent]
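Because the status code is 200 either way, the only way I can tell the loading page apart from a real result is by looking at the body. A rough sketch of that check, assuming the waiting message quoted above always appears in the interstitial page (`looks_like_challenge_page` and `CHALLENGE_MARKER` are hypothetical names, and the marker text is simply what I see in my responses):

```python
# Text taken from the loading page I receive; treat it as an assumption,
# since the site may change the wording.
CHALLENGE_MARKER = "please wait up to 5 seconds"


def looks_like_challenge_page(body: str) -> bool:
    """Return True if the HTML body appears to be the waiting page."""
    # Case-insensitive substring check; this is a heuristic, not a
    # reliable way to identify the protection mechanism itself.
    return CHALLENGE_MARKER in body.lower()


sample_blocked = "<html><body>Please wait up to 5 seconds...</body></html>"
sample_result = "<html><body>Landauer Inc CIK 0000825410</body></html>"
print(looks_like_challenge_page(sample_blocked))
print(looks_like_challenge_page(sample_result))
```

With a check like this, each response can at least be logged as "blocked" or "usable" instead of relying on the status code.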
I assume the website is protected by Cloudflare, based on the result of https://checkforcloudflare.selesti.com/?q=https://sec.report/
Attempts: I have already tried to crawl the page using Python with:
(1) Tor proxies with full request headers (rotating).
(2) Selenium, including Cloudflare packages/extensions.
(3) A simple Scrapy spider (I have never used Scrapy before, so I may have missed a working solution).
Does anyone have an idea how I could get past the protection to crawl the necessary data?
Thanks a lot in advance!