There's a website that I need to crawl. I have no commercial purpose; it is only for study.
I checked the robots.txt, and it was as follows:
User-agent: *
Allow: /
Disallow: /*.notfound.html
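To show how I am reading these rules, here is a minimal sketch. It assumes the matching described in RFC 9309 (`*` as a wildcard, the longest matching rule wins, and Allow wins a tie). The names `ROBOTS_RULES`, `is_allowed`, and the sample paths are made up for illustration and do not come from the site.

```python
import re

# The two rules from the robots.txt above, for User-agent: *
ROBOTS_RULES = [
    ("allow", "/"),
    ("disallow", "/*.notfound.html"),
]


def _pattern_to_regex(pattern):
    """Turn a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules=ROBOTS_RULES):
    """Return True if `path` may be fetched under `rules`.

    Assumption: the longest matching pattern wins, and Allow wins a tie.
    A path that matches no rule is treated as allowed.
    """
    best_len = -1
    allowed = True
    for kind, pattern in rules:
        # re.match anchors at the start, so patterns act as prefixes.
        if _pattern_to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_len or (length == best_len and kind == "allow"):
                best_len = length
                allowed = kind == "allow"
    return allowed


if __name__ == "__main__":
    print(is_allowed("/articles/1.html"))        # ordinary page
    print(is_allowed("/missing.notfound.html"))  # matches the Disallow rule
```

If this reading is right, only paths that match `/*.notfound.html` are excluded and everything else is permitted by robots.txt. Other parsers may apply different matching rules, so this is only my interpretation.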
Can I crawl this website using requests and BeautifulSoup?
I noticed that crawling without a header returns a 403 error. Does this mean that crawling is not allowed?