I'm new to web crawling and I'm having trouble understanding a particular robots.txt file. This is what the website has:
```
User-agent: *
Allow: /
Sitemap: sitemapURLHere
```
I looked up the / here and found that it matches any path. So does this mean the website allows all pages to be crawled? However, when I try a basic fetch of the sitemap.xml (or another site URL) with scrapy, i.e.
```
scrapy shell siteURL
```
I get a 403 HTTP response, which, based on this link, I'm assuming means the website doesn't want to be scraped... so what exactly does this site's robots.txt mean?
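To sanity-check my reading of the file, I also parsed the same rules with Python's urllib.robotparser (example.com is a placeholder for the real domain):

```python
from urllib.robotparser import RobotFileParser

# Placeholder domain -- substitute the actual site.
rp = RobotFileParser("https://example.com/robots.txt")
rp.read()

# With "User-agent: *" and "Allow: /", any path should be fetchable:
print(rp.can_fetch("*", "https://example.com/sitemap.xml"))  # expect True
print(rp.can_fetch("*", "https://example.com/some/page"))    # expect True
```

Both checks should return True for rules like the ones above, which is why the 403 confuses me.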
EDIT: The file I am talking about is here.
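For what it's worth, I also tried overriding Scrapy's default User-Agent, on the hunch that the 403 might come from the server blocking Scrapy's default agent string rather than from anything in robots.txt (siteURL is still a placeholder, and the browser-like agent string is just an example):

```
scrapy shell -s USER_AGENT="Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0" siteURL
```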