I wanted to read an article online, but something kept popping up, so I figured I'd extract it and read it offline instead. Four weeks of trials later, the whole problem comes down to this: my crawler can't read the content of the web pages, even after all that effort.
The initial problem was that the article isn't all on one page, so I used the site's own "next" button to navigate through the content.
I tried BeautifulSoup on its own, but it couldn't parse the pages properly (presumably because the content is loaded dynamically), so I'm using Selenium with ChromeDriver at the moment.
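Here's a stripped-down version of what I'm doing now; the URL and the CSS selectors (`.article-body`, `.next-page`) are placeholders for the real site's:

```python
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

driver = webdriver.Chrome()
driver.get("https://example.com/article")  # placeholder URL

pages = []
while True:
    # Selenium has already run the page's JavaScript, so BeautifulSoup
    # gets the rendered HTML rather than the bare source.
    soup = BeautifulSoup(driver.page_source, "html.parser")
    body = soup.select_one(".article-body")  # placeholder selector
    pages.append(body.get_text() if body else "")
    try:
        # The "next" button I mentioned above; placeholder selector.
        driver.find_element(By.CSS_SELECTOR, ".next-page").click()
    except NoSuchElementException:
        break  # no more pages
    time.sleep(2)  # crude wait for the next page to render

driver.quit()
print("\n\n".join(pages))
```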
The reason the crawler can't read the pages seems to be the robots.txt file: it sets a crawl delay of 3600 seconds per page, and the article has about 10 pages. That's bearable, but what would happen if an article had 100+ pages? I don't know how to bypass the delay or work around it.
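For reference, this is how I checked what the robots.txt actually asks for, using the standard library's `urllib.robotparser` (the URLs are placeholders again):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://example.com/robots.txt")  # placeholder URL
rp.read()

# Is my crawler allowed to fetch the article at all?
print(rp.can_fetch("*", "https://example.com/article"))
# The Crawl-delay requested for this user agent, in seconds (None if unset).
print(rp.crawl_delay("*"))
```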
Any help would be appreciated.