I'm trying to crawl a WordPress website with Heritrix. I have provided cookies so that the crawler is automatically logged in to the website, and this works fine at first. However, after about 20 MB of downloaded data (approx. 10 minutes), the website logs the crawler out and it can't continue.

Is there any way to crawl the website without the cookies expiring? And what is the cause of this problem? I tried the very polite option in Heritrix to alter the crawling pattern, but it didn't help.
