I'm trying to crawl a WordPress website with Heritrix. I have provided cookies so that the crawler logs in to the website automatically, and this works fine at first. However, after roughly 20 MB of downloaded data (approx. 10 minutes), the website logs the crawler out and it can't continue. Is there any way to crawl the website without the cookies expiring? And what is the cause of this? I tried the very polite option in Heritrix to alter the crawling pattern, but it doesn't help.
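One way to check whether expiry is really the cause is to decode the expiration timestamp that WordPress embeds in its login cookie. The sketch below is only a diagnostic aid, not part of the Heritrix setup. It assumes the cookie value has the usual WordPress layout `username|expiration|token|hmac` (URL-encoded), with the expiration as a Unix timestamp in the second field. The function names and the sample value are hypothetical.

```python
import time
from urllib.parse import unquote


def wp_cookie_expiry(cookie_value):
    """Return the expiration (Unix seconds) embedded in a WordPress auth cookie.

    Assumes the value looks like 'username|expiration|token|hmac',
    possibly URL-encoded ('|' sent as '%7C').
    """
    parts = unquote(cookie_value).split("|")
    if len(parts) < 2:
        raise ValueError("unexpected cookie format: %r" % cookie_value)
    return int(parts[1])


def seconds_until_expiry(cookie_value, now=None):
    """Return how many seconds remain before the cookie's embedded expiry."""
    if now is None:
        now = time.time()
    return wp_cookie_expiry(cookie_value) - now


if __name__ == "__main__":
    # Hypothetical sample value; paste the real wordpress_logged_in_* value here.
    sample = "admin%7C1700000000%7Ctoken%7Chmac"
    print(seconds_until_expiry(sample, now=1699999400))  # prints 600
```

If the remaining time is much longer than the 10 minutes after which the logout happens, then something other than cookie expiry is probably ending the session.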
Please provide enough code so others can better understand or reproduce the problem. – Community Nov 26 '22 at 01:07