I'm trying to crawl the website mystore411.com using the open-source crawler crawler4j.
The crawler works fine for a limited period of time (say 20-30 seconds), and then the website bans my address for a few minutes before I can crawl again. I couldn't figure out a possible solution.
I went through its robots.txt, and here is what I found:
User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow: /js/
Disallow: /css/
Disallow: /images/
User-agent: Slurp
Crawl-delay: 1
User-agent: Baiduspider
Crawl-delay: 1
User-agent: MaxPointCrawler
Disallow: /
User-agent: YandexBot
Disallow: /
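To double-check my reading of those rules, here is a minimal sketch in plain Java (not part of crawler4j; the class name `RobotsDelays` and its `parse` method are hypothetical helpers made up for illustration) that collects the `Crawl-delay` value for each user agent. It assumes the simple `field: value` layout shown above and integer delay values. Run against the rules above, it finds a delay only for Slurp and Baiduspider and nothing for `*`, so the file does not seem to state a delay for a generic crawler like mine.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class RobotsDelays {

    // Returns a map of lower-cased user agent -> Crawl-delay in seconds.
    // Agents whose group has no Crawl-delay line do not appear in the map.
    static Map<String, Integer> parse(String robotsTxt) {
        Map<String, Integer> delays = new LinkedHashMap<>();
        List<String> agents = new ArrayList<>(); // user agents of the current group
        boolean inRules = false;                 // true once the group has a rule line
        for (String raw : robotsTxt.split("\\R")) {
            String line = raw.trim();
            int colon = line.indexOf(':');
            if (line.isEmpty() || line.startsWith("#") || colon < 0) {
                continue; // skip blanks, comments, and lines without a field
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                // A User-agent line that follows rule lines starts a new group.
                if (inRules) {
                    agents.clear();
                    inRules = false;
                }
                agents.add(value.toLowerCase(Locale.ROOT));
            } else {
                inRules = true;
                if (field.equals("crawl-delay")) {
                    try {
                        int seconds = Integer.parseInt(value);
                        for (String agent : agents) {
                            delays.put(agent, seconds);
                        }
                    } catch (NumberFormatException ignored) {
                        // Assumption: non-integer delays are simply skipped.
                    }
                }
            }
        }
        return delays;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "User-agent: Mediapartners-Google", "Disallow:",
            "User-agent: *", "Disallow: /js/", "Disallow: /css/", "Disallow: /images/",
            "User-agent: Slurp", "Crawl-delay: 1",
            "User-agent: Baiduspider", "Crawl-delay: 1",
            "User-agent: MaxPointCrawler", "Disallow: /",
            "User-agent: YandexBot", "Disallow: /");
        System.out.println(parse(sample)); // prints {slurp=1, baiduspider=1}
    }
}
```

So the only hint in the file is the 1-second delay requested from two other bots, which is presumably the kind of pacing the site expects.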
Please suggest an alternative approach if there is one.