
I am trying to read a list of seed URLs from a CSV file and load them into the crawl controller using the code below:

public class BasicCrawlController {

    public static void main(String[] args) throws Exception {

        // Setup of controller and numberOfCrawlers is omitted for brevity.
        ArrayList<String> seeds = Globals.INSTANCE.getSeeds();
        System.out.println("Seeds to add: " + seeds.size());
        // Register every seed URL before the crawl starts.
        for (String url : seeds) {
            System.out.println("Adding to seed: " + url);
            controller.addSeed(url);
        }
        controller.start(BasicCrawler.class, numberOfCrawlers);
    }
}

The console output is:

Seeds to add: 3
Adding to seed: http://xxxxx.com
Adding to seed: http://yyyyy.com
Adding to seed: http://zzzzz.com
 INFO [main] Crawler 1 started.
 INFO [main] Crawler 2 started.
 INFO [main] Crawler 3 started.
 INFO [main] Crawler 4 started.
 INFO [main] Crawler 5 started.
 INFO [main] Crawler 6 started.
 INFO [main] Crawler 7 started.
 INFO [main] Crawler 8 started.
 INFO [main] Crawler 9 started.
 INFO [main] Crawler 10 started.
ERROR [Crawler 1] String index out of range: -8, while processing: http://yyyyy.com/
ERROR [Crawler 1] String index out of range: -8, while processing: http://zzzzz.com/
 INFO [Thread-2] It looks like no thread is working, waiting for 10 seconds to make sure...
 INFO [Thread-2] No thread is working and no more URLs are in queue waiting for another 10 seconds to make sure...
 INFO [Thread-2] All of the crawlers are stopped. Finishing the process...
 INFO [Thread-2] Waiting for 10 seconds before final clean up...

Am I missing something that is needed to add seeds dynamically before calling controller.start()?

The rest of the crawl controller setup that crawler4j requires, including the number of crawlers, has been omitted from the code above to keep it short and easy to read.
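For context, the seed-loading step could look roughly like the sketch below. It is only an illustration: SeedLoader and parseSeeds are hypothetical names (not part of crawler4j or of the Globals class above), and it assumes each CSV line carries one URL in its first column.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of crawler4j or the original Globals class.
final class SeedLoader {

    // Turns raw CSV lines (assumed: one URL in the first column) into clean seed URLs.
    static List<String> parseSeeds(List<String> csvLines) {
        List<String> seeds = new ArrayList<>();
        for (String line : csvLines) {
            if (line == null) {
                continue;
            }
            // Keep only the first column, then strip quotes and surrounding whitespace.
            String url = line.split(",", 2)[0].replace("\"", "").trim();
            // Skip blanks and anything that is not an absolute http(s) URL.
            if (url.startsWith("http://") || url.startsWith("https://")) {
                seeds.add(url);
            }
        }
        return seeds;
    }
}
```

Filtering out blank or malformed entries before calling controller.addSeed(url) rules out the CSV data as a source of errors, so any remaining failure has to come from the crawler code itself.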

thotheolh
  • I removed the dynamic seed adding from the code and simply called addSeed("...") for every single link, and the same error occurred. This is the first time I have seen this error; it usually works out of the box. I am wondering whether some data corruption occurred? – thotheolh Aug 20 '13 at 05:08
  • I think I found the problem. I was meddling with shouldVisit() and an error occurred there, but the error message was not informative enough. Thanks for reading, by the way. – thotheolh Aug 20 '13 at 05:12
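
The actual shouldVisit() code was never posted, so the following is only a guess at the kind of bug involved. "String index out of range" is the message of a StringIndexOutOfBoundsException, which typically comes from substring() or charAt() being called with an index computed from a failed indexOf(). A minimal sketch of a more defensive filter, where LinkFilter, stripScheme and the allowedHost parameter are hypothetical names and not crawler4j API:

```java
// Hypothetical stand-in for logic inside a crawler's shouldVisit();
// the real method from the question was not shown.
final class LinkFilter {

    // Returns the part of the URL after "://", or the whole string if the marker is absent.
    static String stripScheme(String href) {
        int idx = href.indexOf("://");
        // Guard: indexOf returns -1 when the marker is missing, and unguarded
        // index arithmetic passed to substring() can then throw
        // StringIndexOutOfBoundsException.
        return idx < 0 ? href : href.substring(idx + 3);
    }

    // Accepts a link only if it points at the allowed host (assumed to be lowercase).
    static boolean shouldVisit(String href, String allowedHost) {
        try {
            String rest = stripScheme(href.toLowerCase());
            return rest.startsWith(allowedHost);
        } catch (RuntimeException e) {
            // Print the full exception so the failing URL and its cause are visible.
            System.err.println("shouldVisit failed for " + href + ": " + e);
            return false;
        }
    }
}
```

Catching and logging the exception inside the filter makes the cause visible right away, instead of leaving only the short "while processing" line in the crawler log.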
