I tried modifying the code crawler4j-Quickstart example
I want to crawl the following link
https://www.google.com/search?biw=1366&bih=645&tbm=nws&q=%22obama%22&oq=%22obama%22&gs_l=serp.3..0l5.825041.826084.0.826833.5.5.0.0.0.0.187.572.2j3.5.0....0...1c.1.64.serp..0.3.333...0i13k1.Tmd9nARKIrU
which is a Google news search link with the keyword obama
I tried modifying mycrawler.java
@Override
public boolean shouldVisit(Page referringPage, WebURL url) {
String href = url.getURL().toLowerCase();
return !FILTERS.matcher(href).matches()
&& href.startsWith("https://www.google.com/search?biw=1366&bih=645&tbm=nws&q=%22obama%22&oq=%22obama%22&gs_l=serp.3..0l5.825041.826084.0.826833.5.5.0.0.0.0.187.572.2j3.5.0....0...1c.1.64.serp..0.3.333...0i13k1.Tmd9nARKIrU/");
}
Also, controller.java
/*
* For each crawl, you need to add some seed urls. These are the first
* URLs that are fetched and then the crawler starts following links
* which are found in these pages
*/
//controller.addSeed("http://www.ics.uci.edu/~lopes/");
// controller.addSeed("http://www.ics.uci.edu/~welling/");
controller.addSeed("https://www.google.com/search?biw=1366&bih=645&tbm=nws&q=%22obama%22&oq=%22obama%22&gs_l=serp.3..0l5.825041.826084.0.826833.5.5.0.0.0.0.187.572.2j3.5.0....0...1c.1.64.serp..0.3.333...0i13k1.Tmd9nARKIrU");
/*
* Start the crawl. This is a blocking operation, meaning that your code
* will reach the line after this only when crawling is finished.
*/
controller.start(MyCrawler.class, numberOfCrawlers);
Then, it shows an error
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
BUILD SUCCESSFUL (total time: 43 seconds)
Is my code modification wrong?
update
I tried to use other url other than google search link .It works. I m guessing it cannot crawl the google search link .Any idea to tackle it ?