I am running a Crawler4j instance in a Spring Boot application, and the OpenFeign client injected into my crawler is always null.

public class MyCrawler extends WebCrawler {

    @Autowired
    HubClient hubClient;

    @Override
    public void visit(Page page) {
        // Lots of crawler code...
        if (page.getParseData() instanceof HtmlParseData) {
            hubClient.send(webPage.toString()); // Throws null pointer exception
        }
    }
}

My HubClient

@FeignClient("hub-worker")
public interface HubClient {
    @RequestMapping(method = RequestMethod.POST, value = "/pages", consumes = "application/json")
    void send(String webPage);
    //void createPage(WebPage webPage);
}

My MainApplication

@EnableEurekaClient
@EnableFeignClients
@SpringBootApplication
public class CrawlerApplication {
    public static void main(String[] args) throws Exception {
        SpringApplication.run(CrawlerApplication.class, args);
    }
}

The stacktrace


Text length: 89106
Html length: 1048334
Number of outgoing links: 158
10:14:38.634 [Crawler 164] WARN  e.u.ics.crawler4j.crawler.WebCrawler - Unhandled exception while fetching https://www.cnn.com: null
10:14:38.634 [Crawler 164] INFO  e.u.ics.crawler4j.crawler.WebCrawler - Stacktrace: 
java.lang.NullPointerException: null
    at com.phishspider.crawler.MyCrawler.visit(MyCrawler.java:79)
    at edu.uci.ics.crawler4j.crawler.WebCrawler.processPage(WebCrawler.java:523)
    at edu.uci.ics.crawler4j.crawler.WebCrawler.run(WebCrawler.java:306)
    at java.base/java.lang.Thread.run(Thread.java:834)

Line 79 is the hubClient call. When I factor the hubClient out into another class that I instantiate in the crawler class, like hubclient hc = new hubclient(), and then call some method hc.send(page), the hubClient in that factored-out class throws the same null pointer exception.

Nikolai Manek

1 Answer

In order to inject Spring beans (your client) into your crawler4j Web crawler object, you need to instantiate the Web crawler object via Spring.

For this purpose, you need to write a custom implementation of a WebCrawlerFactory, which provides / creates Spring-managed Web crawler objects. To do so, your Web crawler implementation needs to be a Spring Bean, i.e. at least annotated with @Component.
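
The wiring can be sketched roughly as follows. This is a variation on the approach above, not the exact setup: instead of making the crawler itself a bean, the factory is the Spring-managed object and hands the client to each crawler through its constructor. So that the sketch compiles without crawler4j or Spring on the classpath, WebCrawler, WebCrawlerFactory and HubClient are simplified stand-ins for the real types, visit takes a String rather than a Page, and MyCrawlerFactory and RecordingHubClient are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the Feign client from the question (in the real app Spring creates the proxy).
interface HubClient {
    void send(String webPage);
}

// Simplified stand-ins mirroring crawler4j's WebCrawler and CrawlController.WebCrawlerFactory,
// declared here only so the sketch compiles without the library.
abstract class WebCrawler {
    public abstract void visit(String pageContent);
}

interface WebCrawlerFactory<T extends WebCrawler> {
    T newInstance() throws Exception;
}

// The crawler receives its collaborator through the constructor instead of relying on
// an @Autowired field, which stays null when Spring does not create the object.
class MyCrawler extends WebCrawler {
    private final HubClient hubClient;

    MyCrawler(HubClient hubClient) {
        this.hubClient = hubClient;
    }

    @Override
    public void visit(String pageContent) {
        hubClient.send(pageContent);
    }
}

// Assumption: in the real application this class would be annotated with @Component,
// so Spring injects the HubClient here and every crawler it creates gets a non-null client.
class MyCrawlerFactory implements WebCrawlerFactory<MyCrawler> {
    private final HubClient hubClient;

    MyCrawlerFactory(HubClient hubClient) {
        this.hubClient = hubClient;
    }

    @Override
    public MyCrawler newInstance() {
        // A fresh crawler per call: crawler4j asks the factory once per crawler thread.
        return new MyCrawler(hubClient);
    }
}

// Hypothetical in-memory client, used only to exercise the sketch.
class RecordingHubClient implements HubClient {
    final List<String> sent = new ArrayList<>();

    @Override
    public void send(String webPage) {
        sent.add(webPage);
    }
}
```

With the real library, the factory would implement CrawlController.WebCrawlerFactory<MyCrawler> and be passed to controller.start(factory, numberOfCrawlers) in place of the crawler class, so that crawler4j no longer instantiates MyCrawler itself via reflection.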

rzo1