I am trying to use Nutch 1.14 to crawl a website. On some of its pages, the content is loaded through AJAX. I am trying to integrate the interactive selenium plugin (protocol-interactiveselenium) to handle some of the JavaScript functionality and fetch the dynamic data.

As per the documentation, I made the following changes:

    // Added SearchHandler in the Interactive-selenium plugin directory
    public class SearchHandler implements InteractiveSeleniumHandler {}

    <!-- Added the configuration below to nutch-site.xml -->
    <property>
      <name>plugin.includes</name>
      <value>protocol-interactiveselenium|protocol-(file|http)|urlfilter-regex|parse-(html|tika|text|metatags)|index-(static|basic|anchor|metadata|more)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
      <description></description>
    </property>

    <property>
      <name>interactiveselenium.handlers</name>
      <value>SearchHandler,DefaultHandler</value>
      <description></description>
    </property>

It invokes the browser for only some seemingly random URLs. I am not sure why it is not triggered for every crawled URL. What am I doing wrong?

Rajeev