I have successfully crawled a site using Nutch 1.2. Now I want to integrate it with Solr 3.1. The problem is that an error occurs when I issue this command:

$ bin/nutch solrindex localhost:8080/solr/ crawl/crawldb crawl/linkdb crawl/segments/*

My Nutch log is attached below. Please help me solve this issue.

Bad Request

request: //localhost:8080/solr/update?wt=javabin&version=2
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:436)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:245)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
    at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:75)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2013-07-08 17:38:47,577 ERROR solr.SolrIndexer - java.io.IOException: Job failed!

DEVANG PANDEY

1 Answer

You'll need to add the following Apache Commons library to the classpath: commons-httpclient.jar. Put it in the same folder as the other JARs used by your Nutch installation.

You can find the current version of HttpClient here: http://hc.apache.org/httpcomponents-client-ga/

Note that your Nutch version may use an older version of HttpClient, and the current version may not be backward compatible with it. In that case, download the older version of HttpClient and include that one in your libs instead.
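Before copying a new JAR in, it can help to check whether a commons-httpclient JAR is already present, and which version, so you don't end up with two conflicting copies. The following is only a rough sketch: the function name has_httpclient_jar is made up for illustration (it is not a Nutch command), and the assumption that the JARs live in $NUTCH_HOME/lib may not match your installation.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): checks whether a directory
# contains a commons-httpclient JAR. Prints the first match and returns 0,
# or prints a warning to stderr and returns 1.
has_httpclient_jar() {
  lib_dir="$1"
  for jar in "$lib_dir"/commons-httpclient*.jar; do
    # If the glob matched nothing, the pattern stays literal and -e fails.
    if [ -e "$jar" ]; then
      echo "found: $jar"
      return 0
    fi
  done
  echo "no commons-httpclient JAR in $lib_dir" >&2
  return 1
}

# Assumed layout: JARs live in $NUTCH_HOME/lib (adjust to your installation).
# Example call: has_httpclient_jar "$NUTCH_HOME/lib"
```

If the check reports a JAR, the version number in the file name tells you which HttpClient release your Nutch installation expects.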

bpgergo
  • Thanks for the valuable insight. I added the HttpClient JAR, but one error remains: java.io.IOException: Job failed! – DEVANG PANDEY Jul 08 '13 at 12:11
  • Well, you're welcome. This means your problem has been solved, so this question should be closed according to the SO rules. If you have problems resolving the next error (IOException), you should ask another question about it. And you should not add your new error message to the original question. This is not how SO works. – bpgergo Jul 08 '13 at 12:15
  • Regarding the IOException, it could be due to a lack of internet access. But this really should be discussed in a new question; this is how Stack Overflow works. – bpgergo Jul 08 '13 at 12:17