
I'm running Ubuntu 14.04 and I'm trying to get a basic Nutch web crawl running, to no avail. Following [this tutorial](https://github.com/renepickhardt/metalcon/wiki/simpleNutchSolrSetup) I set up the following building blocks:

  • Ubuntu 14.04
  • HBase 0.90.4
  • Nutch 2.2.1
  • Solr 4.3.1

I confirmed that both HBase and Solr are running and populated the urls/seed.txt file. Then when I call:

bin/nutch inject urls

I'm presented with the following output, and then Nutch just hangs:

InjectorJob: starting at 2014-06-09 23:38:49
InjectorJob: Injecting urlDir: urls/seed.txt

This Stack Overflow question seems similar to mine; however, I am not behind a proxy, so that answer does not apply.

Any help in resolving this issue would be greatly appreciated.

Frank
  • @Andrew-Barber Could you please elaborate on why my question is off-topic. I feel the link to the [tutorial](https://github.com/renepickhardt/metalcon/wiki/simpleNutchSolrSetup) in combination with the exact point of failure in said tutorial and the respective Ubuntu and Nutch versions makes the question Minimal, Complete, and Verifiable. – Frank Aug 16 '14 at 00:18

1 Answer


By default, Ubuntu maps your hostname to 127.0.1.1 in /etc/hosts, but HBase (according to this page) requires that it resolve to the loopback address 127.0.0.1.

The Ubuntu /etc/hosts file by default contains (where myComputerName is your computer's hostname):

127.0.0.1   localhost
127.0.1.1   myComputerName

Use sudo gedit /etc/hosts to update your hosts file as follows:

127.0.0.1   localhost
127.0.0.1   myComputerName
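Alternatively, if you'd rather not open an editor, a sed one-liner can make the same change. This is just a sketch: it assumes the entry starts the line exactly as shown above, and it keeps a backup copy at /etc/hosts.bak.

```shell
# Rewrite the 127.0.1.1 hostname entry to 127.0.0.1,
# saving the original file as /etc/hosts.bak first
sudo sed -i.bak 's/^127\.0\.1\.1/127.0.0.1/' /etc/hosts
```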

Reboot Ubuntu. Nutch should no longer have trouble injecting URLs into HBase.
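After rebooting, you can confirm the change took effect before retrying the inject. Assuming the standard glibc resolver tools are installed, check how your hostname now resolves:

```shell
# After the fix, this should map your hostname to 127.0.0.1
getent hosts "$(hostname)"
```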

Josh Crozier