
I've read various SO threads on why Nutch takes so long (or hangs) while injecting, generating, fetching, or parsing, and I've tried the solutions from the following threads, but with no luck:

1) Nutch 2.1 urls injection takes forever

2) Nutch 2.2.1 doesnt continue after Injector job

and various other threads.

I'm using Nutch 2.3.1 and HBase 0.94.27. I followed two tutorials and was able to build Nutch successfully. However, whenever I run any Nutch command, it hangs.

Here are the logs I get when running these commands:

Inject Command

root@ubuntu:~/apache-nutch-2.3.1/runtime/local# ./bin/nutch inject seed/urls.txt 
InjectorJob: starting at 2016-05-04 09:59:12
InjectorJob: Injecting urlDir: seed/urls.txt

Generate Command

root@ubuntu:~/apache-nutch-2.3.1/runtime/local# bin/nutch generate -topN 40
GeneratorJob: starting at 2016-05-04 09:54:08
GeneratorJob: Selecting best-scoring urls due for fetch.
GeneratorJob: starting
GeneratorJob: filtering: true
GeneratorJob: normalizing: true
GeneratorJob: topN: 40

Fetch Command

root@ubuntu:~/apache-nutch-2.3.1/runtime/local# bin/nutch fetch -all
FetcherJob: starting at 2016-05-04 10:00:14
FetcherJob: fetching all
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob : timelimit set for : -1

Parse Command

root@ubuntu:~/apache-nutch-2.3.1/runtime/local# bin/nutch parse -all
ParserJob: starting at 2016-05-04 10:00:43
ParserJob: resuming:    false
ParserJob: forced reparse:      false
ParserJob: parsing all

Update Command

root@ubuntu:~/apache-nutch-2.3.1/runtime/local# bin/nutch updatedb -all
DbUpdaterJob: starting at 2016-05-04 10:02:24
DbUpdaterJob: updatinging all
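The five commands above form one Nutch 2.x crawl cycle, and each one simply stops producing output when it hangs. A minimal sketch, assuming GNU coreutils `timeout` is available, of running the cycle from `runtime/local` so that a hang is reported instead of blocking forever. `run_step`, `crawl_once`, and the 600-second `STEP_LIMIT` are hypothetical names and values chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch: run one Nutch 2.x crawl cycle from runtime/local, with a time limit
# on every step so a hang is reported instead of blocking forever.
# STEP_LIMIT (600 s) is an arbitrary assumption; tune it to your crawl size.
STEP_LIMIT="${STEP_LIMIT:-600}"

# run_step LIMIT CMD [ARGS...]
# Runs CMD under GNU coreutils `timeout` and returns timeout's exit status:
# CMD's own status, or 124 when the time limit is hit.
run_step() {
  local limit="$1"
  shift
  timeout "$limit" "$@"
  local status=$?
  if [ "$status" -eq 124 ]; then
    echo "TIMEOUT after ${limit}s: $*" >&2
  elif [ "$status" -ne 0 ]; then
    echo "FAILED (exit $status): $*" >&2
  fi
  return "$status"
}

# Inject, generate, fetch, parse, updatedb; stop at the first failure or timeout.
crawl_once() {
  run_step "$STEP_LIMIT" bin/nutch inject seed/urls.txt &&
    run_step "$STEP_LIMIT" bin/nutch generate -topN 40 &&
    run_step "$STEP_LIMIT" bin/nutch fetch -all &&
    run_step "$STEP_LIMIT" bin/nutch parse -all &&
    run_step "$STEP_LIMIT" bin/nutch updatedb -all
}
```

Usage, from `runtime/local`: save the functions to a file (for example a hypothetical `crawl_steps.sh`), then run `source crawl_steps.sh && crawl_once`. The first step that prints `TIMEOUT` is the one to investigate in `logs/hadoop.log`.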

Here are the HBase logs:

 client /0:0:0:0:0:0:0:1:45216
2016-05-04 10:00:47,214 WARN org.apache.zookeeper.server.NIOServerCnxn: caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x1547b2be4bc000e, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:745)
2016-05-04 10:00:47,215 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /0:0:0:0:0:0:0:1:45216 which had sessionid 0x1547b2be4bc000e
2016-05-04 10:00:47,215 WARN org.apache.zookeeper.server.NIOServerCnxn: caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x1547b2be4bc000d, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:745)
2016-05-04 10:00:47,216 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /127.0.0.1:59934 which had sessionid 0x1547b2be4bc000d
2016-05-04 10:01:10,000 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x1547b2be4bc000c, timeout of 40000ms exceeded
2016-05-04 10:01:10,001 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x1547b2be4bc000c
2016-05-04 10:01:22,002 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x1547b2be4bc000b, timeout of 40000ms exceeded
2016-05-04 10:01:22,003 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x1547b2be4bc000b
2016-05-04 10:01:28,001 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x1547b2be4bc000e, timeout of 40000ms exceeded
2016-05-04 10:01:28,001 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x1547b2be4bc000d, timeout of 40000ms exceeded
2016-05-04 10:01:28,001 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x1547b2be4bc000e
2016-05-04 10:01:28,001 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x1547b2be4bc000d
2016-05-04 10:02:25,195 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /127.0.0.1:59938
2016-05-04 10:02:25,202 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /127.0.0.1:59938
2016-05-04 10:02:25,204 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x1547b2be4bc000f with negotiated timeout 40000 for client /127.0.0.1:59938
2016-05-04 10:02:25,822 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /127.0.0.1:59940
2016-05-04 10:02:25,822 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /127.0.0.1:59940
2016-05-04 10:02:25,825 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x1547b2be4bc0010 with negotiated timeout 40000 for client /127.0.0.1:59940
2016-05-04 10:04:15,530 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=2.02 MB, free=243.82 MB, max=245.84 MB, blocks=3, accesses=27, hits=24, hitRatio=88.88%, , cachingAccesses=27, cachingHits=24, cachingHitsRatio=88.88%, , evictions=0, evicted=0, evictedPerRun=NaN
2016-05-04 10:04:28,372 DEBUG org.apache.hadoop.hbase.client.MetaScanner: Scanning .META. starting at row= for max=2147483647 rows using org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@25e5c862
2016-05-04 10:04:28,379 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 0 catalog row(s) and gc'd 0 unreferenced parent region(s)
2016-05-04 10:09:15,530 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=2.02 MB, free=243.82 MB, max=245.84 MB, blocks=3, accesses=27, hits=24, hitRatio=88.88%, , cachingAccesses=27, cachingHits=24, cachingHitsRatio=88.88%, , evictions=0, evicted=0, evictedPerRun=NaN
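The ZooKeeper log above shows clients connecting from both the IPv6 loopback (`0:0:0:0:0:0:0:1`) and `127.0.0.1`, then their sessions expiring. A minimal sketch for checking whether ZooKeeper answers at all, assuming netcat (`nc`) is installed and that HBase's managed ZooKeeper is a 3.4.x release listening on the default client port 2181, where the `ruok` four-letter command is enabled. `zk_ok` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: check that ZooKeeper answers the "ruok" four-letter command.
# A healthy ZooKeeper 3.4.x server replies "imok". Assumes netcat (`nc`)
# is installed; the default 127.0.0.1:2181 is HBase's usual client port.
zk_ok() {
  local host="${1:-127.0.0.1}"
  local port="${2:-2181}"
  local reply
  # -w 3: give up after 3 seconds instead of hanging.
  # nc's exit status is ignored; only the reply text matters.
  reply="$(printf 'ruok' | nc -w 3 "$host" "$port" 2>/dev/null)" || true
  [ "$reply" = "imok" ]
}
```

For example, `zk_ok || echo "ZooKeeper is not answering on 127.0.0.1:2181"`. If ZooKeeper does answer, `echo "list" | hbase shell` is a quick way to confirm that HBase itself responds before starting Nutch.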

Hadoop.log

2016-05-04 10:42:18,132 INFO  crawl.InjectorJob - InjectorJob: starting at 2016-05-04 10:42:18
2016-05-04 10:42:18,134 INFO  crawl.InjectorJob - InjectorJob: Injecting urlDir: seed/urls.txt
2016-05-04 10:42:18,527 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

What exactly is the problem? I have configured everything correctly, but it still hangs. Why?

Praful Bagai
  • Have you tried disabling IPv6 and using only IPv4 in Hadoop + HBase? – Alfonso Nishikawa May 04 '16 at 18:09
  • I'm now using Mongo as a replacement for HBase, because Hadoop + HBase turned out to be too complex. Thanks for the comment. – Praful Bagai May 05 '16 at 14:04
  • I'm now getting an error while indexing the data into Elasticsearch. Can you please provide any info on it? Here's the question: http://stackoverflow.com/questions/37052674/nutch-elasticsearch-integration – Praful Bagai May 05 '16 at 14:05
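The IPv4-only suggestion in the first comment can be sketched as the following environment settings. This is a minimal sketch, assuming a standard HBase 0.94 layout (`conf/hbase-env.sh`) and that `bin/nutch` passes `NUTCH_OPTS` to its JVM, as the header comment of that script documents; `-Djava.net.preferIPv4Stack=true` is the standard JVM property for making Java networking prefer IPv4 sockets. It has not been verified against this particular hang.

```shell
# In $HBASE_HOME/conf/hbase-env.sh: make the HBase JVMs (including the
# ZooKeeper it manages) prefer IPv4 sockets.
export HBASE_OPTS="$HBASE_OPTS -Djava.net.preferIPv4Stack=true"

# In hadoop-env.sh, only if a separate Hadoop installation is used.
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"

# In the shell that runs runtime/local/bin/nutch (assumes bin/nutch
# passes NUTCH_OPTS to the Java command line).
export NUTCH_OPTS="$NUTCH_OPTS -Djava.net.preferIPv4Stack=true"
```

HBase has to be restarted after editing `hbase-env.sh` for the setting to take effect.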
