I've read various SO threads on why Nutch takes so long (or hangs) while injecting/generating/fetching/parsing, and I've tried implementing the solutions from the following threads, but with no luck:
1) Nutch 2.1 urls injection takes forever
2) Nutch 2.2.1 doesnt continue after Injector job
and various other threads.
I'm using Nutch 2.3.1 and HBase 0.94.27. I've been following this and this tutorial, and the build succeeded. But when I run any nutch command, it hangs.
These are the logs I get when running each command:
Inject Command
root@ubuntu:~/apache-nutch-2.3.1/runtime/local# ./bin/nutch inject seed/urls.txt
InjectorJob: starting at 2016-05-04 09:59:12
InjectorJob: Injecting urlDir: seed/urls.txt
Generate Command
root@ubuntu:~/apache-nutch-2.3.1/runtime/local# bin/nutch generate -topN 40
GeneratorJob: starting at 2016-05-04 09:54:08
GeneratorJob: Selecting best-scoring urls due for fetch.
GeneratorJob: starting
GeneratorJob: filtering: true
GeneratorJob: normalizing: true
GeneratorJob: topN: 40
Fetch Command
root@ubuntu:~/apache-nutch-2.3.1/runtime/local# bin/nutch fetch -all
FetcherJob: starting at 2016-05-04 10:00:14
FetcherJob: fetching all
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob : timelimit set for : -1
Parse Command
root@ubuntu:~/apache-nutch-2.3.1/runtime/local# bin/nutch parse -all
ParserJob: starting at 2016-05-04 10:00:43
ParserJob: resuming: false
ParserJob: forced reparse: false
ParserJob: parsing all
Update Command
root@ubuntu:~/apache-nutch-2.3.1/runtime/local# bin/nutch updatedb -all
DbUpdaterJob: starting at 2016-05-04 10:02:24
DbUpdaterJob: updatinging all
The following are the HBase logs:
client /0:0:0:0:0:0:0:1:45216
2016-05-04 10:00:47,214 WARN org.apache.zookeeper.server.NIOServerCnxn: caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x1547b2be4bc000e, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)
2016-05-04 10:00:47,215 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /0:0:0:0:0:0:0:1:45216 which had sessionid 0x1547b2be4bc000e
2016-05-04 10:00:47,215 WARN org.apache.zookeeper.server.NIOServerCnxn: caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x1547b2be4bc000d, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)
2016-05-04 10:00:47,216 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /127.0.0.1:59934 which had sessionid 0x1547b2be4bc000d
2016-05-04 10:01:10,000 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x1547b2be4bc000c, timeout of 40000ms exceeded
2016-05-04 10:01:10,001 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x1547b2be4bc000c
2016-05-04 10:01:22,002 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x1547b2be4bc000b, timeout of 40000ms exceeded
2016-05-04 10:01:22,003 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x1547b2be4bc000b
2016-05-04 10:01:28,001 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x1547b2be4bc000e, timeout of 40000ms exceeded
2016-05-04 10:01:28,001 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x1547b2be4bc000d, timeout of 40000ms exceeded
2016-05-04 10:01:28,001 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x1547b2be4bc000e
2016-05-04 10:01:28,001 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x1547b2be4bc000d
2016-05-04 10:02:25,195 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /127.0.0.1:59938
2016-05-04 10:02:25,202 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /127.0.0.1:59938
2016-05-04 10:02:25,204 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x1547b2be4bc000f with negotiated timeout 40000 for client /127.0.0.1:59938
2016-05-04 10:02:25,822 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /127.0.0.1:59940
2016-05-04 10:02:25,822 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /127.0.0.1:59940
2016-05-04 10:02:25,825 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x1547b2be4bc0010 with negotiated timeout 40000 for client /127.0.0.1:59940
2016-05-04 10:04:15,530 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=2.02 MB, free=243.82 MB, max=245.84 MB, blocks=3, accesses=27, hits=24, hitRatio=88.88%, , cachingAccesses=27, cachingHits=24, cachingHitsRatio=88.88%, , evictions=0, evicted=0, evictedPerRun=NaN
2016-05-04 10:04:28,372 DEBUG org.apache.hadoop.hbase.client.MetaScanner: Scanning .META. starting at row= for max=2147483647 rows using org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@25e5c862
2016-05-04 10:04:28,379 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 0 catalog row(s) and gc'd 0 unreferenced parent region(s)
2016-05-04 10:09:15,530 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=2.02 MB, free=243.82 MB, max=245.84 MB, blocks=3, accesses=27, hits=24, hitRatio=88.88%, , cachingAccesses=27, cachingHits=24, cachingHitsRatio=88.88%, , evictions=0, evicted=0, evictedPerRun=NaN
hadoop.log
2016-05-04 10:42:18,132 INFO crawl.InjectorJob - InjectorJob: starting at 2016-05-04 10:42:18
2016-05-04 10:42:18,134 INFO crawl.InjectorJob - InjectorJob: Injecting urlDir: seed/urls.txt
2016-05-04 10:42:18,527 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
What exactly is the problem? I have configured everything correctly, and it still hangs. Why?