
I set up a local Nutch 2.3.1 instance on MacOS 10.11.5 (El Capitan), running in Eclipse as described here: https://wiki.apache.org/nutch/RunNutchInEclipse

As the data store I configured MongoDB 2.6.12, which is also running on my local Mac. I took the Gora config from here: http://www.aossama.com/search-engine-with-apache-nutch-mongodb-and-elasticsearch/

ivy.xml

<dependency org="org.apache.gora" name="gora-mongodb" rev="0.6.1" conf="*->default" />

gora.properties

gora.datastore.default=org.apache.gora.mongodb.store.MongoStore
gora.mongodb.override_hadoop_configuration=false
gora.mongodb.mapping.file=/gora-mongodb-mapping.xml
gora.mongodb.servers=localhost:27017
# I tried several server settings like localhost, 127.0.0.1, 127.0.0.1:27017, ...
gora.mongodb.db=nutch
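The server variants tried above (localhost, 127.0.0.1, 127.0.0.1:27017) should all name the same local MongoDB instance. Below is a minimal sketch of how such a value could be split into addresses. It assumes, without checking the Gora 0.6.1 source, that gora.mongodb.servers is a comma-separated list of host[:port] entries and that a missing port falls back to MongoDB's default, 27017. GoraServersSketch and parseServers are made-up names.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper, not part of Nutch or Gora. It ASSUMES the
 * gora.mongodb.servers format is "host[:port],host[:port],..." and that a
 * missing port means MongoDB's default port 27017.
 */
public class GoraServersSketch {

    static final int DEFAULT_MONGO_PORT = 27017;

    /** Splits a servers value into normalized "host:port" strings. */
    static List<String> parseServers(String value) {
        List<String> result = new ArrayList<>();
        for (String entry : value.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore empty entries such as a trailing comma
            }
            String[] hostAndPort = trimmed.split(":");
            String host = hostAndPort[0];
            int port = hostAndPort.length > 1
                ? Integer.parseInt(hostAndPort[1].trim())
                : DEFAULT_MONGO_PORT;
            result.add(host + ":" + port);
        }
        return result;
    }

    public static void main(String[] args) {
        // The variants mentioned in the question
        String[] tried = {"localhost", "127.0.0.1", "127.0.0.1:27017", "localhost:27017"};
        for (String value : tried) {
            System.out.println(value + " -> " + parseServers(value));
        }
    }
}
```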

I did not change gora-mongodb-mapping.xml.

nutch-site.xml

<property>
 <name>storage.data.store.class</name>
 <value>org.apache.gora.mongodb.store.MongoStore</value>
 <description>Default class for storing data</description>
</property>

If I run the inject command, hadoop.log shows this confusing result:

2016-07-12 23:23:16,818 INFO  crawl.InjectorJob - InjectorJob: starting at 2016-07-12 23:23:16
2016-07-12 23:23:16,819 INFO  crawl.InjectorJob - InjectorJob: Injecting urlDir: /Users/myaccount/Documents/Nutch/urls
2016-07-12 23:23:17,054 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-07-12 23:23:17,416 ERROR store.MongoStore - 
2016-07-12 23:23:17,417 ERROR store.MongoStore - [Ljava.lang.StackTraceElement;@4b5189ac
2016-07-12 23:23:17,418 ERROR store.MongoStore - Error while initializing MongoDB store: java.lang.NullPointerException
2016-07-12 23:23:17,419 ERROR crawl.InjectorJob - InjectorJob: org.apache.gora.util.GoraException: java.lang.RuntimeException: java.io.IOException: java.lang.NullPointerException
    at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:167)
    at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
    at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:78)
    at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:233)
    at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:267)
    at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:290)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:299)
Caused by: java.lang.RuntimeException: java.io.IOException: java.lang.NullPointerException
    at org.apache.gora.mongodb.store.MongoStore.initialize(MongoStore.java:131)
    at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
    at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
    ... 7 more
Caused by: java.io.IOException: java.lang.NullPointerException
    at org.apache.gora.mongodb.store.MongoMappingBuilder.fromFile(MongoMappingBuilder.java:123)
    at org.apache.gora.mongodb.store.MongoStore.initialize(MongoStore.java:118)
    ... 9 more
Caused by: java.lang.NullPointerException
    at org.apache.gora.mongodb.store.MongoMapping.newDocumentField(MongoMapping.java:109)
    at org.apache.gora.mongodb.store.MongoMapping.addClassField(MongoMapping.java:169)
    at org.apache.gora.mongodb.store.MongoMappingBuilder.loadPersistentClass(MongoMappingBuilder.java:169)
    at org.apache.gora.mongodb.store.MongoMappingBuilder.fromFile(MongoMappingBuilder.java:112)
    ... 10 more
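The innermost frames (MongoMappingBuilder.fromFile, then loadPersistentClass, MongoMapping.addClassField and newDocumentField) show that the NullPointerException is thrown while the mapping file is being read. Because this happens inside MongoStore.initialize, it probably occurs before any connection is attempted, although the trace alone does not prove that. One cheap sanity check is to confirm that every field element in the mapping file carries the attributes it is expected to have. This sketch is hypothetical. It assumes that field elements use name, docfield and type attributes, which is my understanding of the Gora MongoDB mapping format and has not been verified against the 0.6.1 source. It only uses the JDK's DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/**
 * Hypothetical diagnostic, not part of Nutch or Gora. It ASSUMES each
 * <field> element in gora-mongodb-mapping.xml should have the attributes
 * name, docfield and type, and reports every field where one is missing.
 */
public class MappingFieldCheck {

    private static final String[] REQUIRED_ATTRIBUTES = {"name", "docfield", "type"};

    /** Returns one message per missing attribute on any <field> element. */
    static List<String> incompleteFields(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Mapping XML could not be parsed", e);
        }
        List<String> problems = new ArrayList<>();
        NodeList fields = doc.getElementsByTagName("field");
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            // DOM returns "" (not null) for an absent attribute
            String label = field.getAttribute("name").isEmpty()
                ? "#" + i
                : field.getAttribute("name");
            for (String attribute : REQUIRED_ATTRIBUTES) {
                if (field.getAttribute(attribute).isEmpty()) {
                    problems.add("field " + label + " is missing '" + attribute + "'");
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java MappingFieldCheck path/to/gora-mongodb-mapping.xml");
            return;
        }
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        List<String> problems = incompleteFields(xml);
        if (problems.isEmpty()) {
            System.out.println("All <field> elements have name, docfield and type.");
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```

Point it at the exact copy of gora-mongodb-mapping.xml that Eclipse puts on the classpath, because that is the one Gora reads.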

After two days I've run out of ideas.

I can't find any useful hint in the log file. The MongoDB logs show no connection attempts at all, let alone an established connection. With the mongo shell I can connect to the database, and requesting http://localhost:27017 shows the expected message ("It looks like you are trying to access MongoDB over HTTP on the native driver port.") along with the corresponding log entries. If I switch the data store to Cassandra, injecting works as expected, so Nutch itself seems to be fine.
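The mongo shell and browser checks run outside Eclipse. A plain TCP connect is enough to confirm that the JVM itself can reach the port configured in gora.mongodb.servers. MongoPortCheck is a hypothetical helper, not part of Nutch or Gora. A successful connect only shows that something is listening on the port. It does not show that the MongoDB driver can talk to it.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/**
 * Hypothetical helper, not part of Nutch or Gora: checks whether a TCP
 * connection to host:port can be opened from this JVM within a timeout.
 */
public class MongoPortCheck {

    /** Returns true if something accepts a TCP connection on host:port. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or host unresolvable
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 27017;
        System.out.println(host + ":" + port + " reachable from this JVM: "
            + canConnect(host, port, 2000));
    }
}
```

Running it as an Eclipse launch configuration uses the same JVM and network settings as the Nutch run.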

Does anybody know what I'm missing, or understand what hadoop.log is trying to tell me?

Any help would be appreciated! Thx.

Update: I also tried this configuration on an Ubuntu 14.04 server, where it works as expected. So I suppose my issue is related to the connection between Nutch and MongoDB on a Mac. (In case anybody wonders: I want the configuration working on my Mac so I can do some local development without needing a server connection.)

André
  • Hi Andre, were you able to figure out this issue? I am getting the same on Ubuntu 16.04. – Rajni Kant Sharma Sep 23 '16 at 20:30
  • No, sorry. As I mentioned above, on Ubuntu 14.04.4 LTS it works like a charm, so I'm continuing to work on the server. If you find a solution, I would still appreciate a hint. ;-) – André Sep 26 '16 at 06:26
  • Did you try debugging this in Eclipse? I have a feeling that they've added some fields that have not been added to the Mongo mapping file. – jvence May 15 '17 at 16:25
  • Hello Andre, I had a different issue, but it turned out that gora-mongodb 0.6.1 uses an old Mongo connector, which caused it. Try using `` in your ivy.xml, recompile Nutch and try again. – Abhay Pai Jun 09 '17 at 12:59
  • Thx for this hint. I don't currently have a running setup on my Mac, but I will give it a try! – André Jun 23 '17 at 08:24
  • I don't know if you are still trying to get Nutch to crawl with MongoDB on your Mac. I got mine working using the latest MongoDB for Mac: fastdl.mongodb.org/osx/mongodb-osx-ssl-x86_64-3.6.2.tgz – echew Feb 07 '18 at 19:30
