
I have a Spark process that is currently using the mongo-hadoop bridge (from https://github.com/mongodb/mongo-hadoop/blob/master/spark/src/main/python/README.rst ) to access the mongo database:

import pymongo_spark
pymongo_spark.activate()  # required so SparkContext exposes mongoRDD(), per the linked README
mongo_url = 'mongodb://localhost:27017/db_name.collection_name'
mongo_rdd = spark_context.mongoRDD(mongo_url)

The mongo instance is now being upgraded to a cluster that can only be accessed with a replica set.

How do I create an RDD against the replica set using the mongo-hadoop connector? mongoRDD() delegates to mongoPairRDD(), which does not appear to accept multiple host strings.


1 Answer


The MongoDB Hadoop Connector's mongoRDD() can take any valid MongoDB connection string URI.

For example, if it's now a replica set you can specify:

mongodb://db1.example.net,db2.example.net:27002,db3.example.net:27003/db_name.collection_name?replicaSet=YourReplicaSetName
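
Putting that together with the pymongo_spark setup from the README linked in the question, a minimal sketch might look like the following (the host names, ports, database/collection, and replica set name are placeholders to adjust for your cluster):

from pyspark import SparkConf, SparkContext
import pymongo_spark

# Activate pymongo_spark so SparkContext gains the mongoRDD() helper.
pymongo_spark.activate()

conf = SparkConf().setAppName('mongo-replica-set-example')
spark_context = SparkContext(conf=conf)

# All replica set members go into a single connection string URI.
mongo_url = ('mongodb://db1.example.net,db2.example.net:27002,db3.example.net:27003'
             '/db_name.collection_name?replicaSet=YourReplicaSetName')
mongo_rdd = spark_context.mongoRDD(mongo_url)

print(mongo_rdd.count())

The key point is that the replica set members are listed inside one connection string, so mongoRDD() still takes a single string argument rather than multiple host strings.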

See also the MongoDB manual's Connection String URI Format reference for the full set of options.
