
I have a Spark process that is currently using the mongo-hadoop bridge (from https://github.com/mongodb/mongo-hadoop/blob/master/spark/src/main/python/README.rst ) to access the mongo database:

import pymongo_spark
pymongo_spark.activate()  # required so SparkContext exposes mongoRDD(), per the linked README
mongo_url = 'mongodb://localhost:27017/db_name.collection_name'
mongo_rdd = spark_context.mongoRDD(mongo_url)

The mongo instance is now being upgraded to a cluster that can only be accessed with a replica set.

How do I create an RDD against the replica set using the mongo-hadoop connector? mongoRDD() delegates to mongoPairRDD(), which does not appear to accept multiple host strings.


1 Answer


The MongoDB Hadoop Connector's mongoRDD() can take any valid MongoDB connection string URI.

For example, if it's now a replica set you can specify:

mongodb://db1.example.net,db2.example.net:27002,db3.example.net:27003/db_name.collection_name?replicaSet=YourReplicaSetName
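
Putting that together with the pymongo_spark setup from the README linked in the question, a minimal sketch might look like the following (the host names, ports, database/collection, and replica set name are placeholders to adjust for your cluster):

from pyspark import SparkConf, SparkContext
import pymongo_spark

# Activate pymongo_spark so SparkContext gains the mongoRDD() helper.
pymongo_spark.activate()

conf = SparkConf().setAppName('mongo-replica-set-example')
spark_context = SparkContext(conf=conf)

# All replica set members go into a single connection string URI.
mongo_url = ('mongodb://db1.example.net,db2.example.net:27002,db3.example.net:27003'
             '/db_name.collection_name?replicaSet=YourReplicaSetName')
mongo_rdd = spark_context.mongoRDD(mongo_url)

print(mongo_rdd.count())

The key point is that the replica set members are listed inside one connection string, so mongoRDD() still takes a single string argument rather than multiple host strings.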

See also the MongoDB manual's Connection String URI Format reference for the full set of options.
