I use the mongo-hadoop connector with Spark to sync data from a MongoDB collection to an HDFS file. My code works fine when the collection is read through mongos, but when I read local.oplog.rs, a replica-set collection that can only be read by connecting directly to a mongod, it throws this exception:
Caused by: com.mongodb.hadoop.splitter.SplitFailedException: Unable to calculate input splits: couldn't find index over splitting key { _id: 1 }
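For context, the read is essentially the standard newAPIHadoopRDD setup with MongoInputFormat; here is a simplified sketch (not my exact code, the URI and output path are placeholders):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.spark.{SparkConf, SparkContext}
import org.bson.BSONObject
import com.mongodb.hadoop.MongoInputFormat

object MongoToHdfs {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("mongo-to-hdfs"))

    val mongoConf = new Configuration()
    // Reading through mongos works fine; pointing the URI at a mongod's
    // local.oplog.rs is what triggers the SplitFailedException above.
    mongoConf.set("mongo.input.uri", "mongodb://mongos-host:27017/mydb.mycollection")

    val rdd = sc.newAPIHadoopRDD(
      mongoConf,
      classOf[MongoInputFormat],
      classOf[Object],     // key: the document's _id
      classOf[BSONObject]) // value: the full BSON document

    rdd.map { case (_, doc) => doc.toString }
      .saveAsTextFile("hdfs:///data/mongo-sync/mycollection")

    sc.stop()
  }
}
```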
I think the data structure of oplog.rs is different from a normal collection's: oplog.rs documents don't have an "_id" property, so there is no index over { _id: 1 } for the splitter to use and newAPIHadoopRDD cannot work normally. Is that right?
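For what it's worth, this assumption can be checked by connecting to the mongod directly with the plain MongoDB Java driver and looking at the oplog's indexes and a sample entry; a rough sketch (hostname and port are placeholders, not my setup):

```scala
import com.mongodb.MongoClient
import scala.collection.JavaConverters._

object InspectOplog {
  def main(args: Array[String]): Unit = {
    // Connect directly to one mongod of the replica set.
    val client = new MongoClient("mongod-host", 27017)
    val oplog = client.getDatabase("local").getCollection("oplog.rs")

    // List the indexes on the oplog -- the splitter complains that { _id: 1 } is missing.
    for (idx <- oplog.listIndexes().iterator().asScala)
      println(idx.toJson)

    // Print one sample entry to see its shape (ts, op, ns, o, ... rather than _id).
    Option(oplog.find().first()).foreach(doc => println(doc.toJson))

    client.close()
  }
}
```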