
I am trying to connect to Azure Cosmos DB using the MongoDB API (Spark MongoDB connector) to export data to HDFS, but I get the exception below.

Below is the complete stack trace:

{ "_t" : "OKMongoResponse", "ok" : 0, "code" : 115, "errmsg" : "Command is not supported", "$err" : "Command is not supported" }
at com.mongodb.connection.ProtocolHelper.getCommandFailureException(ProtocolHelper.java:115)
at com.mongodb.connection.CommandProtocol.execute(CommandProtocol.java:107)
at com.mongodb.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:159)
at com.mongodb.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:289)
at com.mongodb.connection.DefaultServerConnection.command(DefaultServerConnection.java:176)
at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:216)
at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:187)
at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:179)
at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:92)
at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:85)
at com.mongodb.operation.CommandReadOperation.execute(CommandReadOperation.java:55)
at com.mongodb.Mongo.execute(Mongo.java:810)
at com.mongodb.Mongo$2.execute(Mongo.java:797)
at com.mongodb.MongoDatabaseImpl.runCommand(MongoDatabaseImpl.java:137)
at com.mongodb.MongoDatabaseImpl.runCommand(MongoDatabaseImpl.java:131)
at com.mongodb.spark.rdd.partitioner.MongoSplitVectorPartitioner$$anonfun$partitions$2$$anonfun$4.apply(MongoSplitVectorPartitioner.scala:76)
at com.mongodb.spark.rdd.partitioner.MongoSplitVectorPartitioner$$anonfun$partitions$2$$anonfun$4.apply(MongoSplitVectorPartitioner.scala:76)
at scala.util.Try$.apply(Try.scala:192)
at com.mongodb.spark.rdd.partitioner.MongoSplitVectorPartitioner$$anonfun$partitions$2.apply(MongoSplitVectorPartitioner.scala:76)
at com.mongodb.spark.rdd.partitioner.MongoSplitVectorPartitioner$$anonfun$partitions$2.apply(MongoSplitVectorPartitioner.scala:75)
at com.mongodb.spark.MongoConnector$$anonfun$withDatabaseDo$1.apply(MongoConnector.scala:171)
at com.mongodb.spark.MongoConnector$$anonfun$withDatabaseDo$1.apply(MongoConnector.scala:171)
at com.mongodb.spark.MongoConnector.withMongoClientDo(MongoConnector.scala:154)
at com.mongodb.spark.MongoConnector.withDatabaseDo(MongoConnector.scala:171)
at com.mongodb.spark.rdd.partitioner.MongoSplitVectorPartitioner.partitions(MongoSplitVectorPartitioner.scala:75)
at com.mongodb.spark.rdd.MongoRDD.getPartitions(MongoRDD.scala:137)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:182)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:67)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:636)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:691)

Maven dependency added:

<dependency>
    <groupId>org.mongodb.spark</groupId>
    <artifactId>mongo-spark-connector_2.11</artifactId>
    <version>2.2.0</version>
</dependency>

Code:

SparkSession spark = SparkSession.builder()
        .getOrCreate();

JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
HiveContext hiveContext = new org.apache.spark.sql.hive.HiveContext(jsc);
Dataset<Row> implicitDS = MongoSpark.load(jsc).toDF();

FYI:

implicitDS.count() returns 0.

I am using the MongoSplitVectorPartitioner. I have updated the question with the complete stack trace.
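For context, the partitioner is selected through the connector's `spark.mongodb.input.partitioner` setting. A minimal sketch of that configuration (the connection string is a placeholder, not the real account details):

SparkSession spark = SparkSession.builder()
        // Placeholder Cosmos DB (MongoDB API) connection string
        .config("spark.mongodb.input.uri",
                "mongodb://<account>:<key>@<account>.documents.azure.com:10255/<db>.<collection>?ssl=true")
        // This partitioner issues the splitVector command seen in the stack trace above
        .config("spark.mongodb.input.partitioner", "MongoSplitVectorPartitioner")
        .getOrCreate();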

  • Please paste the code you are using to connect – Nick Chapsas Jan 09 '19 at 08:15
  • And if I use MongoShardedPartitioner, my application (Spark job) runs fine, but no data gets exported to HDFS. – shatabdi Mukherjee Jan 09 '19 at 11:21
  • Looks like one of the commands you were using isn't supported. Some small subset of methods in Mongo aren't fully supported on Cosmos. Reach out to AskCosmosDB@microsoft.com and the Mongo API engineering team can help you out. – Chris Anderson Jan 09 '19 at 20:36
  • Thanks. But one question: can we use the Spark Azure connector to export data in Java? – shatabdi Mukherjee Jan 10 '19 at 01:30
  • CosmosDB is a distinct implementation from MongoDB, so server commands and behaviour may differ. It looks like the `splitVector` command is not supported, so you need to choose a different partitioner approach. I wouldn't expect `MongoShardedPartitioner` to work because Cosmos uses a different partitioning scheme. Try `MongoSamplePartitioner` (see the sketch after these comments). If that doesn't work, `MongoPaginateByCountPartitioner` or `MongoPaginateBySizePartitioner` should be possible (but slower) options. – Stennie Jan 10 '19 at 09:24
  • None of it works. dataset.printSchema() prints the correct schema, but dataset.show() displays a table with the proper columns and no data. No exception is thrown. – shatabdi Mukherjee Jan 10 '19 at 13:24
  • This is happening to me "randomly" with the same code, same query, and same database. Sometimes, it works, sometimes it fails. It's as if it is a service issue and nothing wrong with the code. The fact that this question is very recent also suggests this. – Florian Winter Jan 10 '19 at 15:39
  • It is also somewhat strange that the response shown in the question is returned as a result of a query and from a cursor iterating over the query result. If it is an error, then shouldn't the query command fail, rather than returning a valid cursor? (observed with the C++ legacy driver, so take this with a grain of salt...) – Florian Winter Jan 10 '19 at 15:43
  • @shatabdiMukherjee Have you tried doing the same query / command with a third-party tool, such as Robo 3T, rather than your own code? (https://robomongo.org/). Whenever I do that, it works, while my own code, which does the same (and successfully in 50% of all cases), is failing. – Florian Winter Jan 10 '19 at 15:51
  • @FlorianWinter Are you also using Cosmos DB and the Spark Connector? If not, I'd suggest posting a separate question with details of your environment and a code snippet. The underlying issue in this question is that Cosmos does not implement some of the expected commands or behaviour that would be available with a MongoDB server. A straightforward fix would be to use an actual MongoDB deployment hosted on Azure. – Stennie Jan 11 '19 at 11:31
  • @Stennie Done: https://stackoverflow.com/questions/54143907/azure-cosmosdb-operation-not-supported-when-using-elemmatch-and-in Please disregard my comments here if they are unrelated, as I'm not using the Spark Connector. Apologies for any confusion caused and time wasted. – Florian Winter Jan 11 '19 at 12:01
  • My requirement is to implement an adapter to export data from cosmos db to hdfs. – shatabdi Mukherjee Jan 11 '19 at 15:52
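Update: a sketch of Stennie's suggested partitioner override, passed through a per-read `ReadConfig` as shown in the connector's Java documentation (the HDFS output path is a placeholder):

import java.util.HashMap;
import java.util.Map;

import com.mongodb.spark.MongoSpark;
import com.mongodb.spark.config.ReadConfig;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Assumes spark.mongodb.input.uri is already set on the session config
SparkSession spark = SparkSession.builder().getOrCreate();
JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

// Override only the partitioner; Cosmos DB does not implement splitVector
Map<String, String> overrides = new HashMap<>();
overrides.put("partitioner", "MongoSamplePartitioner");
ReadConfig readConfig = ReadConfig.create(jsc).withOptions(overrides);

Dataset<Row> ds = MongoSpark.load(jsc, readConfig).toDF();
ds.write().json("hdfs:///<output-path>"); // placeholder HDFS path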
