
I am trying to connect to Azure Cosmos DB using the MongoDB API (Spark MongoDB connector) to export data to HDFS, but I get the exception below.

Below is the complete stack trace:

{ "_t" : "OKMongoResponse", "ok" : 0, "code" : 115, "errmsg" : "Command is not supported", "$err" : "Command is not supported" }
at com.mongodb.connection.ProtocolHelper.getCommandFailureException(ProtocolHelper.java:115)
at com.mongodb.connection.CommandProtocol.execute(CommandProtocol.java:107)
at com.mongodb.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:159)
at com.mongodb.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:289)
at com.mongodb.connection.DefaultServerConnection.command(DefaultServerConnection.java:176)
at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:216)
at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:187)
at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:179)
at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:92)
at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:85)
at com.mongodb.operation.CommandReadOperation.execute(CommandReadOperation.java:55)
at com.mongodb.Mongo.execute(Mongo.java:810)
at com.mongodb.Mongo$2.execute(Mongo.java:797)
at com.mongodb.MongoDatabaseImpl.runCommand(MongoDatabaseImpl.java:137)
at com.mongodb.MongoDatabaseImpl.runCommand(MongoDatabaseImpl.java:131)
at com.mongodb.spark.rdd.partitioner.MongoSplitVectorPartitioner$$anonfun$partitions$2$$anonfun$4.apply(MongoSplitVectorPartitioner.scala:76)
at com.mongodb.spark.rdd.partitioner.MongoSplitVectorPartitioner$$anonfun$partitions$2$$anonfun$4.apply(MongoSplitVectorPartitioner.scala:76)
at scala.util.Try$.apply(Try.scala:192)
at com.mongodb.spark.rdd.partitioner.MongoSplitVectorPartitioner$$anonfun$partitions$2.apply(MongoSplitVectorPartitioner.scala:76)
at com.mongodb.spark.rdd.partitioner.MongoSplitVectorPartitioner$$anonfun$partitions$2.apply(MongoSplitVectorPartitioner.scala:75)
at com.mongodb.spark.MongoConnector$$anonfun$withDatabaseDo$1.apply(MongoConnector.scala:171)
at com.mongodb.spark.MongoConnector$$anonfun$withDatabaseDo$1.apply(MongoConnector.scala:171)
at com.mongodb.spark.MongoConnector.withMongoClientDo(MongoConnector.scala:154)
at com.mongodb.spark.MongoConnector.withDatabaseDo(MongoConnector.scala:171)
at com.mongodb.spark.rdd.partitioner.MongoSplitVectorPartitioner.partitions(MongoSplitVectorPartitioner.scala:75)
at com.mongodb.spark.rdd.MongoRDD.getPartitions(MongoRDD.scala:137)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:182)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:67)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:636)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:691)

Maven dependency added:

<dependency>
    <groupId>org.mongodb.spark</groupId>
    <artifactId>mongo-spark-connector_2.11</artifactId>
    <version>2.2.0</version>
</dependency>

Code:

SparkSession spark = SparkSession.builder()
        .getOrCreate();

JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
HiveContext hiveContext = new org.apache.spark.sql.hive.HiveContext(jsc);
Dataset<Row> implicitDS = MongoSpark.load(jsc).toDF();

FYI:

implicitDS.count() returns 0.

I am using the MongoSplitVectorPartitioner. I have updated the question with the complete stack trace.
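For context, the partitioner is selected through the connector's `spark.mongodb.input.partitioner` setting. A minimal sketch of that configuration (the connection string is a placeholder, not the real account details):

SparkSession spark = SparkSession.builder()
        // Placeholder Cosmos DB (MongoDB API) connection string
        .config("spark.mongodb.input.uri",
                "mongodb://<account>:<key>@<account>.documents.azure.com:10255/<db>.<collection>?ssl=true")
        // This partitioner issues the splitVector command seen in the stack trace above
        .config("spark.mongodb.input.partitioner", "MongoSplitVectorPartitioner")
        .getOrCreate();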

  • Please paste the code you are using to connect – Nick Chapsas Jan 09 '19 at 08:15
  • And if I use MongoShardedPartitioner, my application (Spark job) runs fine, but no data gets exported to HDFS. – shatabdi Mukherjee Jan 09 '19 at 11:21
  • Looks like one of the commands you were using isn't supported. Some small subset of methods in Mongo aren't fully supported on Cosmos. Reach out to AskCosmosDB@microsoft.com and the Mongo API engineering team can help you out. – Chris Anderson Jan 09 '19 at 20:36
  • Thanks. But one question: can we use the Spark Azure connector to export data in Java? – shatabdi Mukherjee Jan 10 '19 at 01:30
  • CosmosDB is a distinct implementation from MongoDB, so server commands and behaviour may differ. It looks like the `splitVector` command is not supported, so you need to choose a different partitioner approach. I wouldn't expect `MongoShardedPartitioner` to work because Cosmos uses a different partitioning scheme. Try `MongoSamplePartitioner` (see the sketch after these comments). If that doesn't work, `MongoPaginateByCountPartitioner` or `MongoPaginateBySizePartitioner` should be possible (but slower) options. – Stennie Jan 10 '19 at 09:24
  • None of it works. dataset.printSchema() prints the correct schema, but dataset.show() displays a table with the proper columns and no data. No exception is thrown. – shatabdi Mukherjee Jan 10 '19 at 13:24
  • This is happening to me "randomly" with the same code, same query, and same database. Sometimes, it works, sometimes it fails. It's as if it is a service issue and nothing wrong with the code. The fact that this question is very recent also suggests this. – Florian Winter Jan 10 '19 at 15:39
  • It is also somewhat strange that the response shown in the question is returned as a result of a query and from a cursor iterating over the query result. If it is an error, then shouldn't the query command fail, rather than returning a valid cursor? (observed with the C++ legacy driver, so take this with a grain of salt...) – Florian Winter Jan 10 '19 at 15:43
  • @shatabdiMukherjee Have you tried doing the same query / command with a third-party tool, such as Robo 3T, rather than your own code? (https://robomongo.org/). Whenever I do that, it works, while my own code, which does the same (and successfully in 50% of all cases), is failing. – Florian Winter Jan 10 '19 at 15:51
  • @FlorianWinter Are you also using Cosmos DB and the Spark Connector? If not, I'd suggest posting a separate question with details of your environment and a code snippet. The underlying issue in this question is that Cosmos does not implement some of the expected commands or behaviour that would be available with a MongoDB server. A straightforward fix would be to use an actual MongoDB deployment hosted on Azure. – Stennie Jan 11 '19 at 11:31
  • @Stennie Done: https://stackoverflow.com/questions/54143907/azure-cosmosdb-operation-not-supported-when-using-elemmatch-and-in Please disregard my comments here if they are unrelated, as I'm not using the Spark Connector. Apologies for any confusion caused and time wasted. – Florian Winter Jan 11 '19 at 12:01
  • My requirement is to implement an adapter to export data from cosmos db to hdfs. – shatabdi Mukherjee Jan 11 '19 at 15:52
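Update: a sketch of Stennie's suggested partitioner override, passed through a per-read `ReadConfig` as shown in the connector's Java documentation (the HDFS output path is a placeholder):

import java.util.HashMap;
import java.util.Map;

import com.mongodb.spark.MongoSpark;
import com.mongodb.spark.config.ReadConfig;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Assumes spark.mongodb.input.uri is already set on the session config
SparkSession spark = SparkSession.builder().getOrCreate();
JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

// Override only the partitioner; Cosmos DB does not implement splitVector
Map<String, String> overrides = new HashMap<>();
overrides.put("partitioner", "MongoSamplePartitioner");
ReadConfig readConfig = ReadConfig.create(jsc).withOptions(overrides);

Dataset<Row> ds = MongoSpark.load(jsc, readConfig).toDF();
ds.write().json("hdfs:///<output-path>"); // placeholder HDFS path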
