
There are several similar questions on the internet, but none of them have answers.

I am using the following code to save MongoDB data to Hive, but the exception shown at the end occurs. How can I work around this problem?

I am using

  • spark-mongo-connector (spark 2.1.0 - scala 2.11)

  • java-mongo-driver 3.10.2

    import com.mongodb.spark.MongoSpark
    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types.StructType

    object MongoConnector_Test {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .set("spark.mongodb.input.uri", "mongodb://user:pass@mongo1:123456/db1.t1")
          .setMaster("local[4]")
          .setAppName("MongoConnectorTest")
        val session = SparkSession.builder().config(conf).enableHiveSupport().getOrCreate()
        // Read every field as a string
        val schema: StructType = new StructType().add("_id", "string").add("x", "string").add("y", "string").add("z", "string")
        val df = MongoSpark.read(session).schema(schema).load()
        df.write.saveAsTable("MongoConnector_Test" + System.currentTimeMillis())
      }
    }
    

However, the following exception occurs.

Caused by: org.bson.BsonInvalidOperationException: Invalid state INITIAL
    at org.bson.json.StrictCharacterStreamJsonWriter.checkState(StrictCharacterStreamJsonWriter.java:395)
    at org.bson.json.StrictCharacterStreamJsonWriter.writeNull(StrictCharacterStreamJsonWriter.java:192)
    at org.bson.json.JsonNullConverter.convert(JsonNullConverter.java:24)
    at org.bson.json.JsonNullConverter.convert(JsonNullConverter.java:21)
    at org.bson.json.JsonWriter.doWriteNull(JsonWriter.java:206)
    at org.bson.AbstractBsonWriter.writeNull(AbstractBsonWriter.java:557)
    at org.bson.codecs.BsonNullCodec.encode(BsonNullCodec.java:38)
    at org.bson.codecs.BsonNullCodec.encode(BsonNullCodec.java:28)
    at org.bson.codecs.EncoderContext.encodeWithChildContext(EncoderContext.java:91)
    at org.bson.codecs.BsonValueCodec.encode(BsonValueCodec.java:62)
    at com.mongodb.spark.sql.BsonValueToJson$.apply(BsonValueToJson.scala:29)
    at com.mongodb.spark.sql.MapFunctions$.bsonValueToString(MapFunctions.scala:103)
    at com.mongodb.spark.sql.MapFunctions$.com$mongodb$spark$sql$MapFunctions$$convertToDataType(MapFunctions.scala:78)
    at com.mongodb.spark.sql.MapFunctions$$anonfun$3.apply(MapFunctions.scala:39)
    at com.mongodb.spark.sql.MapFunctions$$anonfun$3.apply(MapFunctions.scala:37)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
    at com.mongodb.spark.sql.MapFunctions$.documentToRow(MapFunctions.scala:37)
    at com.mongodb.spark.sql.MongoRelation$$anonfun$buildScan$2.apply(MongoRelation.scala:45)
    at com.mongodb.spark.sql.MongoRelation$$anonfun$buildScan$2.apply(MongoRelation.scala:45)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:243)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:190)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:188)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1341)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:193)
    ... 8 more

3 Answers


Mongo stores data as documents, and the schema is not fixed across all documents. Note that a field can therefore be missing (null) in some documents, and that is the root cause of your issue. If you leave the fields that are not present in every document out of the schema, the issue will be fixed.
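For example, a minimal sketch of that approach, assuming only _id and x exist in every document (the URI and field names are placeholders taken from the question):

    import com.mongodb.spark.MongoSpark
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types.StructType

    object ReadCommonFieldsOnly {
      def main(args: Array[String]): Unit = {
        val session = SparkSession.builder()
          .config("spark.mongodb.input.uri", "mongodb://user:pass@mongo1:123456/db1.t1")
          .master("local[4]")
          .appName("ReadCommonFieldsOnly")
          .enableHiveSupport()
          .getOrCreate()

        // Only declare fields that exist in every document; fields that are
        // missing in some documents are left out of the schema entirely.
        val schema = new StructType()
          .add("_id", "string")
          .add("x", "string")

        val df = MongoSpark.read(session).schema(schema).load()
        df.write.saveAsTable("mongo_common_fields_" + System.currentTimeMillis())
      }
    }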


Actually, a schema mismatch is the real issue here.

MongoDB has a flexible schema, so a document can omit a field or store it as null and reads will still work. What it cannot tolerate is a schema mismatch.

For example, across documents, the same struct/column should not mix subtypes.

We faced this issue, and after multiple checks we finally solved it by correcting the schema for the collections, changing the affected field from String to Boolean, as in the sketch below.
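A minimal sketch of that fix, reusing the connection details from the question and a hypothetical field isActive that is actually stored as a Boolean:

    import com.mongodb.spark.MongoSpark
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types.{BooleanType, StringType, StructType}

    val session = SparkSession.builder()
      .config("spark.mongodb.input.uri", "mongodb://user:pass@mongo1:123456/db1.t1")
      .master("local[4]")
      .appName("CorrectedSchemaExample")
      .getOrCreate()

    // Hypothetical field names: isActive is stored as a Boolean in Mongo,
    // so it must be declared as BooleanType rather than StringType.
    val correctedSchema = new StructType()
      .add("_id", StringType)
      .add("name", StringType)
      .add("isActive", BooleanType)

    val df = MongoSpark.read(session).schema(correctedSchema).load()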


In case you still have this issue: consider the two documents below in the Mongo collection. They will produce this error because the sub-elements of the structType field otherDetails (isExternalUser, isPrivate) are Booleans in one document and Strings in the other. Both should be changed to either String or Boolean to make it work. Note that a field may also be absent from some documents (here isInternal is not present in the second one); a sketch of an explicit schema follows the documents.

    {
        "_id" : ObjectId("5aa78d90d169ed325063b06d"),
        "Name" : "Kailash Test",
        "EmpId" : 1234567,
        "company" : "test.com",
        "otherDetails" : {
            "isPrivate" : false,
            "isInternal" : false,
            "isExternalUser" : true
        }
    }
    {
        "_id" : ObjectId("5aa78d90d169ed123456789d"),
        "Name" : "Kailash Test2",
        "EmpId" : 1234567,
        "company" : "test.com",
        "otherDetails" : {
            "isPrivate" : "false",
            "isExternalUser" : "true"
        }
    }
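
For reference, a minimal sketch of an explicit nested schema for documents like these, assuming the data has first been made consistent so that every otherDetails sub-field is a real Boolean:

    import com.mongodb.spark.MongoSpark
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types.{BooleanType, IntegerType, StringType, StructType}

    val session = SparkSession.builder()
      .config("spark.mongodb.input.uri", "mongodb://user:pass@mongo1:123456/db1.t1")
      .master("local[4]")
      .appName("NestedSchemaExample")
      .getOrCreate()

    // Nested schema for otherDetails: every sub-field declared as Boolean.
    // This only works once all documents store them with the same type;
    // a document storing "false" as a String has to be fixed first.
    val otherDetails = new StructType()
      .add("isPrivate", BooleanType)
      .add("isInternal", BooleanType)      // not present in every document
      .add("isExternalUser", BooleanType)

    val schema = new StructType()
      .add("_id", StringType)
      .add("Name", StringType)
      .add("EmpId", IntegerType)           // adjust if EmpId is stored as Long/Double
      .add("company", StringType)
      .add("otherDetails", otherDetails)

    val df = MongoSpark.read(session).schema(schema).load()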