
Using the MongoDB Spark Connector I'm not able to connect to the change stream of a CosmosDB Mongo database.

I've tried to use the MongoDB Spark Connector, but it fails with:

com.mongodb.spark.sql.connector.exceptions.MongoSparkException: Could not create the change stream cursor.

A MongoDB Developer Community post mentions that colStats is not implemented in Cosmos DB, so I cannot use the Spark Connector.

Is there any other way to natively consume the change stream from Cosmos for MongoDB in Spark, that does not involve any intermediate step like having a feed processor?



The MongoDB Spark Connector currently does not support change streams against Cosmos DB, because Cosmos DB does not implement the colStats command.

However, there is an alternative way to consume change streams from Cosmos DB in Spark without an intermediate feed processor.

  1. You can use the Cosmos DB Change Feed feature to listen to changes and process them in real time using Azure Functions.

  2. The output can then be sent to an Azure Event Hub and consumed by a Spark Streaming job.

This approach requires additional setup and configuration compared to using the MongoDB Spark Connector directly.
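The Event Hub → Spark leg of this pipeline could look roughly like the sketch below, using Spark Structured Streaming with the azure-eventhubs-spark connector. This is a minimal sketch, not a complete solution: it assumes the `com.microsoft.azure:azure-eventhubs-spark` package is installed on the cluster, and the connection string, consumer group, and checkpoint path are all placeholders you would substitute with your own values.

```python
# Sketch: consume change-feed events relayed to Azure Event Hubs
# from a Spark Structured Streaming job. Assumes the
# azure-eventhubs-spark connector is on the cluster classpath;
# connection string, consumer group, and paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("cosmos-changefeed-consumer").getOrCreate()

# Placeholder connection string for the Event Hub the Azure Function writes to.
connection_string = "Endpoint=sb://<namespace>.servicebus.windows.net/;..."

# The connector expects the connection string to be encrypted via its helper.
eh_conf = {
    "eventhubs.connectionString":
        spark.sparkContext._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(
            connection_string
        ),
    "eventhubs.consumerGroup": "$Default",
}

stream = (
    spark.readStream
    .format("eventhubs")
    .options(**eh_conf)
    .load()
    # Event Hubs delivers the payload as binary; cast to string to get
    # the JSON document the Azure Function serialized from the change feed.
    .select(col("body").cast("string").alias("change"))
)

query = (
    stream.writeStream
    .format("console")  # replace with your real sink
    .option("checkpointLocation", "/tmp/changefeed-checkpoint")  # placeholder
    .start()
)
query.awaitTermination()
```

How the change documents are serialized into the Event Hub message body is up to the Azure Function, so the parsing step after the cast will depend on that format.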

  • Change feed is a SQL API feature - I do actually use this on old Mongo API accounts but it seems to be blocked on newly created accounts (and does require reverse engineering the way that Bson documents are represented in the SQL API) - IMO Microsoft should provide proper Azure Functions support for Mongo API triggers using change streams and "GetChangeStreamTokens" – Martin Smith Mar 21 '23 at 07:48
  • (or alternatively if they don't want to spend time writing a function extension that gives Mongo API customers parity with SQL API ones at least make accessing MongoAPI accounts via the SQL API change feed endpoint a supported scenario and document the format - as I found code required to deserialize it to BSON was not large) – Martin Smith Mar 21 '23 at 08:18