
I'm having an issue with the following configuration:

Driver version: 3.12.1, mongodb-driver for Java

Server Version: 3.2 of Mongo API for Azure Cosmos DB (Ancient, I know)

We run some fairly high read/write loads and may hit rate limiting from the Cosmos API for Mongo. In that case, I expect an exception to be thrown. We're doing pretty vanilla queries, and the code looks similar to this:

public DatabaseQueryResult find(String collectionName, Map<String, Object> queryData) {

    Document toFind = new Document(queryData);
    MongoCollection<Document> collection = this.mongoDatabase.getCollection(collectionName);

    FindIterable<Document> findResults = collection.find(toFind);

    if (findResults != null) {
        // Under rate limiting, first() yields the error reply shown below instead of throwing
        Document dataFound = findResults.first();
        return new DatabaseQueryResult(dataFound.toJson(this.settings));
    }

    // other stuff...
}

When Azure rate limits a request, you receive a response like this:

{
   "$err":"Message: {\"Errors\":[\"Request rate is large. More Request Units may be needed, so no changes were made. Please retry this request later. Learn more: http://aka.ms/cosmosdb-error-429\"]}\r\n s",
   "code":16500,
   "_t":"OKMongoResponse",
   "errmsg":"Message: {\"Errors\":[\"Request rate is large. More Request Units may be needed, so no changes were made. Please retry this request later. Learn more: http://aka.ms/cosmosdb-error-429\"]}\r\n",
   "ok":0
}

I expect an exception to be thrown here, but that doesn't seem to be the case with the later driver. What happens instead is:

  • collection.find is returning a FindIterable with the JSON error result as above as the first document
  • We're eventually returning a DatabaseQueryResult with JSON error as the query payload

I don't want this to happen. I'd much prefer the Mongo driver to throw a MongoCommandException/MongoQueryException if a query operation returns an OKMongoResponse where "ok" is 0. Writes seem fine: they use a CommandProtocol object and the response is validated as I'd expect. It's just reads that seem to have changed.
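The error reply above can be told apart from an ordinary result by its `ok` and `code` fields. A minimal sketch of that check, assuming the reply arrives as a `Map<String, Object>` (which `org.bson.Document` implements) and assuming, from the payload shown, that code 16500 is what Cosmos DB uses for rate limiting; `CosmosErrors`, `isErrorReply` and `isRateLimited` are hypothetical names, not driver API:

```java
import java.util.HashMap;
import java.util.Map;

public final class CosmosErrors {

    // Code carried by the rate-limited reply in the question (assumed to be Cosmos DB's "429").
    public static final int RATE_LIMITED_CODE = 16500;

    private CosmosErrors() {}

    // True when the document looks like a failed server reply: "ok" is present and numerically 0.
    public static boolean isErrorReply(Map<String, Object> doc) {
        Object ok = doc.get("ok");
        return ok instanceof Number && ((Number) ok).doubleValue() == 0.0;
    }

    // True when the document is a failed reply that also carries the rate-limit code.
    public static boolean isRateLimited(Map<String, Object> doc) {
        Object code = doc.get("code");
        return isErrorReply(doc)
                && code instanceof Number
                && ((Number) code).intValue() == RATE_LIMITED_CODE;
    }

    public static void main(String[] args) {
        Map<String, Object> reply = new HashMap<>();
        reply.put("code", 16500);
        reply.put("ok", 0);
        System.out.println(isRateLimited(reply)); // true
    }
}
```

`ok` is compared as a number because a server may send it as a double (`0.0`), while parsing the JSON above with `Document.parse` yields an integer.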

Comparing the two driver versions, this looks like a change in read behaviour, perhaps due to the retryable reads introduced in version 3.11. Response validation now seems to happen around this section.

Q: Is there a way to configure my Mongo client so that the driver validates server responses on read operations and throws an exception if it receives an OKMongoResponse with ok == 0?

I can of course validate the results myself, but I'd prefer not to, and would rather let the driver do this if possible.
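If the driver can't be made to do this, the manual check can at least live in one place. A minimal sketch, assuming the error reply always surfaces as the first document of the result, as described above; `ReplyValidation`, `requireOk` and `ServerErrorReplyException` are hypothetical names, and the custom unchecked exception stands in for the driver's own `MongoCommandException`. Requiring `errmsg` or `$err` alongside `ok: 0` guards against an ordinary stored document that happens to contain `ok: 0`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public final class ReplyValidation {

    // Hypothetical unchecked exception standing in for the driver's MongoCommandException.
    public static final class ServerErrorReplyException extends RuntimeException {
        private final int code;

        public ServerErrorReplyException(int code, String message) {
            super("Server replied with ok: 0 (code " + code + "): " + message);
            this.code = code;
        }

        public int getCode() {
            return code;
        }
    }

    private ReplyValidation() {}

    // Returns doc unchanged unless it looks like a server error reply, i.e. "ok" is 0
    // and an "errmsg" or "$err" field is present; in that case it throws instead.
    // A null doc (no matching document) is passed through as null.
    public static <T extends Map<String, Object>> T requireOk(T doc) {
        if (doc == null) {
            return null;
        }
        Object ok = doc.get("ok");
        boolean okIsZero = ok instanceof Number && ((Number) ok).doubleValue() == 0.0;
        Object message = doc.containsKey("errmsg") ? doc.get("errmsg") : doc.get("$err");
        if (okIsZero && message != null) {
            Object code = doc.get("code");
            int errorCode = code instanceof Number ? ((Number) code).intValue() : -1;
            throw new ServerErrorReplyException(errorCode, String.valueOf(message));
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> reply = new LinkedHashMap<>();
        reply.put("code", 16500);
        reply.put("errmsg", "Request rate is large.");
        reply.put("ok", 0);
        try {
            requireOk(reply);
        } catch (ServerErrorReplyException e) {
            System.out.println(e.getCode()); // 16500
        }
    }
}
```

In the question's `find` method this would read `Document dataFound = ReplyValidation.requireOk(findResults.first());`, which works because `Document` implements `Map<String, Object>`.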

Dylan Morley

2 Answers


I'm not sure why Mongo changed this driver behaviour. There is something on the Cosmos side that may help: you can raise a support ticket and ask them to turn on server-side retries. This changes the behaviour of Cosmos so that requests queue up instead of failing with 429s when there are too many.

This more closely reflects how Mongo behaves when running on a VM or in Atlas (which also runs on VMs), rather than in a multi-tenant service like Cosmos DB.
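Until server-side retries are turned on, rate-limited requests can also be retried on the client. A minimal sketch of retry with exponential backoff, assuming the rate-limit condition surfaces as an exception the caller can recognise with a predicate; `Retry.withBackoff` is a hypothetical helper, and the attempt limit and base delay are illustrative values, not Cosmos DB recommendations:

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

public final class Retry {

    private Retry() {}

    // Runs action, retrying up to maxAttempts times in total while the thrown exception
    // matches isRetryable. The wait doubles after each failed attempt.
    public static <T> T withBackoff(Supplier<T> action,
                                    Predicate<RuntimeException> isRetryable,
                                    int maxAttempts,
                                    long baseDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = baseDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts || !isRetryable.test(e)) {
                    throw e;
                }
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    // Restore the interrupt flag and give up with the last failure.
                    Thread.currentThread().interrupt();
                    throw e;
                }
                delay *= 2;
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = withBackoff(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new IllegalStateException("rate limited");
            }
            return "ok after " + calls[0] + " attempts";
        }, e -> e instanceof IllegalStateException, 5, 10);
        System.out.println(result); // ok after 3 attempts
    }
}
```

Doubling the delay gives the service time to replenish request units instead of retrying immediately; capping the delay or adding jitter would be sensible additions.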

Mark Brown
  • OK thanks @Mark Brown, that might be something for us to look into. Ideally, we'll upgrade to 3.6 of the Mongo API where this behaviour is standard (I believe), but can't say for sure when we'll do that. – Dylan Morley Sep 28 '20 at 16:45
  • Hi Mark. Is turning on server side retries something different from the behaviour described under the **RequestRateIsLarge errors have been removed** header here? https://devblogs.microsoft.com/cosmosdb/upgrade-your-server-version-from-3-2-to-3-6-for-azure-cosmos-db-api-for-mongodb/ – Martin Smith Nov 20 '20 at 08:24
  • That blog post is incorrect. You still need to open a support ticket to get server-side retries turned on. I'll get that post updated. Thanks. – Mark Brown Nov 20 '20 at 18:53

With 3.2-3.4 servers, the drivers use the find command described here, not OP_QUERY.

The driver surely is not "returning OKMongoResponse" since it isn't written for cosmosdb.

If you think there is a driver issue, update the question with exact wire protocol response received and the exact result you receive from the driver.

Retryable writes require sessions (which cosmosdb advertises but does not support, see Importing BSON to CosmosDB MongoDB API using mongorestore) and normally use the OP_MSG protocol, which comes with 3.6+ servers. I don't know what drivers would do if a 3.2 server advertises session support; this isn't a combination that is possible with MongoDB.

Note that MongoDB does not support cosmosdb (and consequently MongoDB drivers don't, officially, either).

D. SM
  • The response we're getting back from the Azure Mongo API for Cosmos when rate limited is as per the example in the question. I've redacted a bit of data in the payload (e.g. correlation IDs, instance names), but that's exactly the payload schema. – Dylan Morley Sep 29 '20 at 07:53
  • Looking at this post, https://devblogs.microsoft.com/cosmosdb/upgrade-your-server-version-from-3-2-to-3-6-for-azure-cosmos-db-api-for-mongodb/, RequestRateIsLarge errors are removed and requests will auto-retry until the request times out. Perhaps the best thing to do is just to plan the upgrade ASAP. – Dylan Morley Sep 29 '20 at 07:59
  • The server response looks reasonable to me but your description of what the driver does isn't what I would have expected. – D. SM Sep 29 '20 at 08:01