Recently I noticed a huge performance difference between doing multiple upserts (via bulk operations) and a single insert of multiple documents. I would like to know if my understanding is correct:

  • An upsert/update works like a find() followed by an update(), so it does two things: a read and a write
  • An insert only writes, so it is a lot faster

Is that the reason for the performance difference?

If this is the case, and I need to do a lot of writes regularly, I could write a new document with a createdOn field instead of updating an existing one. To query, I would just query for documents sorted by createdOn DESC. Is this a good method, or is there a better way?

  • If I have an index on the collection, might it speed up the update? But won't the index then slow down the write portion?
  • With the second approach, where I only do inserts, will it slow down when I have too many documents? Is it practical (to speed up the writes)?
  • I have also tried increasing the connection pool size. I am not sure what the optimum is, but I tried 20 and, according to mongostat, I can handle about 20 queries per second. I expected it to be a lot higher.
Jiew Meng
  • In general, indexes are only for speeding up reads. Not writes. – m8a Jan 31 '16 at 12:48
  • => Creating a different document each time: it can be a good solution, depending on how frequently you add documents. If the document count grows huge in a very short time, your find queries will be slower. I wouldn't use this because I would have to order documents in each and every query, even when I only need a single document. – Tushar Niras Dec 17 '16 at 16:32

2 Answers

If you are inserting a document, MongoDB needs to check whether a document with the same ObjectId (that is, the same _id value) already exists. If it does, the document cannot be inserted.

The same applies to an update: MongoDB needs to check whether the document exists, otherwise the update cannot be performed. Your update query slows down when it cannot locate the document through its ObjectId or another indexed field.

Otherwise, the performance of inserting and updating a document should be the same.

So Insert can be like this //(Fast)

  1. (Check for document -> Not Found -> Insert new document) Else
  2. (Check for document -> Found -> Cannot be inserted)

And Update with upsert (ObjectId available) //(Fast)

  1. (Check for document -> Not Found -> Insert new document) Else
  2. (Check for document -> Found -> Update the document)

Or Update with upsert (Without ObjectId) //This is slow

  1. (Find ObjectIds (Slow) -> Not Found -> Insert new document) Else
  2. (Find ObjectIds (Slow)-> Found -> Update the documents)
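
The three cases above can be mimicked with a toy in-memory model. This is plain Python, not MongoDB or its driver: `ToyCollection` and its methods are hypothetical names, a dict stands in for the _id index, and the `examined` counter is only a rough proxy for the work a lookup does.

```python
import uuid


class ToyCollection:
    """Toy stand-in for a collection; NOT MongoDB's actual implementation."""

    def __init__(self):
        self.docs = {}      # _id -> document; plays the role of the _id index
        self.examined = 0   # number of lookups / documents inspected

    def insert(self, doc):
        # Check for document by _id: found -> cannot be inserted.
        self.examined += 1
        if doc["_id"] in self.docs:
            raise KeyError(f"duplicate _id: {doc['_id']!r}")
        self.docs[doc["_id"]] = dict(doc)

    def upsert_by_id(self, _id, fields):
        # Indexed lookup: one probe, then either update or insert.
        self.examined += 1
        self.docs.setdefault(_id, {"_id": _id}).update(fields)

    def upsert_by_field(self, key, value, fields):
        # No index on `key` in this toy: every document is scanned.
        matches = []
        for doc in self.docs.values():
            self.examined += 1
            if doc.get(key) == value:
                matches.append(doc)
        if matches:
            for doc in matches:
                doc.update(fields)
        else:
            new_id = uuid.uuid4().hex  # hypothetical id generation
            self.docs[new_id] = {"_id": new_id, key: value, **fields}


coll = ToyCollection()
for i in range(1000):
    coll.insert({"_id": i, "name": f"user{i}"})

coll.examined = 0
coll.upsert_by_id(500, {"score": 1})
by_id = coll.examined

coll.examined = 0
coll.upsert_by_field("name", "user500", {"score": 2})
by_scan = coll.examined

print(by_id, by_scan)  # prints: 1 1000
```

In this sketch the keyed upsert touches one entry while the unindexed one walks all 1000 documents, which is the gap the "slow" case describes.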
Code OverFlow

I haven't found an 'official' explanation of how an upsert works in MongoDB, but yes, it is safe to assume that, since the operation is aimed at updating existing documents and only adds a document when no document matching the given criteria can be found.

If you add an index, the upsert can become faster: after all, the index is used to 'find' the document. The caveat lies in which field(s) the index covers and which fields you are updating. If the updated portion is part of the index, updating the document has a performance cost, because the index entry must be updated as well. If the updated portion is not part of the index, you will not incur a penalty for writing to the existing document. If a document is added, though, there is a minor performance impact, since the index has to be updated. But still: just adding a document will remain faster.
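
To make the index caveat concrete, here is a minimal sketch that counts index maintenance in a toy collection with one secondary index. Everything in it (`IndexedToy`, `index_writes`, the `email` and `visits` fields) is hypothetical and only models the reasoning above; it says nothing about how MongoDB's storage engine is actually implemented.

```python
class IndexedToy:
    """Toy collection with one secondary index; NOT how MongoDB stores data."""

    def __init__(self, indexed_field):
        self.indexed_field = indexed_field
        self.docs = {}          # _id -> document
        self.index = {}         # indexed value -> set of _ids
        self.index_writes = 0   # index entries added or removed so far

    def _index_add(self, value, _id):
        self.index.setdefault(value, set()).add(_id)
        self.index_writes += 1

    def _index_remove(self, value, _id):
        self.index[value].discard(_id)
        self.index_writes += 1

    def insert(self, doc):
        # A new document always needs a new index entry.
        self.docs[doc["_id"]] = dict(doc)
        self._index_add(doc.get(self.indexed_field), doc["_id"])

    def update(self, _id, fields):
        doc = self.docs[_id]
        f = self.indexed_field
        # Only a change to the indexed field forces index maintenance.
        if f in fields and fields[f] != doc.get(f):
            self._index_remove(doc.get(f), _id)
            self._index_add(fields[f], _id)
        doc.update(fields)


toy = IndexedToy("email")
toy.insert({"_id": 1, "email": "a@example.com", "visits": 0})
after_insert = toy.index_writes           # one entry added

toy.update(1, {"visits": 1})              # non-indexed field
after_plain_update = toy.index_writes     # unchanged

toy.update(1, {"email": "b@example.com"}) # indexed field
after_indexed_update = toy.index_writes   # old entry removed, new one added

print(after_insert, after_plain_update, after_indexed_update)  # prints: 1 1 3
```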

Therefore, if in your scenario you know that you don't want to update documents, then inserts are generally faster. If you want to make sure that you do not add the same document twice, you can also opt for adding a unique index. Then an insert will simply fail.

All in all it depends on the specific scenario, but based on the information I can extract from your question, I think the best option is to simply insert the documents. Since the 'createdOn' field seems to make the documents unique in your scenario, you only have to worry about the indexes that are used in your read scenarios.
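
As a rough illustration of the insert-only approach, the sketch below appends a new version on every write and reads the newest one back by createdOn. It is a pure-Python stand-in, not driver code: `append_version`, `latest`, the `key` field and the sample values are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def append_version(log, key, value, created_on):
    """Insert-only write: earlier entries are never modified."""
    log.append({"key": key, "value": value, "createdOn": created_on})


def latest(log, key):
    """Read path: newest entry for `key` (sort by createdOn DESC, take the first)."""
    candidates = [d for d in log if d["key"] == key]
    return max(candidates, key=lambda d: d["createdOn"], default=None)


log = []
t0 = datetime(2016, 1, 31, tzinfo=timezone.utc)
append_version(log, "sensor-1", 10, t0)
append_version(log, "sensor-1", 12, t0 + timedelta(seconds=5))
append_version(log, "sensor-2", 7, t0 + timedelta(seconds=6))

current = latest(log, "sensor-1")
print(current["value"])  # prints: 12
```

Note that the read here filters and compares every stored version. In a real collection that cost is what an index covering the lookup key and createdOn would be meant to absorb, and it is why old versions may eventually need pruning.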

Some extra info can be found on the MongoDB site:

For more information on designing your (read) indexes, a pretty good explanation on finding out whether your indexes add anything to the query plans can be found here:

I hope this helps.

Naing Lin Aung