
I have a single standalone MongoDB installation on a Linux machine. The database contains a collection with 181 million documents; this collection is by far the largest in the database (approximately 90% of it). The collection is currently 3.5 TB in size. I'm running MongoDB version 4.0.10 with the WiredTiger storage engine.
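For reference, figures like these can be read from the collection statistics, for example (a sketch; the field selection is illustrative):

// Document count, data size and total index size for the collection (byte values).
var stats = db.getCollection('Image').stats()
printjson({ count: stats.count, sizeBytes: stats.size, totalIndexSizeBytes: stats.totalIndexSize })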

The collection has 2 indexes:

  • One on id
  • A compound index on two fields, used when deleting documents (the fields appear in the snippet below; a creation sketch follows this list).
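For reference, the second index would have been created roughly like this (a sketch; it assumes "id" above refers to the default _id index, which every collection gets automatically):

// Compound index on the two fields used by the delete filter shown below.
db.getCollection('Image').createIndex({ CameraId: 1, SequenceNumber: 1 })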

When benchmarking bulk deletion on this collection, we used the following snippet:

db.getCollection('Image').deleteMany({
    $and: [
        { "CameraId": 1 },
        { "SequenceNumber": { $lt: 153000000 } }
    ]
})

To see the state of the deletion operation, I ran a simple test of deleting 1,000 documents while watching the operation with currentOp(). It shows the following:

"command" : {
                "q" : {
                    "$and" : [ 
                        {
                            "CameraId" : 1.0
                        }, 
                        {
                            "SequenceNumber" : {
                                "$lt" : 153040000.0
                            }
                        }
                    ]
                },
                "limit" : 0
            },
            "planSummary" : "IXSCAN { CameraId: 1, SequenceNumber: 1 }",
            "numYields" : 876,
            "locks" : {
                "Global" : "w",
                "Database" : "w",
                "Collection" : "w"
            },
            "waitingForLock" : false,
            "lockStats" : {
                "Global" : {
                    "acquireCount" : {
                        "r" : NumberLong(877),
                        "w" : NumberLong(877)
                    }
                },
                "Database" : {
                    "acquireCount" : {
                        "w" : NumberLong(877)
                    }
                },
                "Collection" : {
                    "acquireCount" : {
                        "w" : NumberLong(877)
                    }
                }
            }
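For reference, the in-progress delete can be picked out of db.currentOp() with a filter along these lines (a sketch; the database name "mydb" is an assumption):

// Show only active operations on the Image collection.
db.currentOp({ active: true, ns: "mydb.Image" })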

It seems to be using the correct index, but the number and type of locks worry me. As I interpret this, it acquires one global lock for each document deleted from a single collection.

Using this approach, it has taken over a week to delete 40 million documents. This cannot be the expected performance.

I realise other designs exist, such as bundling documents into larger chunks and storing them using GridFS, but the current design is what it is, and I want to make sure that what I see is expected before changing my design, restructuring the data, or even considering clustering, etc.

Any suggestions on how to increase bulk-deletion performance, or is this expected?

SJFJ
  • You are trying to delete about 0.8 TB of data - yes, this will simply take some time. What time did you expect? – Wernfried Domscheit Sep 10 '21 at 12:52
  • @WernfriedDomscheit Well, you're right, it is quite some data. But we're inserting at almost the same rate, and I was expecting deletion to be much faster. I'm not a file system guru, but I would expect marking something as available to be much faster than writing data. I just thought the locks seemed a bit overkill, but that might not be what causes the (IMO) slow performance. To answer your question, I do not know what I was expecting, but over a week to delete almost 1 TB took me by surprise – SJFJ Sep 10 '21 at 13:07
  • It is not taking a lock per document, it is taking a lock per yield ("numYields"). Try setting slowms to 0 in order to log every operation (a sketch follows these comments), and then check the log to see how much data is read/written for each delete. – Joe Sep 11 '21 at 04:35
  • @Joe The number of yields matches the number of documents. I deleted 1,000 documents and ran the command just before it finished deleting them (it had probably reached 877) – SJFJ Sep 11 '21 at 06:03
  • I still doubt the number of locks taken is relevant. If the lock is not immediately granted, you would see `acquireWaitCount` and the amount of time it waited in nanoseconds. Getting mongod to log the query will give you a bit more detail about the operation. – Joe Sep 11 '21 at 07:00
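Following up on the slowms suggestion in the comments, logging every operation can be enabled roughly like this (a sketch; 100 ms is the default threshold being restored afterwards):

// Keep the profiler off (level 0) but set the slow-operation threshold to 0 ms,
// so every operation is written to the mongod log.
db.setProfilingLevel(0, 0)
// ...run the delete and inspect the log...
// Restore the default 100 ms threshold afterwards.
db.setProfilingLevel(0, 100)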

0 Answers