
I need your help understanding a performance problem.

We have a system where we store sets of documents (1k-4k docs) in batches. The documents have this structure: {_id: ObjectId(), RepositoryId: UUID(), data...}, where the RepositoryId is the same for all instances in the set. We also have unique indexes on {_id: 1, RepositoryId: 1} and {RepositoryId: 1, ...}.
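
For clarity, the indexes can be created like this (a minimal sketch; SomeField is a placeholder for the fields elided from the second index definition):

// Compound unique indexes as described above; "SomeField" stands in
// for the elided fields of the second index.
db.collection.createIndex({ _id: 1, RepositoryId: 1 }, { unique: true })
db.collection.createIndex({ RepositoryId: 1, SomeField: 1 }, { unique: true })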

The use case is: delete all documents with the same RepositoryId:

db.collection.deleteMany(
    { RepositoryId: UUID("SomeGUID") },
    { writeConcern: {w: "majority", j: true} }
)

And then re-upsert batches (300 items per batch) with the same RepositoryId that we deleted before:

db.collection.insertMany(
   [ { RepositoryId: UUID(), data... }, ... ],
   {
      writeConcern: {w: 1, j: false},
      ordered: false
   }
)
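
For completeness, a minimal sketch of that batching loop, assuming docs is an array holding the full document set for one repository (the names are placeholders, not our actual code):

// Re-insert the documents in batches of 300, mirroring the insertMany
// call above; `docs` is a hypothetical array of all documents to write.
const BATCH_SIZE = 300;
for (let i = 0; i < docs.length; i += BATCH_SIZE) {
    db.collection.insertMany(docs.slice(i, i + BATCH_SIZE), {
        writeConcern: { w: 1, j: false },
        ordered: false
    });
}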

The issue is that the upsert of the first few (3-5) batches takes much more time than the rest (first batch: 10 s, 8th batch: 0.1 s). There is also this entry in the log file:

{
    "t": {
        "$date": "2023-01-19T15:49:02.258+01:00"
    },
    "s": "I",
    "c": "COMMAND",
    "id": 51803,
    "ctx": "conn64",
    "msg": "Slow query",
    "attr": {
        "type": "command",
        "ns": "####.$cmd",
        "command": {
            "update": "########",
            "ordered": false,
            "writeConcern": {
                "w": 1,
                "fsync": false,
                "j": false
            },
            "txnNumber": 16,
            "$db": "#####",
            "lsid": {
                "id": {
                    "$uuid": "6ffb319a-6003-4221-9925-710e9e2aa315"
                }
            },
            "$clusterTime": {
                "clusterTime": {
                    "$timestamp": {
                        "t": 1674139729,
                        "i": 5
                    }
                },
                "numYields": 0,
                "reslen": 11550,
                "locks": {
                    "ParallelBatchWriterMode": {
                        "acquireCount": {
                            "r": 600
                        }
                    },
                    "ReplicationStateTransition": {
                        "acquireCount": {
                            "w": 601
                        }
                    },
                    "Global": {
                        "acquireCount": {
                            "w": 600
                        }
                    },
                    "Database": {
                        "acquireCount": {
                            "w": 600
                        }
                    },
                    "Collection": {
                        "acquireCount": {
                            "w": 600
                        }
                    },
                    "Mutex": {
                        "acquireCount": {
                            "r": 600
                        }
                    }
                },
                "flowControl": {
                    "acquireCount": 300,
                    "timeAcquiringMicros": 379
                },
                "readConcern": {
                    "level": "local",
                    "provenance": "implicitDefault"
                },
                "writeConcern": {
                    "w": 1,
                    "j": false,
                    "wtimeout": 0,
                    "provenance": "clientSupplied"
                },
                "storage": {
                },
                "remote": "127.0.0.1:52800",
                "protocol": "op_msg",
                "durationMillis": 13043
            }
        }
    }
}

Is there some background process running after the delete that affects the upsert performance of the first batches? This was not a problem until we switched from a standalone instance to a single-node replica set, which we needed for transaction support in another part of the app. This use case does not require transactions, but we cannot host two MongoDB instances with different setups. The DB is exclusive to this operation; nothing else runs on it (we are running in an isolated test environment). How can we fix it?

The issue is reproducible. It seems that when there is a time gap between test runs (a few minutes), the problem is absent for the first run, but the following runs are problematic. Running on a machine with a Ryzen 7 PRO 4750U, 32 GB RAM and a Samsung 970 EVO M.2 SSD. MongoDB version 5.0.5.

  • Check the explain plan, I guess the index `{_id: 1, RepositoryId: 1}` is not used. Try an index on `{ RepositoryId: 1 }` – Wernfried Domscheit Jan 25 '23 at 14:25
  • I checked the remove command in mongod.log, and it uses the index for the process. It's a compound index, but RepositoryId is a prefix. (I edited the original post to make it clear) – Matěj Mainuš Jan 26 '23 at 12:56
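
Following up on the comments: the index used by the delete can be verified from the query plan (a minimal sketch; the GUID is a placeholder, and explain does not actually apply the delete):

// Ask the server how it would execute the delete; the winning plan
// should show an IXSCAN on an index with RepositoryId as a prefix.
db.collection.explain("executionStats").remove(
    { RepositoryId: UUID("SomeGUID") }
)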

1 Answer


In that log entry, timeAcquiringMicros indicates that this operation waited while attempting to acquire a lock.

flowControl is a throttling mechanism that delays writes on the primary node when the secondary nodes are lagging, with the intent of letting them catch up before they get so far behind that consistency is lost.

Waiting on the flowControl lock suggests that there was a backlog of operations still being replicated to the secondaries; they were a bit behind, so the new writes were being slowed.
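
You can observe this directly on the primary (a minimal sketch; the exact set of fields in the flowControl section varies by server version):

// Inspect flow control metrics; fields such as isLagged and
// timeAcquiringMicros show whether writes are being throttled.
db.serverStatus().flowControl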

See Replication Lag and Flow Control for more detail.
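
If flow control is confirmed as the cause and replication lag is not a concern (e.g. on a single-node replica set like yours), it can be disabled at runtime via the documented server parameter (a sketch; use with care on real multi-node deployments):

// Disable the flow control throttling mechanism (enabled by default).
db.adminCommand({ setParameter: 1, enableFlowControl: false })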

– Joe