0

I have stuck somewhere in MongoDB aggregate query. I tried to generate a summary report from the database which contains 110M records. during the report generation, I faced the following issues 1).Even though the collection is indexed they are not utilized for the search. 2).Once query execution finished memory of DB server not decreased. 3)query take considerable time to return the result.

im useing mongodb Atlas v4.2.8 sample document

{
    "_id": {
        "$oid": "5eb122f714d0510011e3a184"
    },
    "from": "Star_friends",
    "to": "94713414047",
    "accountName": "ZM",
    "accountId": "ZM",
    "campaignName": "test 1",
    "campaignId": "5eb122f1e921c3001922f73c",
    "campaignType": "BULK",
    "status": {
        "$numberInt": "3"
    },
    "reason": "No Routing",
    "channel": "sms",
    "messageType": {
        "$numberInt": "1"
    },
    "event": "MT",
    "content": "test 132",
    "credit": {
        "$numberInt": "1"
    },
    "msgId": "",
    "createdDateTime": "2020-05-05T13:55:27.743Z",
    "updatedTime": "2020-05-05T13:55:27.745Z",
    "uDate": "2020-05-05",
    "operator": "mobitel"
}

my query as follows

db.getCollection('report').aggregate([{
    "$match": {
        "createdDateTime": {
            "$gt": "2020-09-14T00:00:01.000Z",
            "$lt": "2020-09-15T23:59:99.999Z"
        },
        "messageType": {
            "$in": [1, 2]
        },
        "channel": {
            "$in": ["sms", "viber", "whatsapp"]
        },
        "accountId": {
            "$in": ["ZM", "KEELLS"]
        }
    }
}, {
    "$project": {
        "_id": 0,
        "channel": 1,
        "messageType": 1,
        "accountName": 1,
        "accountId": 1,
        "createdDateTime": 1,
        "uDate": 1,
        "credit": 1,
        "status": 1
    }
}, {
    "$group": {
        "_id": {
            "channel": "$channel",
            "messageType": "$messageType",
            "accountName": "$accountName",
            "accountId": "$accountId",
            "filteredDate": {
                "$substr": ["$createdDateTime", 0, 7]
            },
            "sortDate": "$uDate"
        },
        "total": {
            "$sum": "$credit"
        },
        "send": {
            "$sum": {
                "$cond": [{
                    "$in": ["$status", [2, 15, 1, 14, 6, 17, 4, 5]]
                }, "$credit", 0]
            }
        },
        "delivered": {
            "$sum": {
                "$cond": [{
                        "$in": ["$status", [6, 17, 4]]
                    },
                    "$credit",
                    0
                ]
            }
        },
        "deliveryFailed": {
            "$sum": {
                "$cond": [{
                    "$in": ["$status", [12, 5]]
                }, "$credit", 0]
            }
        },
        "failed": {
            "$sum": {
                "$cond": [{
                    "$in": ["$status", [3]]
                }, "$credit", 0]
            }
        },
        "datass": {
            "$addToSet": {
                "channel": "$channel",
                "messageType": "$messageType",
                "accountName": "$accountName",
                "accountId": "$accountId",
                "filteredDate": {
                    "$substr": ["$createdDateTime", 0, 7]
                },
                "sortDate": "$uDate"
            }
        }
    }
}, {
    "$unwind": "$datass"
}, {
    "$project": {
        "_id": 0
    }
}, {
    "$sort": {
        "datass.sortDate": -1
    }
}])

indexes as follows

accountId_1 / accountId_1_createdDateTime_-1 / campaignId_-1 / channel_1 / createdDateTime_-1 / messageType_1 / msgId_-1 / msgId_-1_status_1

I would be appreciated if someone can help me with this

Thanks

Praveen
  • 9
  • 3

2 Answers2

0

You gave us little information. How many documents should average query like such return? How long does it take to execute the said query?

What I can see here is that your match pipeline is good, because you are trying to filter out documents by fields that are indexed. But what is a "performance smell" here is your $sort function which does sorting on non-indexed field. Try to do sorting immediately after $match.

Play with it a little more and try to figure out which stage of the pipeline is a performance bottle-neck.

Tomov Nenad
  • 154
  • 5
  • above query return 15 records which is generated using 2M documents(return only grouped summarized data set ) this takes nearly 4 minuets. i guess bottle-neck is indexing. when i explain a query it Examined total number of documents i the collection – Praveen Nov 17 '20 at 09:59
  • i have tested with lesser number of documents and query explain says "No index available for this query." even though i have index "createdDateTime" field. no idea why indexes are not utilizing during the execution. this is a sample explain that i have generated with 71 records { "stage": "COLLSCAN", "nReturned": 71, "executionTimeMillisEstimate": 0, "works": 73, "advanced": 71, "needTime": 1, "needYield": 0, "saveState": 0, "restoreState": 0, "isEOF": 1, "direction": "forward", "docsExamined": 71 } – Praveen Nov 17 '20 at 10:12
0

I have resolved my issue by changing my indexes

accountId_1_createdDateTime_-1 / msgId_-1_status_1 / accountId_1_messageType_1_channel_1_createdDateTime_1_accountName_1_uDate_1_credit_1_status_1

Praveen
  • 9
  • 3