Is there a way to estimate the cost/amount of data read by a query without actually running it? Similar to Google BigQuery's --dry_run flag.
I don't believe there is such a feature at the moment. However, you can run explain() on the query, e.g. db.airbnb.explain().find(....). The query plan should show you the partition URL of the data node, which contains the size, e.g.:
> db.airbnb.explain().find({ "address.market" : "New York", "price": {$lt: NumberDecimal("200.00")} })
{
  "ok" : 1,
  "plan" : {
    "kind" : "multiPlanNode",
    "regionPlans" : {
      "2/ap-southeast-2" : {
        ....
        "node" : {
          "kind" : "data",
          "partitions" : [
            {
              "url" : "s3://xxxx/json/airbnb/listingsAndReviews.json?agentRegion=2%2Fap-southeast-2&format=.json&region=ap-southeast-2&size=92.65681457519531+MiB",
              "attributes" : {
              }
            }
            ....
Note the size query parameter in the URL:

"url" : "s3://xxxx/json/airbnb/listingsAndReviews.json?agentRegion=2%2Fap-southeast-2&format=.json&region=ap-southeast-2&size=92.65681457519531+MiB"

This means the query will read that S3 object, which is roughly 92 MiB in size.
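If you want a single number out of that plan, a small helper in the shell can sum the advertised partition sizes. This is only a sketch, not a built-in command: it assumes the plan shape shown above (plan.regionPlans.<region>.node.partitions) and that each partition URL carries a size=<n>+MiB query parameter; the exact nesting inside the elided parts of the plan may differ.

// Sketch: sum the advertised partition sizes from an explain() plan.
// Assumes plan.regionPlans.<region>.node.partitions and a "size=<n>+MiB"
// query parameter in each partition URL, as in the output above.
function estimateDataReadMiB(explainOutput) {
  let totalMiB = 0;
  const regionPlans = (explainOutput.plan && explainOutput.plan.regionPlans) || {};
  for (const regionPlan of Object.values(regionPlans)) {
    const partitions = (regionPlan.node && regionPlan.node.partitions) || [];
    for (const partition of partitions) {
      const match = partition.url.match(/size=([\d.]+)[+ ]MiB/);
      if (match) {
        totalMiB += parseFloat(match[1]);
      }
    }
  }
  return totalMiB;
}

You could then capture the plan into a variable and pass it in, e.g. const plan = db.airbnb.find({ "address.market" : "New York" }).explain(); estimateDataReadMiB(plan); which for the listing above would return roughly 92.66.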
Edit: As pointed out by @willis, running explain() without any parameter will not actually run the query, but will only display the execution plan (see explain() behavior). However, with explain('executionStats'), the query will actually be executed.
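So for a dry-run-style estimate, stick with the default verbosity. For reference, the two invocations look like this (reusing the example query from above):

// Plan only - the query is NOT executed:
db.airbnb.explain().find({ "address.market" : "New York", "price": {$lt: NumberDecimal("200.00")} })

// With executionStats - the query IS executed to gather runtime statistics:
db.airbnb.explain("executionStats").find({ "address.market" : "New York", "price": {$lt: NumberDecimal("200.00")} })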