Is there a way to estimate the cost/amount of data read by a query without actually running it? Similar to Google BigQuery's --dry_run flag.
I don't believe there is such a feature at the moment. However, you can run explain() on the query, e.g. db.airbnb.explain().find(....). The query plan should show you the partition URL of the data node, which contains the size, e.g.:
> db.airbnb.explain().find({ "address.market" : "New York", "price": {$lt: NumberDecimal("200.00")} })
{
  "ok" : 1,
  "plan" : {
    "kind" : "multiPlanNode",
    "regionPlans" : {
      "2/ap-southeast-2" : {
        ....
        "node" : {
          "kind" : "data",
          "partitions" : [
            {
              "url" : "s3://xxxx/json/airbnb/listingsAndReviews.json?agentRegion=2%2Fap-southeast-2&format=.json&region=ap-southeast-2&size=92.65681457519531+MiB",
              "attributes" : {
              }
            }
            ....
Note the size query parameter in the URL:

"url" : "s3://xxxx/json/airbnb/listingsAndReviews.json?agentRegion=2%2Fap-southeast-2&format=.json&region=ap-southeast-2&size=92.65681457519531+MiB"

This means the query will read that S3 object, which is roughly 92 MiB in size.
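If you want a single number out of that plan, a small helper in the shell can sum the advertised partition sizes. This is only a sketch, not a built-in command: it assumes the plan shape shown above (plan.regionPlans.<region>.node.partitions) and that each partition URL carries a size=<n>+MiB query parameter; the exact nesting inside the elided parts of the plan may differ.

// Sketch: sum the advertised partition sizes from an explain() plan.
// Assumes plan.regionPlans.<region>.node.partitions and a "size=<n>+MiB"
// query parameter in each partition URL, as in the output above.
function estimateDataReadMiB(explainOutput) {
  let totalMiB = 0;
  const regionPlans = (explainOutput.plan && explainOutput.plan.regionPlans) || {};
  for (const regionPlan of Object.values(regionPlans)) {
    const partitions = (regionPlan.node && regionPlan.node.partitions) || [];
    for (const partition of partitions) {
      const match = partition.url.match(/size=([\d.]+)[+ ]MiB/);
      if (match) {
        totalMiB += parseFloat(match[1]);
      }
    }
  }
  return totalMiB;
}

You could then capture the plan into a variable and pass it in, e.g. const plan = db.airbnb.find({ "address.market" : "New York" }).explain(); estimateDataReadMiB(plan); which for the listing above would return roughly 92.66.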
Edit: As pointed out by @willis, running explain() without any parameter will not actually run the query, but will only display the execution plan (see explain() behavior). However, with explain('executionStats'), the query will actually be executed.
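So for a dry-run-style estimate, stick with the default verbosity. For reference, the two invocations look like this (reusing the example query from above):

// Plan only - the query is NOT executed:
db.airbnb.explain().find({ "address.market" : "New York", "price": {$lt: NumberDecimal("200.00")} })

// With executionStats - the query IS executed to gather runtime statistics:
db.airbnb.explain("executionStats").find({ "address.market" : "New York", "price": {$lt: NumberDecimal("200.00")} })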