The query plan was showing a collection scan which is going over all the rows in the mongo collection. Hence I created an index on the where clause columns, expecting Drill to choose the index based access plan. But drill continues to use the full table scan. Is there any thing else to be done to make drill use the index ?
The actual query, generated query plan and the mongo index are given below.
SQL:
Select j.user as User, TO_DATE(j.created_at) as submitted_on
from mongo.example.jobs j
where j.user = 'john@example.ai' and j.created_at BETWEEN timestamp '2020-03-25 13:12:55' AND timestamp '2020-04-24 13:12:55'
Physical Plan (from drill UI)
00-00 Screen : rowType = RecordType(ANY User, ANY submitted_on): rowcount = 121.2375, cumulative cost = {6720.59875 rows, 23532.19875 cpu, 895541.0 io, 0.0 network, 0.0 memory}, id = 10468
00-01 Project(User=[$0], submitted_on=[TO_DATE($1)]) : rowType = RecordType(ANY User, ANY submitted_on): rowcount = 121.2375, cumulative cost = {6708.475 rows, 23520.075 cpu, 895541.0 io, 0.0 network, 0.0 memory}, id = 10467
00-02 SelectionVectorRemover : rowType = RecordType(ANY user, ANY created_at): rowcount = 121.2375, cumulative cost = {6587.2375 rows, 22913.8875 cpu, 895541.0 io, 0.0 network, 0.0 memory}, id = 10466
00-03 Filter(condition=[AND(=($0, 'john@example.ai'), >=($1, 2020-03-25 13:12:55), <=($1, 2020-04-24 13:12:55))]) : rowType = RecordType(ANY user, ANY created_at): rowcount = 121.2375, cumulative cost = {6466.0 rows, 22792.65 cpu, 895541.0 io, 0.0 network, 0.0 memory}, id = 10465
00-04 Scan(table=[[mongo, example, jobs]], groupscan=[MongoGroupScan [MongoScanSpec=MongoScanSpec [dbName=example, collectionName=jobs, filters=null], columns=[`user`, `created_at`]]]) : rowType = RecordType(ANY user, ANY created_at): rowcount = 3233.0, cumulative cost = {3233.0 rows, 6466.0 cpu, 895541.0 io, 0.0 network, 0.0 memory}, id = 10464
Index created in MongoDB
{
"v" : 2,
"key" : { "user" : 1, "created_at" : 1, "method_map_id" : 1 },
"name" : "user_1_created_at_1_method_map_id_1",
"ns" : "example.jobs"
}
Moreover, in a drill documentation, I see that drill supports indexes only for MapR DB. Does that mean Indexes of other data sources like mongo won't be used ?
https://drill.apache.org/docs/querying-indexes-introduction/