1

I have a collection with 250K documents and I want to find the last one inserted with a particular property "siteid" which is indexed. This is the query I am using.

db.getCollection('txnCollection').find({"siteid":"123456"}).limit(1).sort({"_id":-1})

If there is a record it is found quickly. If there isn't the performance is not quick enough circa 0.8s.

However if I run the query

db.getCollection('txnCollection').find({"siteid":"123456"}).limit(1)

The performance is really quick even if there is no document with siteid 123456.

It seems silly to run this query then if a record is returned to run the second query to get the record I want - ie. the last inserted one, but I cannot work out a way of doing anything better

For info when I run explain the result I get is

{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "txn.txnCollection",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "siteid" : {
                "$eq" : "123456"
            }
        },
        "winningPlan" : {
            "stage" : "SORT",
            "sortPattern" : {
                "_id" : -1
            },
            "limitAmount" : 1,
            "inputStage" : {
                "stage" : "KEEP_MUTATIONS",
                "inputStage" : {
                    "stage" : "FETCH",
                    "inputStage" : {
                        "stage" : "IXSCAN",
                        "keyPattern" : {
                            "siteid" : 1
                        },
                        "indexName" : "siteid_1",
                        "isMultiKey" : false,
                        "direction" : "forward",
                        "indexBounds" : {
                            "siteid" : [ 
                                "[\"123456\", \"123456\"]"
                            ]
                        }
                    }
                }
            }
        },
        "rejectedPlans" : []
    },
    "serverInfo" : {
        "host" : "XXXXX",
        "port" : 27017,
        "version" : "3.0.6",
        "gitVersion" : "1ef45a23a4c5e3480ac919b28afcba3c615488f2"
    },
    "ok" : 1
}
profesor79
  • 9,213
  • 3
  • 31
  • 52
user3782299
  • 333
  • 3
  • 9

1 Answers1

0

The point is when sort and limit is used, sort is applied before limit is executed..... That means your full query result is sorted and then limited.

profesor79
  • 9,213
  • 3
  • 31
  • 52
  • If I take the limit off. Nothing is returned quickly when there are no records with the site id. If I then sort it. It is really slow. It still returns nothing. – user3782299 Jun 09 '16 at 14:21
  • Is there a way of turning this off if it is expensive. I am worried about performance not really whether a new record has been added since my query started. – user3782299 Jun 09 '16 at 14:35
  • This has go weirder. I have added another unrelated index and this has started working now I am getting very quick results even when I am querying a siteid not in the table, but the new index is behaving exactly the same ie. returns quickly if the field exisits or if I leave out the sort, but not if I put the sort in. When I use explain I get exactly the same result for both the query that is fast and the one that is now slow. is there something I have to do to indexes to get them to work properly? – user3782299 Jun 13 '16 at 07:18
  • @user3782299 that a bit complicated - I am with you now :-)... could use `hint` like `query.hint({siteid:1})` ? – profesor79 Jun 13 '16 at 07:52
  • I don;t think that is the problem. Look at the explain above. The index is being used. I have a fresh update. the query without the sort always runs fast (2ms) even when the search is not successful. The one with the sort sometimes does, but sometimes is very slow. I think it may have to do with transactions being inserted whilst the query is executing. – user3782299 Jun 14 '16 at 10:41
  • I don't care about these. My database is a very simple transaction store. Transactions are being added constantly, but not updated. I want to see the most recent transaction for a partiular siteid (but do not care if I miss by a few seconds). I need reads to execute <10ms and the store will get big >750M records (it is only 0.25M) now – user3782299 Jun 14 '16 at 10:48
  • So the question is how do I get the last transaction for a siteid quickly. Is my index correct. – user3782299 Jun 14 '16 at 10:56