Weird result when querying a dynamodb GSI with a non-existent start key. Bug or feature?

Question

I'm querying a global secondary index with a start key that does not exist and I'm seeing some weird results. Is this a ddb bug or (un?)documented behavior? Are there workarounds?

I have a table with the primary hashKey being "id" and the ShopIndex GSI being "shop". Both with no rangeKeys.

When I query using a start key "id" that does not exist, I'd expect to get back an empty response with a correct last evaluated key since there are no results to return after an invalid start key.

However, what I'm seeing is a seemingly random result(s) are returned.

Code example:

This snippet returns one item from the index. Not the first result. Not the last.

const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB();
dynamodb.query({
    TableName: 'products',
    IndexName: 'ShopIndex',
    Limit: 1,
    ExpressionAttributeValues: {
        ':shop': { S: 'shop_KgHqp62taEV' }
    },
    KeyConditionExpression: 'shop = :shop',
    ExclusiveStartKey: {
        id: { S: 'doesnotexist' },
        shop: { S: 'shop_KgHqp62taEV' },
    },
}, (err, result) => {
    if (err) throw err;
    console.log('result', JSON.stringify(result, null, 2));
});

If I remove the start key entirely, it returns a different item.

If I add it back and set ScanIndexForward: false, it returns a third different item.

If I remove the start key AND set ScanIndexForward: false, it returns a fourth different item.

Wtf.

As far as I can tell, there is no way to detect this other than look up the "id" and confirm it exists before attempting to use it as the start key?

Did I miss this in the docs, or is this yet another batteries not included aws landmine that I need to work around?

Is it possible that `KeyConditionExpression: 'shop = :shop'` doesn't do what you intend, and instead is always true? — Michael - sqlbot, Jun 21 '18 at 22:10
i don't think so because other operations work and there is nothing wrong with the data it returns. pretty sure it's just an undocumented bug. FWIW i switched to doing a read before all queries to confirm the primary index exists before attempting the query. annoying, but works. — Cory Mawhorter, Jun 23 '18 at 00:05
But you are testing with `Limit: 1,` which means you are potentially masking incorrect behavior that is *always* present and you are in essence getting the right result by accident. What happens with a larger limit when the record exists? If you get back both right and wrong responses, that should confirm this. — Michael - sqlbot, Jun 23 '18 at 00:32
larger limits behave the same -- just >1 results relative to that that first random object. this definitely seems like a bug considering the eventually consistent nature of ddb. if i create -> query using that result as the start key and it's not consistent yet, i'll get random results. that's a bug. it should probably fail for non-existent start keys the same way trying to read a non-existent id fails. — Cory Mawhorter, Jul 06 '18 at 23:05
If you don't use Limit 1, and the record does exist, what happens then? If you get both right and wrong answers, that is different than if you only get the right answer. — Michael - sqlbot, Jul 07 '18 at 00:09
Oh... wait... re-reading the question, I assumed something you haven't actually stated. Do those "random" records have the same `shop` value? — Michael - sqlbot, Jul 07 '18 at 00:30
@Michael-sqlbot yes, they are all the same. with/without limit displays the same weird behavior. the implementation works well enough at all other times. verifying startkey before using it has also seemed to be a pretty good workaround (at least on the small scale). — Cory Mawhorter, Sep 14 '18 at 22:23
In that case, the problem is with your expectations. The value of your start key has a *place where it belongs* in the index, even if the value is not actually present in the index. Just as the surname Jozxyqk has a *place* in the telephone directory, even if nobody in the city has that name. You are scanning from that place, which is why changing the ordering changes the results. This behavior seems entirely correct. — Michael - sqlbot, Sep 14 '18 at 23:14
that'd be true if Jozxyqk existed in the book. but what i'm talking about is that it **doesn't** and instead, a random page from the book is returned. definitely not correct behavior for an inconsistent db. — Cory Mawhorter, Sep 16 '18 at 16:29
That's exactly the nature of the error in your assumption. Every record does in fact belong somewhere, *even when that record doesn't currently exist*, even if it never has. It still has a place, and that place is where you are asking DynamoDB to start scanning -- the place where that record would be found if present. Whether or not that record actually exists right now is not relevant. The place may be a gap that is only 0 records wide, but it can still be said to exist all the same. Scanning forward or backward from that spot gives the result you observe. — Michael - sqlbot, Sep 16 '18 at 17:27

score 1 · Answer 1 · answered Jun 24 '18 at 23:21

It is a feature!

In your table you have more ids for the same shop.

Just imagine the following example:

id shop 41 A 22 A 93 A 34 A

The items are in memory in that order: 41, 22, 93, 34.

When you ask for one item without any ExclusiveStartKey, you will get 41 (the first scanned).

When you say that the starting key (i.e. last evaluated key) is 93, you will get the next one: 34.

When you say that the starting key is 93, but ScanIndexForward: false, you will look backward and you'll get 22.

In order to better understand, run the queries without Limit: 1. You should notice the difference in the results.

So, it is definitely a feature! A very important one, because with these features and range key you can do wonderful queries. I did! ;)

this is a great explanation of start keys, but isn't exactly what my post is about. my post is about using a start key for an id that *does not* exist in the data. using your example table, using a start key of 100 would return a seemingly random result (instead of 41 or 34) — Cory Mawhorter, Jul 06 '18 at 22:49
@Costin, the question is this: Why doesn't `ExpressionAttributeValues: { ':shop': { S: 'shop_KgHqp62taEV' } },` and `KeyConditionExpression: 'shop = :shop',` result in nothing being returned? — Michael - sqlbot, Jul 07 '18 at 00:14

Weird result when querying a dynamodb GSI with a non-existent start key. Bug or feature?

1 Answers1

It is a feature!