0

I was modeling a DynamoDB table and noticed that one solution I was considering could lead to a particular scenario of having duplicate keys on the GSI. The first tests I ran to determine the order proved inconclusive, so I moved to a different approach.

However, I am still curious if there is any hidden logic to tiebreaker the keys on a Global Secondary Index (GSI).

Here is an example scenario with the following structure and items:

pk sk gsi_sk
A a a
A b a
A c a

Suppose I am querying with the GSI partition key pk and sort key gsi_sk, but I only provide the pk value of A. How would the order of the results be decided?

I would appreciate any insights or explanations on this topic. Thank you!

dege
  • 2,824
  • 2
  • 25
  • 33
  • How would you want DynamoDB to sort them? Is there another attribute (or collection of attributes) that should be the tiebreaker? – jarmod May 02 '23 at 17:29
  • @jarmod do not have a specific want, just wanted to have more clarity on the tie break. But if had one would be the creation date. – dege Aug 02 '23 at 10:38

2 Answers2

2

As you state, GSI's can have duplicate items, and that's the reason you can't do a GetItem on an index.

As for ordering, while it may seem that ordering is inconclusive, it is deterministic. The minimum projection type for an index is KEY_ONLY, so no matter what attributes you project you will always have the base tables keys. These values are what keep the items in the index unique, and are also the values which define the order. It may be difficult to understand the order based on the base tables keys, however as I mentioned it is deterministic.

Leeroy Hannigan
  • 11,409
  • 3
  • 14
  • 31
  • Thanks for the reply, really appreciate it. So there's something that helps determine the order in a deterministic way but it's internal to aws? – dege May 02 '23 at 19:05
  • I wouldn't go as far as to say it's internal, using the base tables keys you should be able to determine the order. And the order will always remain the same. – Leeroy Hannigan May 02 '23 at 19:25
1

I can further confirm the answer provided by Lee Hannigan by doing some experimentation to try to prove it to myself, because when there's no source for the answer, I worry it might not necessarily be correct. There is a specific, seemingly consistent order, that has nothing to do with when the items were inserted.

Consider the following table:
Columns: pk, sk, pk2
Primary key: pk, sk
GSI: pk2, sk

If you have the following records:
pk: 2 sk: 1 pk2: 2
pk: 3 sk: 1 pk2: 2
pk: 4 sk: 1 pk2: 2
pk: 5 sk: 1 pk2: 2
pk: 6 sk: 1 pk2: 2

All records are the same except their pk values go from 2 to 6.

Their ascending order in the GSI is:
pk: 2 sk: 1 pk2: 2
pk: 6 sk: 1 pk2: 2
pk: 5 sk: 1 pk2: 2
pk: 4 sk: 1 pk2: 2
pk: 3 sk: 1 pk2: 2

This is weird, and shows that the sort isn't based purely on a standard numerical/string comparison of the pk values. But still, by inserting these records multiple times in different orders, we can see that the resulting order is the same regardless, so it clearly is being determined by something consistently. Who knows, maybe it's related to the hashcode of the pk?

Regardless, this is important because it means that next-tokens can be relied upon in GSIs, even if multiple records have the same partition and sort keys, and even if the record referenced by the LastEvaluatedKey is deleted, using it in the LastEvaluatedKey still works, which is what sent me down this rabbit hole to begin with!

Roy
  • 31
  • 3