Say I have a DynamoDB table with `pk` and `sk`, where `pk` is chosen so that querying the table for a given `pk` returns all items in a given category. How does this differ from scanning a sparse secondary index that contains only items from that category? I know GSI read/write capacity units are separate from the main table's, but I'm wondering whether there is a latency or other benefit to doing one over the other.

Wuubb
  • When you say "a sparse secondary index", do you mean a GSI? If so, you can't restrict the content of the GSI to just those items with said category, if I understand what you're trying to do here. – jarmod Aug 04 '20 at 19:45
  • @jarmod What I mean is a secondary index whose `sk` is an attribute that exists in only one category of items, so only those items are in the index. I'm curious whether there are any performance differences between scanning such an index and querying the main table, where in this scenario both would return all items in said category. – Wuubb Aug 06 '20 at 02:35

1 Answer

AFAIK, in theory, there shouldn't be any performance difference between them. First, the primary table and the GSI both use the same underlying storage nodes, so the I/O performance should be the same. Second, whether you query the primary table or scan the sparse GSI, the records you retrieve share the same partition key, which means they all reside in the same partition (not split across shards).

Some benefits I can think of for querying the primary table instead:

  1. You save the RCU, WCU and storage cost of the GSI
  2. You can do strongly consistent reads, which a GSI does not support (reads from a GSI are always eventually consistent)
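
To make the two access patterns concrete, here is a minimal sketch of the request parameters each one would send through the low-level boto3 DynamoDB client. The table name `Items`, the index name `CategoryIndex`, and the key value `CATEGORY#books` are hypothetical; only the `pk` attribute name comes from the question. The helpers just build dictionaries, so nothing here talks to AWS.

```python
from typing import Any, Dict

TABLE_NAME = "Items"          # hypothetical table name
INDEX_NAME = "CategoryIndex"  # hypothetical name for the sparse GSI


def build_query_params(category_pk: str, consistent: bool = True) -> Dict[str, Any]:
    """Keyword arguments for a Query on the base table for one partition key.

    ConsistentRead=True asks for a strongly consistent read, which is only
    possible on the base table (or an LSI), not on a GSI.
    """
    return {
        "TableName": TABLE_NAME,
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": "pk"},
        "ExpressionAttributeValues": {":pk": {"S": category_pk}},
        "ConsistentRead": consistent,
    }


def build_scan_params() -> Dict[str, Any]:
    """Keyword arguments for a Scan of the sparse GSI.

    No key condition is needed because the index only contains the items
    of interest. ConsistentRead is left out: GSI reads are always
    eventually consistent.
    """
    return {"TableName": TABLE_NAME, "IndexName": INDEX_NAME}
```

With a client created via `boto3.client("dynamodb")`, these would be used as `client.query(**build_query_params("CATEGORY#books"))` and `client.scan(**build_scan_params())`. Either call may return a `LastEvaluatedKey` when the result exceeds 1 MB, in which case you pass it back as `ExclusiveStartKey` to fetch the next page.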
jellycsc