0

I recently just started learning about DynamoDB single table design. Now, I am trying to model Shipment Update data that has the following properties:

  • an account has multiple users
  • an account has multiple shipments
  • a shipment can change eta multiple times
  • each time there's a shipment update, a new record will be inserted

Access patterns:

  1. get all shipments of an account displaying the last updated status ordered by eta in an ascending order
  2. for a given shipment, get the chronological updates

enter image description here

enter image description here

I am having a difficulty trying to resolve the 2 access patterns mentioned above. If, per se, I only have 1 record per shipment, then I can just update the sort key for the shipment update items to be shpm#55abc and the retrieval of all shipments for a given account by eta is straight forward, which is via the gsi accountEta.

How do I resolve this to get the access patterns I need? Should I consider having a separate table for the shipment update audit, i.e. to store just the shipment updates? So that when I need access pattern #2, then I query this audit table by the shipment id to get all the chronological updates. But, I feel like this defeats the purpose of the single table design.

billydh
  • 975
  • 11
  • 27

1 Answers1

1

A single-table design is a good fit for these access patterns. Use overloadable, generic key names like PK and SK. Here is one approach*:

Shipments have a "current" record. Add a global secondary index (GSI1) to create an alternate Primary Key for querying by account in ETA order (pattern #1). All changes to the shipment are executed as updates to this "current" record.

# shipment "current" record
PK             SK                                 GSI1PK            GSI1SK
shpmt#55abc    x_current                          account#123       x_eta#2022-07-01

Next, enable DynamoDB Streams on the table to capture shipment changes. Each time a "current" record is updated, the Lambda backing the Stream writes the OLD_IMAGE to the table as a change control record. This enables pattern #2 by shipment and account.

# shipment update record
PK             SK                                 GSI1PK           GSI1SK
shpmt#55abc    update#2022-06-28T06:10:33.247Z    account#123      update#2022-06-28T06:10:33.247Z

One virtue of this approach is that a single query operation can retrieve both the current shipment record and its full/partial change history in reverse order. This is the reason for the x_ prefixes on the current record's keys. A query with a key expression of PK = shpmt#55abc AND SK >= "update", DESC sorting with ScanIndexForward=False and a limit of 2 returns the current record (x_current) and the latest update record.

* Whether this is a good solution for you also depends on expected read/write volumes.

fedonev
  • 20,327
  • 2
  • 25
  • 34
  • hey thanks @fedonev. i am still digesting this, so bear with me before accepting your answer. on your code blocks `# shipment "current" record`, the gsi has the same name `GSI1SK`. was it meant to be `GSI1PK` and `GSI1SK`? And on your `GSI1SK`, 1 is prefixed with `x_eta` and the other `update`. what is the reason for that? I need to read up on the DynamoDB Streams, haven't explored that previously. – billydh Jun 28 '22 at 09:26
  • 1
    @billydh Index values: yes, deliberate. It's the [GSI overloading](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-gsi-overloading.html) pattern - one index holds several record types. Index attribute name: right, typo is now fixed! – fedonev Jun 28 '22 at 10:08
  • 1
    N.B. [Alex Debrie's book](https://www.dynamodbbook.com) is a great way to learn DynamoDB's idioms. – fedonev Jun 28 '22 at 10:09
  • On my question section, I didn't have the `accountId` column, which, in your solution, it looks like it is used as the `GSI1PK`, so I added a new attribute, `accountId`, where the value can just be `123`. But, I am a bit confused regarding the `GSI1SK`. which attribute does it take its value from? For the current record, the value is `x_eta#2022-07-01`, where the `SK` is `x_current`, but for the update record `update#2022-06-28T06:10:33.247Z`, which is the same as the `SK`. – billydh Jun 28 '22 at 11:03
  • I just realised, are `GSI1PK` and `GSI1SK` attributes? – billydh Jun 28 '22 at 11:20
  • 1
    @billydh Yes, they are attribute names. My answer follows a [common attribute naming pattern](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-modeling-nosql-B.html) that uses generic key names and index overloading. – fedonev Jun 28 '22 at 12:31
  • so with the `GSI1`, when I want to get all shipments for an account, then the query would be `GSI1PK` = `account#123` and `GSI1SK` begins_with `x_eta`, right? – billydh Jun 29 '22 at 11:51
  • 1
    @billydh Right! – fedonev Jun 29 '22 at 12:56
  • with this table design, it means most of the attributes should be made nullable, is that right? – billydh Jul 18 '22 at 10:39