DynamoDB table design for heterogeneous data

Question

I'm trying to understand the best way to design my DynamoDB table. Let me introduce an example to explain my doubts.

Suppose I need to store customer invoices with the following fields:

Invoice Unique ID (PK)
Customer-ID
Creation-time
Modification-time
Segment
Invoice Reference number

The Invoice unique ID suggests using it as the partition key (no sort key is necessary). Based on the required data accesses, I'll have the following possible queries:

Get an Invoice By ID
Get all customer invoices by Customer-ID and Segment-ID
Get an Invoice by Invoice Reference number

To achieve this, I have created two Global Secondary Indexes (GSI):

Index_A on Customer-ID and Segment
Index_B on Invoice Reference number

This table design seems reasonable, and everything works fine with these schemas and requirements.

Now, let's say I also want to store some additional Customer information:

Customer-ID
Segment
Creation-time
Modification-time
Customer_email
Customer_data

I'm trying to figure out how I can modify the existing table design to include this additional information. To query the customer-info data, I'll need the following possibility:

Get customer data by Customer-ID and Segment

Regarding the customer info, the unique identifier will be the tuple <Customer-ID, Segment>. But, in this case, I could store this information by generating an additional unique ID as well. Therefore, I could keep the primary key (PK) as ID and include the following information:

╔════════╤═════════════╤════════════╤══════════════════╤═══════════════════╤═══════════════╤══════════════════════╤══════════════════════╗
║ ID     │ Customer-ID │ Segment    │ Reference-Number │ Customer-email    │ Customer-data │ Creation-Time        │ Modification-Time    ║
╠════════╪═════════════╪════════════╪══════════════════╪═══════════════════╪═══════════════╪══════════════════════╪══════════════════════╣
║ 111111 │ Bob         │ Automotive │ i-a-001          │                   │               │ 2023-07-26T12:51:38Z │ 2023-07-26T12:51:38Z ║
╟────────┼─────────────┼────────────┼──────────────────┼───────────────────┼───────────────┼──────────────────────┼──────────────────────╢
║ 222222 │ Alice       │ Fitness    │ i-f-002          │                   │               │ 2023-06-26T11:51:38Z │ 2023-06-26T11:51:38Z ║
╟────────┼─────────────┼────────────┼──────────────────┼───────────────────┼───────────────┼──────────────────────┼──────────────────────╢
║ 333333 │ Bob         │ Fitness    │ i-f-003          │                   │               │ 2023-06-26T11:51:38Z │ 2023-06-26T11:51:38Z ║
╟────────┼─────────────┼────────────┼──────────────────┼───────────────────┼───────────────┼──────────────────────┼──────────────────────╢
║ 444444 │ Bob         │ Automotive │                  │ bob@hello.com     │ Bob data      │ 2023-07-26T12:51:38Z │ 2023-07-26T12:51:38Z ║
╟────────┼─────────────┼────────────┼──────────────────┼───────────────────┼───────────────┼──────────────────────┼──────────────────────╢
║ 555555 │ Bob         │ Fitness    │                  │ bob@myemail.com   │ Bob data 2    │ 2023-06-26T11:51:38Z │ 2023-06-26T11:51:38Z ║
╟────────┼─────────────┼────────────┼──────────────────┼───────────────────┼───────────────┼──────────────────────┼──────────────────────╢
║ 666666 │ Alice       │ Fitness    │                  │ alice@myemail.com │ Alice data    │ 2023-06-26T11:51:38Z │ 2023-06-26T11:51:38Z ║
╚════════╧═════════════╧════════════╧══════════════════╧═══════════════════╧═══════════════╧══════════════════════╧══════════════════════╝

I'm not sure about this possible solution, but if it were reasonable, I would need a way to filter out the results using index_A (Customer-ID, Segment) because it will return heterogeneous rows.

For example, with the table above, when I search using index_A on (bob, fitness), I'll receive invoices rows along with customer data. However, I also need a way to search by CustomerID and Segment to retrieve only invoices and another way to retrieve customer data by CustomerID and Segment.

Perhaps I need to distinguish the row type in some way. But if I add one more column, how can I use three fields on a GSI index?

Typical way to include row type in [Single Table Design](https://www.alexdebrie.com/posts/dynamodb-single-table/) is via a prefix, for example a pk of `customer#121` and then an sk of `order#001` or `profile#121` etc. — jarmod, Jul 26 '23 at 20:09

DynamoDB table design for heterogeneous data

0 Answers0