I'm trying to understand the best way to design my DynamoDB table. Let me introduce an example to explain my doubts.
Suppose I need to store customer invoices with the following fields:
- Invoice Unique ID (PK)
- Customer-ID
- Creation-time
- Modification-time
- Segment
- Invoice Reference number
The Invoice unique ID suggests using it as the partition key (no sort key is necessary). Based on the required data accesses, I'll have the following possible queries:
- Get an Invoice By ID
- Get all customer invoices by Customer-ID and Segment-ID
- Get an Invoice by Invoice Reference number
To achieve this, I have created two Global Secondary Indexes (GSI):
- Index_A on Customer-ID and Segment
- Index_B on Invoice Reference number
This table design seems reasonable, and everything works fine with these schemas and requirements.
Now, let's say I also want to store some additional Customer information:
- Customer-ID
- Segment
- Creation-time
- Modification-time
- Customer_email
- Customer_data
I'm trying to figure out how I can modify the existing table design to include this additional information. To query the customer-info data, I'll need the following possibility:
- Get customer data by Customer-ID and Segment
Regarding the customer info, the unique identifier will be the tuple <Customer-ID, Segment>. But, in this case, I could store this information by generating an additional unique ID as well. Therefore, I could keep the primary key (PK) as ID and include the following information:
╔════════╤═════════════╤════════════╤══════════════════╤═══════════════════╤═══════════════╤══════════════════════╤══════════════════════╗
║ ID │ Customer-ID │ Segment │ Reference-Number │ Customer-email │ Customer-data │ Creation-Time │ Modification-Time ║
╠════════╪═════════════╪════════════╪══════════════════╪═══════════════════╪═══════════════╪══════════════════════╪══════════════════════╣
║ 111111 │ Bob │ Automotive │ i-a-001 │ │ │ 2023-07-26T12:51:38Z │ 2023-07-26T12:51:38Z ║
╟────────┼─────────────┼────────────┼──────────────────┼───────────────────┼───────────────┼──────────────────────┼──────────────────────╢
║ 222222 │ Alice │ Fitness │ i-f-002 │ │ │ 2023-06-26T11:51:38Z │ 2023-06-26T11:51:38Z ║
╟────────┼─────────────┼────────────┼──────────────────┼───────────────────┼───────────────┼──────────────────────┼──────────────────────╢
║ 333333 │ Bob │ Fitness │ i-f-003 │ │ │ 2023-06-26T11:51:38Z │ 2023-06-26T11:51:38Z ║
╟────────┼─────────────┼────────────┼──────────────────┼───────────────────┼───────────────┼──────────────────────┼──────────────────────╢
║ 444444 │ Bob │ Automotive │ │ bob@hello.com │ Bob data │ 2023-07-26T12:51:38Z │ 2023-07-26T12:51:38Z ║
╟────────┼─────────────┼────────────┼──────────────────┼───────────────────┼───────────────┼──────────────────────┼──────────────────────╢
║ 555555 │ Bob │ Fitness │ │ bob@myemail.com │ Bob data 2 │ 2023-06-26T11:51:38Z │ 2023-06-26T11:51:38Z ║
╟────────┼─────────────┼────────────┼──────────────────┼───────────────────┼───────────────┼──────────────────────┼──────────────────────╢
║ 666666 │ Alice │ Fitness │ │ alice@myemail.com │ Alice data │ 2023-06-26T11:51:38Z │ 2023-06-26T11:51:38Z ║
╚════════╧═════════════╧════════════╧══════════════════╧═══════════════════╧═══════════════╧══════════════════════╧══════════════════════╝
I'm not sure about this possible solution, but if it were reasonable, I would need a way to filter out the results using index_A (Customer-ID, Segment)
because it will return heterogeneous rows.
For example, with the table above, when I search using index_A on (bob, fitness)
, I'll receive invoices rows along with customer data. However, I also need a way to search by CustomerID
and Segment
to retrieve only invoices and another way to retrieve customer data by CustomerID
and Segment
.
Perhaps I need to distinguish the row type in some way. But if I add one more column, how can I use three fields on a GSI index?