4

I'm having problems designing my DynamoDB table to support a fairly simple access pattern. I hope you can help me a little bit :)

I have 4 different products types (A, B, C and D) that have a price and a location (country#state#city).

The access patterns are:

  1. Filter by product type
  2. Filter by product type and location
  3. Filter by product type, location and price
  4. Filter by product type, location and price and sort by price

The problem is that a product located in USA#NY#NY must be also available in USA#NY and USA. Also, the user needs to be able to filter by price and sort by expensive/cheap first.

Example:

Data:

Product: ID_1 | Type A | Location USA#NY#NY | Price 100$
Product: ID_2 | Type A | Location USA#NY#NY | Price 200$

Use case:

  • A user U1 search for products type A with price < 500$ in USA#NY#NY expensive first
  • A user U2 search for products type A with price < 250$ in USA#NY cheap first

Expected result:

  • U1 should get ID_2, ID_1
  • U2 should get ID_1, ID_2

Both products should be displayed to both users in the right order even though they search in different areas.


To be able to filter by location and price and sort by price I came up with this solution, however, a lot of data gets duplicated and I am sure there must be a much better solution:

PK |     SK & GSI PK     | GSI SK | Other product details (duplicated data)
---------------------------------------------------------------------------
ID | TYPE                | PRICE  | Image, name, etc
ID | TYPE#USA            | PRICE  | Image, name, etc
ID | TYPE#USA#NY         | PRICE  | Image, name, etc
ID | TYPE#USA#NY#NY      | PRICE  | Image, name, etc

This solves every access pattern:

  1. Filter by product type

    GSI PK = TYPE

  2. Filter by product type and location

    GSI PK = begins_with(TYPE#USA#NY#...)

  3. Filter by product type, location and price

    GSI PK = TYPE#USA & GSI SK > 150

    GSI PK = TYPE#USA#NY & GSI SK > 150

  4. Filter by product type, location and price and sort by price

    GSI PK = TYPE#USA & GSI SK > 150 ScanIndexForward true/false

    GSI PK = TYPE#USA#NY & GSI SK > 150 ScanIndexForward true/false

Reads are efficient but a lot of data gets duplicated (price and product details) and updating an item requires multiple writes.


Is it possible to achieve this without duplicating all the product details?

user1204395
  • 568
  • 1
  • 4
  • 16

1 Answers1

3

I think you're misunderstanding the hierarchical pattern

You only need
Table
PK = ID

GSI
PK = TYPE
SK = COUNTRY#STATE#CITY

You can then query the GSI with

  • Query(GSI, PK = 'TYPEA')
  • Query(GSI, PK = 'TYPEA', SK begins with 'USA#')
  • Query(GSI, PK = 'TYPEA', SK begins with 'USA#NY#')
  • Query(GSI, PK = 'TYPEA', SK begins with 'USA#NY#NY#')

Filtering on price could be added to any of the above queries.
Query(GSI, PK = 'TYPEA', SK begins with 'USA#NY#NY#', filter price > 100.00)

Note that filtering this way does not save any read capacity, it may be more effective to simply filter client side.

The point is you only need (and are allowed) 1 row in the GSI per row in the table

The only question, do you expect more than 10GB of data for any of the types? A GSI, like DDB table itself has a 10GB limit per partition.
(2022-11 Update) DDB supports more than 10GB of data for a given partition key, if there are no Local Secondary Indexes. Additionally, GSI indexes do not have the 10GB partition limit.

If you do expect more than 10GB per type, I'd reconsider the idea that type is a valid access pattern. Nobody is going to scroll through 10GB of data.

Honestly, I'm not a fan of DDB tables that don't handle at least one of the anticipated access patterns. If type is that big of a driver, I'd consider a table like so:
Table
PK = TYPE
SK = ID

LSI
PK = (same as table)
SK = COUNTRY#STATE#CITY

Now your queries become

  • Query(table, PK = 'TYPEA')
  • Query(LSI, PK = 'TYPEA', SK begins with 'USA#')
  • Query(LSI, PK = 'TYPEA', SK begins with 'USA#NY#')
  • Query(LSI, PK = 'TYPEA', SK begins with 'USA#NY#NY#')

and you don't have to pay extra for a GSI.

EDIT
When considering the price filter, is it really going to make a big difference in performance or cost. You pay for each 1MB RCU of data regardless of rather that data returned is 1 row or 100 rows. So just how big are your rows and how big a spread in price to you expect to filter across?

Have you considered other options besides or in addition to dynamo? Aurora RDS certainly would provide the flexibility in query that you seem to need much, much easier. Or perhaps adding Elasticsearch in addition to DDB.

Charles
  • 21,637
  • 1
  • 20
  • 44
  • I think I am not understanding how "Filtering on price could be added to any of the above queries". Let's say I have a product with ID 1, type A and price $150. A user is looking for products type A in USA#NY# between $100 and $200. Another user is looking for products type A in USA# between $140 and $160. The product with ID 1 should appear in both searches. How can I add the price to your query? Thank you very much for your answer :) – user1204395 Aug 01 '19 at 16:03
  • @user1204395 filtering is done on any query by adding a [filter expression](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.html#Query.FilterExpression) to the query. The filter expression is separate from key condition expression. – Charles Aug 01 '19 at 16:37
  • Ah, ok, I didn't mention I don't want to add any filter expression on price because it is the most common filter and I want to make it cheap and efficient, my bad! I'll edit my question – user1204395 Aug 01 '19 at 16:48
  • 1
    @user1204395, reconsider using dynamo...flexible queries are not it's strong suit. – Charles Aug 01 '19 at 17:11