0

I have a table with campaigns and multiple windows for every campaign with a start and end datetime.

pk sk start end data
CAMPAIGN#1 CAMPAIGN#1 {}
CAMPAIGN#1 WINDOW#1 2023-01-01T00:00:00.000Z 2023-01-31T23:59:00.000Z {}
CAMPAIGN#1 WINDOW#2 2023-02-01T00:00:00.000Z 2023-02-28T23:59:00.000Z {}
CAMPAIGN#1 WINDOW#3 2023-03-01T12:00:00.000Z 2023-03-31T23:59:00.000Z {}
CAMPAIGN#1 WINDOW#4 2023-04-01T12:00:00.000Z 2023-04-30T23:59:00.000Z {}
CAMPAIGN#1 WINDOW#5 2023-05-01T12:00:00.000Z 2023-05-31T23:59:00.000Z {}
CAMPAIGN#1 WINDOW#6 2023-06-01T12:00:00.000Z 2023-06-30T23:59:00.000Z {}
CAMPAIGN#1 WINDOW#7 2023-07-01T12:00:00.000Z 2023-07-31T23:59:00.000Z {}

There can be hundreds/thousands of windows for each campaign. What's the best/most efficient way to retrieve the window for a given date, meaning the date is > start and < end within DynmoDB without too much filtering?

I tried adding a GSI with a combination of start and end datetime as SK, but this does not really help me out.

gsi1pk gsi1sk
CAMPAIGN#1 WINDOW#2023-01-01T00:00:00.000Z#2023-01-31T23:59:00.000Z

When I want to query for 2023-06-07T13:00:00.000Z for example, I can query for all windows that start BEFORE this date and filter out on the end-date, but this only elimiates 1 row from my result set.

When I have thousands of windows in the past, this will be a very expensive query only to return with one result.

Is there a better way to set up my GSI / query?

2 Answers2

0

CAVEAT: DynamoDB is not a time series database and there may be better solutions out there. I'm making an attempt at giving you a solution but any solution with DDB will require some tradeoffs.


Assumptions (emphasis mine):

  1. The granularity you're looking for is at day level
  2. You want to find all matching windows across all campaigns that are active and that contain your target date (meaning the window starts before the target date and window ends after that date)
  3. There can be hundreds/thousands of windows per campaign, but not more then 10,000
  4. Window length is on the order of weeks/months not years..

You can create a table with the following schema:

 partition key:  cid (string)   // campaign unique id - eg. 'Campaign1'
 sort key:       wid (string)   // window id (unique within campaign) - eg. 'Window1'
 attribute:      wsm (string)   // window start month - eg. '202301'
 attribute:      wed (string)   // window end date -    eg. '20230128'
 attribute:      ttl (numeric)  // window expiration as timestamp in epoch seconds

NOTE Based on the assumption that you only care about active campaigns you will want to set a TTL on window expiration and remove expired windows from your table on the date of expiration. If you don't want to fully delete expired windows you can: use a secondary table for the lookup, or insert double records for each campaign window in your table to be used for lookup.

Example:

cid wid wsm wed start end data
Campaign#1 Window#1 202301 20230131 2023-01-01T00:00:00.000Z 2023-01-31T23:59:00.000Z {}
Campaign#1 Window#2 202302 20230231 2023-02-01T00:00:00.000Z 2023-02-31T23:59:00.000Z {}
Campaign#1 Window#3 202303 20230331 2023-03-01T00:00:00.000Z 2023-03-31T23:59:00.000Z {}
Campaign#1 Window#4 202304 20230431 2023-04-01T00:00:00.000Z 2023-04-31T23:59:00.000Z {}
Campaign#1 Window#5 202305 20230531 2023-05-01T00:00:00.000Z 2023-05-31T23:59:00.000Z {}
Campaign#1 Window#6 202306 20230631 2023-06-01T00:00:00.000Z 2023-06-31T23:59:00.000Z {}
Campaign#1 Window#7 202307 20230731 2023-07-01T00:00:00.000Z 2023-07-31T23:59:00.000Z {}
Campaign#1 Window#8 202308 20230831 2023-08-01T00:00:00.000Z 2023-08-31T23:59:00.000Z {}
Campaign#2 Window#1 202302 20230314 2023-02-14T00:00:00.000Z 2023-03-14T23:59:00.000Z {}
Campaign#2 Window#2 202302 20230601 2023-02-20T00:00:00.000Z 2023-06-01T23:59:00.000Z {}
Campaign#3 Window#1 202305 20230628 2023-05-15T00:00:00.000Z 2023-06-28T23:59:00.000Z {}

And create a GSI called intervals-gsi on PK:wsm, SK:wed

To find all matching windows you will need to run a number of queries depending on max length of your campaign windows. You will need to filter the results of the same-month query but given the partitioning there shouldn't be that many records to filter. The queries can be executed in parallel to speed up the process.

For example, let's say that you know that your campaigns can at most run for 6 months, given a target date of `09/17/2023' you can run the following queries using your GSI:

  • QUERY intervals-gsi WHERE wsm="202309" AND wed > "20230917" FILTER start <= '2023-09-17'
  • QUERY intervals-gsi WHERE wsm="202308" AND wed > "20230917"
  • QUERY intervals-gsi WHERE wsm="202307" AND wed > "20230917"
  • QUERY intervals-gsi WHERE wsm="202306" AND wed > "20230917"
  • QUERY intervals-gsi WHERE wsm="202305" AND wed > "20230917"
  • QUERY intervals-gsi WHERE wsm="202304" AND wed>"20230917"

Another example, let's say that your campaigns run for at most 3 months and the target date is 02/05/2024. You would run:

  • QUERY intervals-gsi WHERE wsm="202302" AND wed>"20240205" FILTER start < '2024-05-02'
  • QUERY intervals-gsi WHERE wsm="202301" AND wed>"20240205"
  • QUERY intervals-gsi WHERE wsm="202212" AND wed>"20240205"

Final notes

I've made some assumptions about your data. Feel free to tweak the assumptions and adjust the design to fit your needs. For example, if your campaigns don't run for more than 2 months each, you may be better off with a different partitioning granularity (eg weekly). The idea being you will trade # of queries for precision. Or if those hundreds/thousands of windows are really mostly on the order of tens/hundreds, optimize for that and let the outlier cases be handled less efficiently..

Mike Dinescu
  • 54,171
  • 16
  • 118
  • 151
-1

Have you tried to use the BETWEEN Query Condition? By using between you will get only the data that matches your rule in your sort key for example, if you query the data between WINDOW#2023-01-01T00:00:00.000Z and WINDOW#2023-03-01T00:00:00.000Z you should get 3 results. This will also allow you to use filter conditions to remove the records lower than the end date (assuming your GSISK is using start date), you will still pay for the entire query, but you will likely get less records.

Now thinking about your access patterns, do you have the need to get all the Campaign information based on the Window number? The idea is to store the data in DynamoDB in a way that will benefit your access patterns, if you do need to have that information, please ignore this second part of the comment, but if you don't need that access pattern it might be a good idea to modify your partition key and move the start_date to the left in your SK in the base table, and you could avoid the use of the GSI. It will look like this 2023-01-01T00:00:00.000Z#WINDOW#1.

Hope it helps!