
I would like to store upwards of 100 time series in DynamoDB for each user. The data is aggregate monthly counts and is used for comparing monthly usage to permitted usage under varying subscription plans. Over 5 years, that's 100 x 5 x 12 = 6,000 data points per user. I need to be able to increment/decrement counts and query the data efficiently. In most parts of my app, I am updating 4-5 time series counts at any given time, and only for the current period (month). I do not care about querying data across users. Representative data is as follows:

numberOfPosts: Int
numberOfImages: Int
numberOfContacts: Int
etc.

Pattern 1: Store all data in one table entry

type TimeSeriesData {
   period: AWSDate
   value: Int
}

type TimeSeries {
   owner_id: ID
   numberOfPosts: [TimeSeriesData]
   numberOfImages: [TimeSeriesData]
   numberOfContacts: [TimeSeriesData]
}

Pattern 2: Each user's time series is a table entry

type TimeSeries {
   key: String # a composite of "owner_id.name"
   owner_id: ID
   name: String
   data: [TimeSeriesData]
}

Pattern 3: Each time series data point is a table entry

type TimeSeries {
   key: String # a composite of "owner_id.period.name"
   owner_id: ID
   period: AWSDate
   name: String
   value: Int
}

The last pattern seems appealing, but it would require 5 separate updates (5 GetItem calls to read the current values and 5 PutItem calls to write them back). These could be sped up with batching. Still, given only 6,000 data points and my basic usage requirements, am I better off using Pattern 1?
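
Worth noting: DynamoDB's UpdateItem supports an atomic ADD action that increments a numeric attribute in place, so a counter bump doesn't strictly need a preceding GetItem (though BatchWriteItem only supports puts and deletes, so 5 metrics still means 5 requests). A minimal sketch of a Pattern 3 increment, assuming the AWS SDK v3 document client, a hypothetical table named TimeSeries, and illustrative IDs:

import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, UpdateCommand } from "@aws-sdk/lib-dynamodb";

const doc = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Bump one Pattern 3 counter; "value" is a DynamoDB reserved word, hence the alias.
async function increment(ownerId: string, period: string, name: string, by = 1) {
  await doc.send(new UpdateCommand({
    TableName: "TimeSeries",                      // hypothetical table name
    Key: { key: `${ownerId}.${period}.${name}` }, // the "owner_id.period.name" composite
    UpdateExpression: "ADD #v :by",               // atomic increment, no prior read needed
    ExpressionAttributeNames: { "#v": "value" },
    ExpressionAttributeValues: { ":by": by },
  }));
}

// Five metrics still cost five requests, since BatchWriteItem cannot express increments.
await Promise.all(
  ["numberOfPosts", "numberOfImages", "numberOfContacts"].map((name) =>
    increment("user-123", "2022-07-01", name)
  )
);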

Is there another pattern / approach that is better for this situation?

Thanks!

EDIT

I have now settled on a 4th pattern:

type TimeSeries {
   key: String # a composite of "owner_id.period"
   owner_id: ID
   period: AWSDate
   attribute1: Int
   attribute2: Int
   attribute3: Int
   ...
   attribute100: Int
}

The benefit of this is that since most of the time I am updating 4-5 metrics for 1 period (typically the current month), I can make the update with only 1 GetItem and 1 PutItem call, whereas this would cost 5 and 5 with Pattern #3. Also, with 100 metrics per period and a 15-character average attribute name, my document will be under 2KB, so each read/write costs 1 RCU and 2 WCU, compared to 5 RCU and 5 WCU with Pattern #3. With the latter, to update 5 metrics (typically 10-15 bytes each), I'm spending 5 WCU because DynamoDB has a minimum charge of 1KB per write. I guess since I'm paying for the 1KB anyway, my data model might as well get the most out of it by packing as much data as possible into that operation.
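
As a sketch of what the Pattern 4 update can look like when the changes are pure increments (in which case even the GetItem can be skipped via an atomic ADD), again assuming the AWS SDK v3 document client, a hypothetical TimeSeries table, and illustrative names:

import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, UpdateCommand } from "@aws-sdk/lib-dynamodb";

const doc = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Apply several counter deltas to one (owner, period) item in a single request.
async function incrementPeriod(ownerId: string, period: string, deltas: Record<string, number>) {
  const names: Record<string, string> = {};
  const values: Record<string, number> = {};
  const terms: string[] = [];
  Object.entries(deltas).forEach(([attr, by], i) => {
    names[`#a${i}`] = attr;
    values[`:v${i}`] = by;
    terms.push(`#a${i} :v${i}`);
  });
  await doc.send(new UpdateCommand({
    TableName: "TimeSeries",                     // hypothetical table name
    Key: { key: `${ownerId}.${period}` },        // the "owner_id.period" composite
    UpdateExpression: `ADD ${terms.join(", ")}`, // one atomic multi-attribute increment
    ExpressionAttributeNames: names,
    ExpressionAttributeValues: values,
  }));
}

// One request instead of five:
await incrementPeriod("user-123", "2022-07-01", {
  numberOfPosts: 1,
  numberOfImages: 2,
  numberOfContacts: 1,
});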

For those interested, Pattern #1 is the worst. By my estimate, each read/write update is about 165KB, which costs 42 RCU and 166 WCU; factoring in $0.25/RCU and $1.25/WCU, that makes it 79 times more expensive!
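
A quick back-of-envelope check of that figure, using the unit costs quoted above:

// Rough check of the ~79x claim: Pattern 1 (165KB item) vs Pattern 4 (<2KB item).
const rcuPrice = 0.25;
const wcuPrice = 1.25;
const pattern1 = 42 * rcuPrice + 166 * wcuPrice; // 42 RCU + 166 WCU per update
const pattern4 = 1 * rcuPrice + 2 * wcuPrice;    // 1 RCU + 2 WCU per update
console.log((pattern1 / pattern4).toFixed(1));   // "79.3"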


1 Answer

In your situation, I'd go with the 3rd option. As for the issue with updates: that's just one Query based on the key prefix, plus a batch of writes. Performance and cost are great here, with no real downsides.

With the first approach you might hit the 400KB item size limit, and that would force you to remodel.
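
A key-prefix Query only works against a sort key, so this assumes Pattern 3 is modeled with owner_id as the partition key and a "period.name" sort key (called sk here, an illustrative name) rather than a single composite string. A minimal sketch:

import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, QueryCommand } from "@aws-sdk/lib-dynamodb";

const doc = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Fetch every metric for one user and one period with a single Query.
const res = await doc.send(new QueryCommand({
  TableName: "TimeSeries",  // hypothetical table name
  KeyConditionExpression: "owner_id = :o AND begins_with(sk, :p)",
  ExpressionAttributeValues: { ":o": "user-123", ":p": "2022-07" },
}));
console.log(res.Items);     // all of this user's metrics for July 2022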

  • Thanks! I ended up going with a 4th pattern putting each period into 1 document since that matches my typical access pattern and is cheaper. – good1492 Jul 04 '22 at 13:51