
Need best-practice advice on how to define the primary key (hash+range) for this schema:

  • date
  • org
  • service
  • tps

Usage is to produce a daily report of peak tps for each service for a particular org. So date+org+service will be unique, i.e., any upsert of this combo should

  • create new record if it doesn't exist
  • update tps value if it already exists

The search pattern: for any given date and a specific org, return all services with their tps.

I'm considering using date as the hash key. The range key would then have to be a concatenation of org+service. Is this the right approach? If so, what should be used as the delimiter, to avoid clashes with the values of org/service?
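For concreteness, here is a rough sketch of what I have in mind (Python/boto3; the table name and the '#' delimiter are placeholders I made up):

```python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("daily_peak_tps")  # hypothetical table name

# Upsert: update_item creates the item if the key doesn't exist,
# otherwise it just overwrites tps.
def record_peak_tps(date: str, org: str, service: str, tps: int) -> None:
    table.update_item(
        Key={"pk": date, "sk": f"{org}#{service}"},  # '#' as a tentative delimiter
        UpdateExpression="SET tps = :tps",
        ExpressionAttributeValues={":tps": tps},
    )

# Daily report: all services with their tps for one org on one date.
def services_for_org_on_date(date: str, org: str) -> list:
    resp = table.query(
        KeyConditionExpression=Key("pk").eq(date) & Key("sk").begins_with(f"{org}#")
    )
    return resp["Items"]
```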

And how should I set up the index?

Tech Newbie

1 Answer

  1. When it comes to a delimiter, I haven't seen a "best practice". Some libraries use dots; common examples use hashes or hyphens. If you are not "decoding" the value on read, you shouldn't have problems; otherwise it is convenient to pick something that cannot appear in your values. I would say: pick one and stay consistent with it (see the sketch after this list).
  2. Your example is quite small, in that you have defined only one access pattern (AP) plus a uniqueness requirement. As such, there is no single "best practice" for it, because the design depends on the other access patterns your data might face. Think about what other queries you might want to run, prioritize them, and then design the model. Keep in mind that, given priorities and query frequency, you can serve access patterns via a precise query, a query with a filter, or a scan with a filter, which affects how many secondary indexes you want.
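To make the delimiter concern concrete, a minimal sketch (Python; '#' is an arbitrary choice): composing the key is always safe, and decoding only breaks if the delimiter can appear inside a value.

```python
DELIM = "#"

def compose(org: str, service: str) -> str:
    return f"{org}{DELIM}{service}"

def decompose(sk: str) -> tuple[str, str]:
    # Safe only if org can never contain the delimiter; splitting once
    # from the left keeps any '#' inside the service name intact.
    org, service = sk.split(DELIM, 1)
    return org, service

# Fine:   decompose(compose("acme", "billing"))    -> ("acme", "billing")
# Broken: decompose(compose("acme#eu", "billing")) -> ("acme", "eu#billing")
```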

Hints:

I assume your date is year-month-day and not something like an ISO-8601 timestamp. If you use date as the hash key then, in addition to your AP, you will be able to get an overview for a specific day across your organizations/services. But if you made date the sort key and org the hash key, you could get a history of tps for a specific org. The question then is where to put service:

  1. If you put service into the sort key before date, then getting a specific time window across services becomes impossible. Since this doesn't serve your AP either, you probably don't need it.

  2. If you put it after the date, then you can get the day view for a specific org across services (the AP you are after), and you can also get a monthly or yearly view for that org across its services using begins_with (see the first sketch after this list).

  3. Another option is to keep a mapping of service to tps in your org-date DDB item. This has an impact on writes. If you create this service-tps item from one place, there shouldn't be problems. If, on the other hand, you have asynchronous writers recording service tps for your org, it can be handled via conditional updates/writes: read the item, then assert on the update/write that it is unchanged (or still absent); if the condition fails, read again and retry (see the second sketch after this list). This also depends on how many services you expect, since there is a 400 KB limit per item; can you really store this mapping?

  4. Having org-service or service-org as the partition key doesn't work for your AP.

  5. Another idea: if you don't store that many days of tps per org/service, you could make org the PK and service the SK, and store date→tps pairs in a list, map, or set.
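First sketch, for option 2: assuming org as the hash key and date#service as the sort key, with ISO dates (table and attribute names are my assumptions, Python/boto3):

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("org_tps")  # hypothetical table name

# Day view for one org across services (the original AP):
day = table.query(
    KeyConditionExpression=Key("pk").eq("acme") & Key("sk").begins_with("2023-06-15#")
)["Items"]

# Monthly view for the same org, just by shortening the prefix:
month = table.query(
    KeyConditionExpression=Key("pk").eq("acme") & Key("sk").begins_with("2023-06")
)["Items"]
```

Second sketch, for option 3: the read-then-conditionally-write loop, here implemented with a version attribute for optimistic locking (all names are assumptions):

```python
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("org_daily_tps")  # hypothetical table name

def set_service_tps(org: str, date: str, service: str, tps: int) -> None:
    while True:
        item = table.get_item(Key={"pk": org, "sk": date}).get("Item")
        try:
            if item is None:
                # First writer for this org/date creates the item; a lost race
                # fails the condition and sends us around the loop again.
                table.put_item(
                    Item={"pk": org, "sk": date, "version": 1,
                          "tps_by_service": {service: tps}},
                    ConditionExpression="attribute_not_exists(pk)",
                )
            else:
                # Update the map only if nobody bumped the version since our read.
                table.update_item(
                    Key={"pk": org, "sk": date},
                    UpdateExpression="SET tps_by_service.#s = :tps, version = version + :one",
                    ConditionExpression="version = :seen",
                    ExpressionAttributeNames={"#s": service},
                    ExpressionAttributeValues={":tps": tps, ":one": 1,
                                               ":seen": item["version"]},
                )
            return
        except ClientError as e:
            if e.response["Error"]["Code"] != "ConditionalCheckFailedException":
                raise  # a real error; only the lost race is retried
```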

My opinion: I would go with the 2nd approach, since using date in the sort (range) key is more intuitive to me, and I suspect the historic view per org will prove useful. In addition, I can add an index (PK: service, SK: date) to give a breakdown per service across orgs if need be (see the sketch below). Additional APs could be served via secondary indexes. If you need a holistic view with the 2nd approach, there can be another index (PK: year-month, SK: service, org, service-org, day, or some other combination, depending on how you want to narrow things down after the PK).
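As an illustration of adding that index later, a sketch with boto3's update_table (index and attribute names are my own; this shape assumes on-demand billing, since a provisioned table would also need ProvisionedThroughput on the new index):

```python
import boto3

client = boto3.client("dynamodb")

client.update_table(
    TableName="org_tps",  # hypothetical table name
    AttributeDefinitions=[
        {"AttributeName": "service", "AttributeType": "S"},
        {"AttributeName": "date", "AttributeType": "S"},
    ],
    GlobalSecondaryIndexUpdates=[{
        "Create": {
            "IndexName": "service-date-index",
            "KeySchema": [
                {"AttributeName": "service", "KeyType": "HASH"},
                {"AttributeName": "date", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
    }],
)
```

Note that GSI keys must be top-level attributes, so each item would need to carry plain service and date attributes alongside any composite sort key.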

Last tip: because you have time-series data in your DDB, consider using TTL for cleanup. Do you need these records after a year, or five? Do you store a copy of this data in S3 or somewhere else for potential historic analytics? If so, how much of it do you actually need in DDB?
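Enabling TTL is a one-off call plus an epoch-seconds attribute on each item; a sketch (attribute and table names are assumptions):

```python
import time
import boto3

client = boto3.client("dynamodb")

# One-off: tell DynamoDB which attribute holds the expiry timestamp.
client.update_time_to_live(
    TableName="org_tps",  # hypothetical table name
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
)

# On every write, stamp the item to expire roughly a year out;
# DynamoDB deletes expired items in the background at no write cost.
expires_at = int(time.time()) + 365 * 24 * 3600
```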

Cell