I have a large volume of event data (terabytes) that I need to query, and I am trying to partition it correctly.
I have clients, and each client has many games. The problem is that some of the fields we query on can be null in some events, so they cannot be used as partition keys (for example, segment).
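I have considered two strategies:
- Option 1: a single table partitioned by client/game/date (in S3).
- Option 2: a separate table per client or game, each in its own bucket, partitioned only by date.
Option 1 is simple: I just filter on the partition columns in the WHERE clause. Option 2 requires UNIONs whenever a query spans several clients or games.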
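To make the two layouts concrete, here is a minimal Python sketch. It is an illustration only, not a benchmark. The bucket names (`my-bucket`, `my-bucket-<client>`), the table names (`events`, `events_<client>`) and the partition column `dt` (used instead of `date`, which is a reserved word in many SQL dialects) are all hypothetical. The sketch builds Hive-style `key=value` S3 prefixes for each option, mimics partition pruning by matching those path segments against a filter, and generates the query shape each option implies: one WHERE clause for option 1 versus a UNION ALL over per-client tables for option 2.

```python
from datetime import date, timedelta

# Hypothetical names for illustration only: bucket "my-bucket", table "events",
# per-client tables "events_<client>", partition column "dt".

def option1_prefix(client: str, game: str, day: date) -> str:
    """Option 1: one table, Hive-style partitions client/game/dt."""
    return f"s3://my-bucket/events/client={client}/game={game}/dt={day.isoformat()}/"

def option2_prefix(client: str, day: date) -> str:
    """Option 2: one bucket per client, partitioned only by dt."""
    return f"s3://my-bucket-{client}/events/dt={day.isoformat()}/"

def prune(prefixes, **filters):
    """Keep only prefixes whose key=value path segments match every filter,
    mimicking how a filter on partition columns skips whole S3 prefixes."""
    kept = []
    for p in prefixes:
        segs = dict(s.split("=", 1) for s in p.rstrip("/").split("/") if "=" in s)
        if all(segs.get(k) == v for k, v in filters.items()):
            kept.append(p)
    return kept

def option1_query(client: str, game: str, start: str, end: str) -> str:
    # Option 1: a single query that filters on the partition columns.
    # (String interpolation is for illustration only; not safe for untrusted input.)
    return (f"SELECT * FROM events WHERE client = '{client}' AND game = '{game}' "
            f"AND dt BETWEEN '{start}' AND '{end}'")

def option2_query(clients, start: str, end: str) -> str:
    # Option 2: a cross-client query needs a UNION ALL over per-client tables.
    parts = [f"SELECT * FROM events_{c} WHERE dt BETWEEN '{start}' AND '{end}'"
             for c in clients]
    return "\nUNION ALL\n".join(parts)

if __name__ == "__main__":
    days = [date(2024, 1, 1) + timedelta(days=i) for i in range(3)]
    all_prefixes = [option1_prefix(c, g, d)
                    for c in ("acme", "globex") for g in ("g1", "g2") for d in days]
    print(len(all_prefixes), prune(all_prefixes, client="acme", game="g1", dt="2024-01-02"))
    print(option1_query("acme", "g1", "2024-01-01", "2024-01-03"))
    print(option2_query(["acme", "globex"], "2024-01-01", "2024-01-03"))
```

With option 1, a filter on client, game and dt narrows the scan to one prefix out of twelve in this toy example; with option 2, the same pruning happens per table, but every query that spans clients has to enumerate the tables explicitly.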
What is the correct way to partition this kind of data? By correct, I mean the most efficient and the most cost-effective.
Regards, Ido