I have setup a time-series / events database using the AWS Firehose -> S3/Glue -> Athena stack. It is being used to track various user actions - session started, action performed etc. across a number of our products. My question is about how best to store different types of IDs in this system.
The existing schema is one big 'fact table' with a bunch of different columns. Two of the most important columns are event_type_id and object_id. To use StackOverflow as an example, two events might be:
- question_asked - in this case I would be storing the question id in the object_id column.
- tag_created - in this case I would be storing the tag id in the object_id column.
My question is - is storing multiple different types of IDs in the same column bad practice? It's working OK for us at the moment, but it does require the person/system performing queries to know what type of object the object_id column refers to, based on the event they are querying.
If bad practice, what other approaches might be better? Multiple columns where they are NULL if not relevant for the event in that row? Or is this where dimension tables would be a better fit?