0

I'm working on a data storage model for a clickstream analytics system. User action data comes from a third-party system as a set of large JSON files. Currently, we will have an ETL process to read JSON files as a source and save data into our store for future analysis and reporting.

Depending on some business rules of the source system, each event can have an is_success field set to true or false. Non-successful user actions have a JSON field with an array of nested objects with diagnostic data about failures.

The draft data model for the storage system is the following:

enter image description here

I have concerns about the relation between fact_events and dim_failure_details on the diagram above. To me, dim_failure_details does not look like a dimension because it has a many-to-one relationship to the fact table.

I've read a design tip from the Kimball Group. That article recommends using a bridge table in a similar situation. But I don't understand how to apply that solution in my case because each event can have different and unpredictable values for attribute_key and attribute_value even when failure_type is the same for multiple events.

I also saw a few similar questions (Star schema [fact 1:n dimension]...how?, Star schema [fact 1:n dimension]...how?), but still don't know how the relationship should be organized correctly. Any help will be much appreciated.

Oleksii
  • 294
  • 1
  • 5
  • 12
  • 1
    A star schema design needs to be based on your reporting requirements, not the structure of your data. Once you know what you need to be able to report on, you can define the grain of your fact table(s) and therefore the Dims associated to it. From your description, it sounds like events and event failures would be different facts – NickW Feb 17 '23 at 18:56
  • @NickW Thank you for your response. Unfortunately, we are at that early stage of the project when no one can answer what they want. Thank you for the hint. I haven't thought we could have a separate fact table for failures. – Oleksii Feb 17 '23 at 19:38
  • I also found a very similar problem here: https://stackoverflow.com/questions/66550402/how-to-accommodate-two-dependent-fact-tables-in-the-same-star-schema – Oleksii Feb 17 '23 at 19:43
  • You have not explained what does dim_failure_events represents, in business terms. – CleanBold Apr 17 '23 at 17:23

0 Answers0