0

Question on Data Modeling

I am building a data warehouse on Redshift where I have fact data coming from multiple sources.

Source 1 - Has different facts and Dimension tables they are maintained well with integrity etc. Dimension tables have dimension attributes, fact tables have fact attributes, they are connected by foreign keys, Order table Product table Customer Table

Source 2- Has only one wide fact table, which collects all fact and dimension information. For example, it has only oner order table with all the order attributes, product attributes, customer attributes. This is manually entered data. There are no identifiers for products, customers, etc either.

There would be more source systems in the future I have no clue at this point how they are stored.

I have built a star schema in my data warehouse that can consume Source1 easily. Can I use this model to consumer source 2? Would you have a different design?

Avi
  • 11
  • 2
  • As the design of a data warehouse is independent of the structure of any source data, yes you can, in theory, use source 2 to populate your data model – NickW Mar 20 '22 at 20:10
  • @NickW how about dimensions data? Since dimension attributes are stored with fact at source 2, and there are no dimensions identifiers .. we could have same number of dimension records as fact as basically we will be splitting up records into different tables. – Avi Mar 21 '22 at 02:47
  • Whether you can map the data in source 2 into your existing data model, and how to technically achieve that, is something only you can know – NickW Mar 21 '22 at 06:43

1 Answers1

0

You seem to have the common challenge of integration - blending together data from multiple source systems.

There are a couple of well known data modelling paradigms that are usually employed to deal with this sort situation.

If your warehouse is needing to integrate deeply, ie potentially to entity level, then I suggest using the data vault approach.

If you only need to integrate on specific activity type - visits or sales - then I wold suggest the activity schema modelling approach.

Indeed, as previous commented have mentioned, the data modelling in the warehouse is usually independent of source systems - you should design for use cases and down stream applications.

beatbox
  • 21
  • 1
  • 5