1

I am a beginner to DataWarehousing. We have created a data mart, a star schema design to load quarterly data. We have been loading the current data as and when approved by the business for that quarter.

Now we have a requirement to go back and load historical data (for 3 years which is around 40GB). The dimensions for loading this data will be the same ones used for qaurterly load. However, can we load this historical data into the same fact table or do we have to create a duplicate fact table to load the historical data alone? Is that a DW standard? I am trying to find the ways to do this as per the standards.

The current fact table is date partitioned on load_cycle_date which specifies the quarter the data was loaded for.

Thanks much!

user3357452
  • 11
  • 1
  • 2
  • DWs contain historical, aggregated data, what's your concern? – rano Feb 26 '14 at 19:39
  • Hi RanoMy concern is if historical data (consolidated for 3 years) can be loaded into the same fact table that we have been using to load quarterly data. – user3357452 Feb 26 '14 at 19:47
  • Suppose you only insert quarterly data from now for the next three years, what would change? – rano Feb 27 '14 at 06:48
  • Thanks for your response! Inserting new data on a quarterly basis will be fine. But I was wondering how to insert historical data (3 years combined and is not broken down by quarter). Can I use the same fact table which is now used to load qarterly data or should i create a new fact table to load just the historical data and map it to the existing dimensions? – user3357452 Feb 27 '14 at 15:12
  • you have to specify how much summarized the historical data is and what is the schema with your fact table now – rano Feb 27 '14 at 17:57

1 Answers1

0

I don't see why getting historical data and using older load_cycle_dates won't fit in your existing table. This assumes you're able to transform them into this format. This is based on how much the data structures have changed over the years.

There are other areas you need to look into:

  1. Do you have adequate historical values for all your dimensions? Example: Client Rating. There may be clients who ended up with a "Bad" rating, but that wasn't the case previously. There would need to be records for each change. The alternative would be to pull the data from backups.
  2. Approval Process - Often a lot of data discrepancies aren't discovered until this is started. As a result of this, there may have been changes to the app that makes these corrections. You may find that some reports run as of this "prior data warehouse" data, will not be accurate.

There's no reason you shouldn't be able to do this for one quarter and test it out. It's the only way you'll know for sure. The current data warehouse I work with went through the same process of adding data before the warehouse was started. Conversions are very common.

JeffO
  • 7,957
  • 3
  • 44
  • 53