
Is there a best practice for loading fact data from multiple sources in Redshift? Redshift has no partitioning concept; it does have distribution. Without partitions, replacing one source's data requires a DELETE operation, which is costly, followed by a VACUUM on the table to reclaim the space. The requirement is that the fact tables be truncated and reloaded per source. In Oracle, we would have created a table partitioned by source and truncated the relevant partition before loading the data. Looking for best practices for the DW.
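For context, the closest common substitute for a per-source partition truncate in Redshift is a delete-and-insert wrapped in a single transaction, so readers never see a partially loaded source. This is only a sketch; the table and column names (`fact_sales`, `stg_sales`, `source_id`) and the S3 path are illustrative, not from an actual schema:

```sql
-- Replace one source's slice of the fact table atomically.
-- Names here are illustrative placeholders.
BEGIN;

DELETE FROM fact_sales
WHERE source_id = 'SRC_A';

INSERT INTO fact_sales
SELECT *
FROM stg_sales
WHERE source_id = 'SRC_A';

COMMIT;

-- Reclaim space and refresh statistics afterwards:
VACUUM fact_sales;
ANALYZE fact_sales;
```

The DELETE/VACUUM cost is exactly the pain point raised above; the transaction only addresses consistency, not the cost of the delete itself.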

  • Are you saying that you would like to 'swap' the contents of a table with minimal downtime? (For example, load a staging table via several processes, then make that data the 'live' table?) – John Rotenstein Apr 13 '21 at 13:01
  • That's not the intent. Say you have different sources reporting the sales for your product. The fact data is therefore additive, since the sources provide the data at the same grain; they just don't provide it at the same frequency. Traditionally we would have created a table partitioned by source and loaded the data from the individual sources in parallel. What's the best practice in a DW like Redshift or Snowflake? – Shaounak Nasikkar Apr 14 '21 at 20:05
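One way to approximate "one partition per source, loaded in parallel" in Redshift is one table per source behind a UNION ALL view: each source table can then be truncated (cheap, no VACUUM needed) and reloaded independently. A sketch, with all table names, the bucket path, and the IAM role placeholder being illustrative:

```sql
-- One physical table per source; TRUNCATE is metadata-only,
-- so no DELETE or VACUUM is needed for a full reload.
TRUNCATE fact_sales_src_a;

COPY fact_sales_src_a
FROM 's3://my-bucket/src_a/'          -- illustrative path
IAM_ROLE 'arn:aws:iam::123456789012:role/my-copy-role'  -- illustrative role
FORMAT AS PARQUET;

-- Queries go through a view that unions the per-source tables.
CREATE OR REPLACE VIEW fact_sales AS
SELECT * FROM fact_sales_src_a
UNION ALL
SELECT * FROM fact_sales_src_b;
```

Since the sources are additive at the same grain, the UNION ALL view presents the same shape as a single partitioned fact table, and each source's load can run on its own schedule.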

0 Answers