I need your expertise in figuring out the best option: should we stage data in the database before it is used in the DWH/ETL process, or use it directly from the source and do the ETL there?
Our current situation: we have a transactional system running on a PostgreSQL database. We access this database through a linked server on the SQL Server instance that hosts our data warehouse.
We would like to automate the process of loading data from this linked server into the data warehouse. Below are the options/challenges we have; we would appreciate your thoughts!
We would like our ETL to run nightly.
- Do we first copy the data from the linked server into SQL Server tables, and then write the multi-join queries against those tables to prepare the data for loading into the data warehouse?
- If we do copy the data from the linked server into SQL Server tables, I would prefer truncate-and-load over incremental load from the OLTP system into those tables (1-to-1 copies), since we cannot get a differential feed from the transactional system. Users can go back and change data in the transactional system, so identifying which records are new or updated would be tricky at this point.
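The truncate-and-load pattern described above might be sketched as follows (this is only an illustration; `PG_LINK`, the `stg` schema, and the table/column names are placeholders, not your actual objects):

```sql
-- Nightly truncate-and-load of a 1-to-1 staging copy of one source table.
-- PG_LINK is an assumed linked-server name; stg.customers is a hypothetical
-- staging table matching the PostgreSQL source table's columns.
TRUNCATE TABLE stg.customers;

INSERT INTO stg.customers (customer_id, customer_name, updated_at)
SELECT customer_id, customer_name, updated_at
FROM OPENQUERY(PG_LINK,
    'SELECT customer_id, customer_name, updated_at FROM public.customers');
```

Using `OPENQUERY` here (rather than four-part naming) pushes the `SELECT` down to PostgreSQL, which is usually faster for a full-table pull; one such statement per source table can then be wrapped in a nightly SQL Agent job.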
OR
do we use the source system through the linked server directly: write the multi-join queries, prepare the data on the fly, and load it into some sort of pre-staging table? (The issue we currently face with this option is that when we write a query with multiple joins directly against the linked server, we get only 1 row back, regardless of the actual number of result rows; but if we first copy each table used in those joins into its own temporary table and run the query against the temp tables, we get the expected results.) Has anyone experienced this issue before, and if so, what was the solution?
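I can't say for certain this is your exact issue, but a commonly reported workaround for wrong row counts from multi-table joins written with four-part names against a linked server is to push the entire join to the remote side with `OPENQUERY`, so PostgreSQL executes the join itself and returns one finished result set. A sketch, with `PG_LINK` and all table/column names as placeholders:

```sql
-- Execute the whole join on the PostgreSQL side; SQL Server only receives
-- the final rows, avoiding the remote-join handling of four-part names.
-- PG_LINK and the object names below are hypothetical.
SELECT *
INTO #prestage_orders
FROM OPENQUERY(PG_LINK, '
    SELECT o.order_id, o.order_date, c.customer_name
    FROM public.orders o
    JOIN public.customers c
      ON c.customer_id = o.customer_id
');
```

Note that the inner query is a literal string in PostgreSQL's SQL dialect (so embedded quotes must be doubled). If the row counts are still wrong even with `OPENQUERY`, it may be worth checking the ODBC driver version used by the linked server, since driver bugs have been known to cause similar symptoms.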