1

I have the scripts to pull data from RDBMS and populate the Data warehouse. It works. I was wondering how Temp table comes in between. What are the steps to ETL exactly? Even though my data warehouse is populated, my teacher says that we need to use a temp table. why is it important?

Please help me. I am very confused right now. Thank you.

We need to pull the data from the databases of two different offices in two different locations. I will give the details of the table below.

lds_job_role

job_role_id:integer
job_role_desc:varchar
key_skill_1:INEGER
key_skill_2:INEGER
key_skill_3:INEGER
recommended_sal:INTEGER

lds_account

account_id:INTEGER
acc_name:VARCHAR
acc_postcode:VARCHAR

lds_placement

placement_id:INTEGER
plt_short_desc: VARCHAR
plt_required_start_date:DATE
plt_estimated_end_date:DATE
plt_actual_start_date:DATE
plt_renewal_no:INTEGER
plt_to_permanent:VARCHAR
max_salary: INTEGER
min_salary:INTEGER
actual_salary:INTEGER

mch_job_role

job_role_id:INTEGER
job_role_desc:VARCHAR
recommended_sal:INTEGER

mch_account

account_id:INTEGER
acc_name:VARCHAR
acc_postcode:VARCHAR

mch_placement

placement_id:INTEGER
plt_short_desc:INTEGER
plt_required_start_date:DATE
plt_estimated_end_date:DATE
plt_actual_start_date:DATE
plt_actual_end_date:DATE
plt_renewal_no:INTEGER
plt_to_permanent:VARCHAR
max_salary:INTEGER
min_salary:INTEGER
actual_salary:INTEGER
supervisor_name:VARCHAR

Below are the facts and dimensions of the Star schema:

job_roe_dim

job_role_id
job_role_desc

time_dim

time_id
year

account_dim

account_id
account_name

fact_accounts

Report_id
no_of_placements
salary
FK1_time_id
FK2_account_id
FK3_job_role_id

The exercise tells us to "deal with data quality issues, measures for FACTs, identifiers etc ". I THINK I have already done that with the scripts, but I do not know how to show it. Perhaps that is why the the temp table is required?

DazedNConfused
  • 189
  • 2
  • 13
  • 1
    You don't necessarily _need_ to use a temp table, but it might be useful depending on the characteristics of the table you are loading. Can you share some more details about the exercise, and I'll be able to give you a proper answer. – Ron Dunn Feb 03 '16 at 09:33
  • I will update the question to add a few more details. – DazedNConfused Feb 04 '16 at 03:07
  • 1
    Perhaps your teacher is confusing temp tables with _staging_ tables. Among other reasons, I use _staging_ tables to debug data flows. By no means do you 'need' to use a staging table. ETL actually means Extract-Transform-Load, which means you transform the data in-flow _without_ a staging table. There is another specific acronym "ELT" which means Extract-Load-Transform which means you extract data and load into a table _then_ transform it.. So this acronym specifically describes a process that uses staging tables.As usual the teachers doing the teaching don't know what they're talking about. – Nick.Mc Feb 04 '16 at 04:11
  • Thank you! Ill just insert the data into temporary tables and then copy it from the tables to my facts and dimensions. – DazedNConfused Feb 04 '16 at 08:55

0 Answers0