3

I'm new to ETL process. At this moment I'm working with kettle more precisely with geokettle to make a ETL process. I'm building a transformation to insert data in a normalized structure on my database target from a descriptive table of a database source.

Please take into consideration the following example.

Example:

I'm trying to insert in cascade on my database a contact record. Notice that in the source database I have the following contact table:

Database Source: Table Contacts

I would like to transform into the following structure:

Database Target: [Contact normalized tables]

So the main idea is:

step one: Insert the phone in CI_Telephone table and get t_id.

step two: In parallel with the previous step insert institutionAddress and country on CI_Address and get a_id

third step: after that, i need to get those ids (t_id and a_id) to be inserted on CI_Contact (if not exist yet) and get the c_id

fourth step: Insert (if not already exists) on CI_ResponsibleParty the instituionName => organizationName and name=> individualname, and insert the correct foreign key (contactInfo) that references CI_contact table.

This was my idea about the ETL process:

Geokettle approach

Question: How can you do a "cascade" insertion using geokettle or kettle? ( i dont know if this is the correct term) Does this transformation make sense to you?

I couldn't find any similar example to this case. Any suggestion about this issue would be great. Please, share your experience about ETL process and how to normalize a database from another that isn't normalized.

Marlon Abeykoon
  • 11,927
  • 4
  • 54
  • 75
ePascoal
  • 2,362
  • 6
  • 26
  • 44

1 Answers1

2

These steps can be used for each step you specified.

step one: insert the phone in CI_Telephone table and get t_id.

  • Use Dimension lookup/update step. It will return the key created when you specified it in Technical key field.

step two: In parallel with the previous step insert institutionAddress and country on CI_Address and get a_id

  • Use the same step parallelly to do the Address insertion and get the Id.

third step: after that, i need to get those ids (t_id and a_id) to be inserted on CI_Contact (if not exist yet) and get the c_id

  • Use Merge join step to join the to streams using a key. Then use Insert/Update step to insert to CI_Contact if not exist.

fourth step: Insert (if not already exists) on CI_ResponsibleParty the instituionName => organizationName and name=> individualname, and insert the correct foreign key (contactInfo) that references CI_contact table.

  • Use the same step as above to do the insertion.

NB: Since Dimension lookup/update and Combination lookup/update steps are there for Data Warehousing which is suited well for your requirement.

Marlon Abeykoon
  • 11,927
  • 4
  • 54
  • 75