1

I've got an Excel-Spreadsheet where:

  1. Column A is filled with the names of target fields (Table field)
  2. Column B provides names of the source fields (Stream field)
  3. Column C tells you whether to update the field or not, i.e. Value is either "Y" or "N"
  4. Column D & Column E provide the primary key of the target and the source field, respectively.
  5. Number of rows: Over 500 thousand

Now, I want to send these values to the "Insert / Update"-step automatically, i.e. do NOT want to copy&paste; PDI should pull everything.
Looked at "ETL Metadata Injection" and think that I could easily pull Columns A to C. But I simply don't know how to pass the primary key(s); let's put differently, I don't know how to tell the step to use certain values as a primary key.
Can work with CSV file input or Text File Input as well if it makes things easier.

Any help is appreciated. Thanks in advance!

notAnExpert
  • 123
  • 1
  • 5
  • 15

1 Answers1

0

i am not sure if you can achieve that as meta data injection keeps a generic file structure and you cannot define if the column is primary column or not. do not use primary key validations at this stage as the process will be complicated for debugging and error handling. Push the information in staging tables and then pen a logic to populate the same main table. this will save a lot of your effort.

NK0709
  • 56
  • 4
  • Thanks for your idea! FYI: Every source table has got its own target table, there is no main table. Why? I am pulling salesforce-module-rows, i.e. cannot mix up, say, rows from the Account-module with one from Lead. – notAnExpert Jan 03 '20 at 16:29
  • If I understand it correcly, you want me to build a table in between source and target table, right? My first idea was to replace the "Insert / Update" by a SQL statement/script/whatever but I am afraid that the job would run very slow, if at all, if the statement runs through 500k rows with plenty of columns. – notAnExpert Jan 03 '20 at 16:33
  • with huge number of records and columns never use insert/update step. with my experience use the bulk insert steps or output steps to populate staging and then run the conversions and population via SQL statements for quicker processing. you can leverage indexes in order to add further performance in sql scripts. – NK0709 Jan 06 '20 at 03:22