
I am using Pentaho Data Integration (Kettle) with SQL Server and have already built a Data Warehouse, but there is something I don't understand. When I update a record in a dimension, the log reports that two records were updated, although only one record actually changed, as it should.

First I extract from a CSV file, then select and rename values, then look up another dimension's surrogate key, and finally run a Dimension Lookup/Update step with punch-through (SCD Type 1).

Pentaho ETL transformation

After updating just 1 record:

Pentaho duplicate in Log

Why?

Thank you!

KronosL
  • Is your last Dimension L/U step doing a "punch through" update? If so, then any row of data from the main stream can trigger multiple updates on the dimension table. – nsousa Mar 13 '18 at 12:14
  • Yes, it is doing a punch through. Why does that happen? I still don't understand. – KronosL Mar 13 '18 at 18:38
  • In a punch through, ALL versions of the record will be updated if an attribute changes. So, if you already have v1 and v2 on the table and a new version occurs, both rows will be updated on the DB; the count of updated rows refers to the number of DB updates, not the number of rows of data in your data stream. – nsousa Mar 18 '18 at 16:39
  • @nsousa Yes, if there were 2 versions that would happen, but that's not the case: there is only 1 version of every record in that table. In fact, punch through is used to overwrite the record, so what you are saying still doesn't explain why there are 2 updated rows. – KronosL Mar 19 '18 at 17:44
  • And only 1 row of data triggers an update? No two records there? Clearly PDI is telling you there were two rows updated in the database. You may want to switch to a more verbose log level to understand what PDI is doing, but you have plenty of information here to drive your investigation. – nsousa Mar 20 '18 at 10:59
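
To illustrate nsousa's point about the reported count, here is a minimal sketch (using an in-memory SQLite table standing in for the SQL Server dimension; the table and column names are hypothetical, not from the actual schema): a punch-through update rewrites the changed attribute on every version of the natural key, so the "updated" count in the PDI log reflects database rows touched, not rows in the data stream.

```python
import sqlite3

# In-memory database standing in for the SQL Server dimension table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical dimension holding two historical versions of one natural key.
cur.execute("""
    CREATE TABLE dim_customer (
        customer_sk INTEGER PRIMARY KEY,  -- surrogate key
        customer_id INTEGER,              -- natural (business) key
        city        TEXT,
        version     INTEGER
    )
""")
cur.executemany(
    "INSERT INTO dim_customer VALUES (?, ?, ?, ?)",
    [(1, 100, "Lisbon", 1),   # version 1
     (2, 100, "Porto",  2)],  # version 2 of the same customer
)

# A punch-through style update keys on the natural key, so one incoming
# stream row updates ALL versions of that customer in the dimension.
cur.execute(
    "UPDATE dim_customer SET city = ? WHERE customer_id = ?",
    ("Braga", 100),
)
print(cur.rowcount)  # → 2: two table rows updated from one stream row
```

If the table truly holds only one version per natural key, checking for duplicate natural keys (e.g. `SELECT customer_id, COUNT(*) ... GROUP BY customer_id HAVING COUNT(*) > 1` on the real table) is a quick way to rule this explanation in or out.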

0 Answers