I am quite new to Pentaho Spoon and I would like to import records of an csv file to an database table. However, only unique records should be imported into the database table. That is why I need to compare EACH record with all records of the database table in order to determine if the record should be imported or not.
So far, I tried out the suggested CRUD-pattern which looks like this:
As you can see in the picture, I merge the excel input and the table input (ignore the cast-steps. I needed to cast a value because ther differed in the float format: database format was #.000000 and the csv format of float was #.0)
After the merge join, I compare the flag (which is given by the merge rows(diff) and if the compared records are new, I import them to the database table, if they are changed, I update the record and if they are deleted or identical, I simply do nothing. So far, so good.
But here is the problem: If I shuffle the records of the csv-input-file and run the transformation anew, all the records are imported anew and consequently, there are duplicated in my database table (which I wanted to avoid). To emphasize again: The right way to solve this is that each row of the csv-input-file is compared with ALL entries in the database table.
How can I realize this? Any suggestions? Thank you so much in advance!!