0

I am new to GeoKettle (Spoon) of Pentaho and I am currently rows from an Excel-file into my database. Now I want to avoid duplicates in my databasetable. That is why I want to insert only those rows into my database table which aren't there yet (to have only unique records in my database table).

And as far as I know, there are two ways to realize that. The first way I tried was with the Insert/Update step (I have disabled the Update functionality) and defined all the columns which have to be equal in order to insert the record or not. But it does not work. All records are still inserted into the database.

That is why I am trying to do the (according to pentaho) much faster option which is a "Table Output" with an "Update" error handling step as shown in the picture. enter image description here

As shown in the picture, the arrow which is pointing from "table output" to "update" is black. But I need a red dotted one for error handling of step . But I do not know how to create this. In tutorials I often see that there pops up a little window with 2 options like in the picture: enter image description here

But I do not get that popup. If I want to create a hop, I will have to mark both steps and do a right-click on one of them. So in which possible ways can I create such a red dotted arrow? In the end, it has to look like this:

enter image description here

Thank you so much in advance!!

user7335295
  • 401
  • 2
  • 7
  • 31

2 Answers2

1

You have a problem with your setup. Or with your version of the PDI. The functionality of an error step was introduced in V4 but fully implemented for all steps around V6.

Download a fresh PDI from SourceForge. V7.1 is really a robust and stable edition. Unzip and test.

By the way, what you want to achieve is know as the CRUD pattern. CRUD for Create, Read, Update, Delete. The step doing this the Merge Rows (diff) (in the Joins family). You tell the steps which columns to check, and it produce a new column with the value identical, changed, new, or deleted. You can them redirect the flow in a Switch / Case to do the appropriate action. Further information here (V4).

enter image description here

AlainD
  • 6,187
  • 3
  • 17
  • 31
  • I tried your pattern as you suggested, but the functionality is not the same as I hoped it would be. As in your example, the records (which get imported by the csv file input), are merged with the table. As in my case: At first the table is empty, so all the records are new and get imported into the table. But if I shuffle the the records within the csv file and import it anew, there are not identical and imported again. (because the sequence of records is different) What I need is that each record of the csv is compared with all database records in order to find out if it is already there. – user7335295 Nov 22 '17 at 07:59
  • The Merge Diff requires the input to be sorted. More details here : https://stackoverflow.com/questions/47429769/pentaho-import-uniqe-records-into-database/47437110#47437110 – AlainD Nov 23 '17 at 13:56
0

Terms:

  • Source step: The step that the hop flows from.
  • Target step: The step that the hop flows to.
  • Hop: An arrow between two steps.

Starting process:

  1. Shift-Click the source step, then click the target step. That's what makes the little dialog appear where you can select "Error handling of step".

Alternative starting process:

  1. Connect your desired source step to your target step if they're not already connected. (At first, the arrow will be a regular black arrow.)
  2. Right-click on the source step. (Or use the context menu icon on mouseover.)
  3. Choose the "Error Handling..." option in the context menu.

Either way, now that you have the dialog you can probably guess from here, but before you actually get to see the red dotted arrow (since that was the explicit OP question), you also have to do the following.

Final process:

  1. In the dialog that appears, select your target step from the "Target step" drop-down list, and also mark the checkbox for "Enable the error handling".
  2. Optionally, fill in the other fields in the dialog.
  3. Click OK.

Notes:

  • Not every source step is capable of this kind of error handling, so in those cases, the "Error Handling..." option will be greyed out. That's normal.

  • You can access the "Error Handling..." option in the source step after deleting the target step, or even before creating the target step. If you do delete the target step, you may wish to remove its name from there.

  • After dealing with all this, remember that the error handling dialog is not the same as the options dialog for the source or target step, so don't forget to double-click them to fill them in, if you didn't already.

Nathan
  • 65
  • 9