Pentaho: execute insert only if there are no duplicates

Question

Basically I want to insert a set of rows only if there are no changes from the target row.

I have implemented a blocking step to wait for all rows to be processed before proceeding. After this I want to add a condition to check if there are any changed data and if there are any abort the process else insert all rows.

Any suggestions?

MAX · Answer 1 · 2015-12-22T14:13:37.217

This seems to be very easy with only 2 steps

Try this: Step 1 : Use Database lookup step, look up on the key columns, And retrieve the columns you want compare including key fields in the target table for the duplicates.

Step 2: Use Filter Step, Here compare all the field which you have retrieved from the db look with the stream / table / source input input. like id (from source input) = id (from target) and name (from source input) = name (from target) , false condition point to Target table and true to dummy for testing.

Note: If you want populate table key max + 1 then for Combination lookup and update step instead of the table output

score 0 · Answer 2 · answered Mar 10 '15 at 18:38

0

If I understand your question properly, you want to insert rows if they are identical to the rows in target? Would that not result in a PK violation?

Anyways from your code screen shot, you seem to have used a Merge Rows(Diff) step which will give you rows flagged with a 'new', 'changed', 'identical' or 'deleted' status. From here you want to check for two things: Changed or Identical

If it is changed you have to abort and if it is identical you will insert

Now you use a simple filter step with the status = 'identical' as the true condition (i.e.) in your case to the insert flow

The false condition would go to the abort step.

Although do note that even if a single row is found to be changed the entire transformation would be aborted

answered Mar 10 '15 at 18:38

ash4stacked

21
1

No. I want to insert rows if there are no identical rows. The source of this is a different table so it will only insert if there is are new rows. Your last statement is what i want to happen. If there is at least 1 deleted or changed row then it should not insert. What is currently happening is it still inserts some rows. I want it to check first before inserting. – DRTauli Mar 11 '15 at 04:04
Well if you want it to check first before inserting then you could check for any changed or deleted row in the filter step and when both of them return false you can then trigger your source table to insert. BTW can you post some sample data as its not clear when you say that you want to insert rows when there are no identical rows but at the same time check for changed or deleted rows and abort the transformation – ash4stacked Mar 15 '15 at 18:44

score 0 · Answer 3 · answered May 12 '15 at 14:03

If I understand your use case properly, I would not use the "Table output" step for this kind of move. "Table output" is a great step for data warehousing, where you usually insert data to tables which are supposed to be empty and are part of a much broader process.

Alternatively, I would use "Execute SQL script" to tweak the INSERT to your own needs. Consider this to be your desired SQL statement (PostgreSQL syntax in this example):

INSERT INTO ${TargetTable}
    (contact_id, request_id, event_time, channel_id)
VALUES ('?', '?', '?', '?')
WHERE
    NOT EXISTS (
        SELECT contact_id, request_id, event_time, channel_id FROM ${TargetTable} 
        WHERE contact_id = '?' AND 
        -- and so on...
    );

:

Get the required fields for mapping (will be referenced by the question marks into an argument sequence);
Check the "Variable substitution" check box in case you intend to use variables which were loaded and/or created along the broader process;

step snapshot

SQL-performance-wise, it may not be the most efficient way, but it looks to me like a better implementation for your use case.

jacktrade · Answer 4 · 2016-01-08T21:16:00.443

0

the simplest way to do that is to use the insert/update step. not need to make any query: if the row exists it updates, if not exists it creates a new row.

edited Jan 08 '16 at 21:16

answered Dec 24 '15 at 12:06

jacktrade

3,125
2
36
50

Pentaho: execute insert only if there are no duplicates

4 Answers4