0

I'm trying to delete records in my target table based on whether the records exists in the source table. I tried using the 'Delete' step but then realized that this step is based on a conditional clause.

My condition is quite simple "if the record/row DOES NOT exist in table A [source] delete the record/row from table B [target]".

I also read about using the the 'Merge Rows (diff)' step, but that seems to scan/compare the entire set of tables for differences.

The table is several million records with many hundred columns on a MySQL server, I need to perform this in the most efficient manner.

Any help would be appreciated.

WRKM
  • 1
  • 3
  • I will try to help you with this. What do I need to do to set it up – Drew Aug 12 '16 at 21:57
  • There was a different pentaho kettle mysql question several hours ago. Or maybe it was a kettle spoon thing – Drew Aug 12 '16 at 21:58
  • Thanks in advance. You need to establish two tables (should be identical), doesn't matter what's in them for the exercise. One in your source database environment, the second in your target environment. In your source table, remove a row. I need a job or transformation that will delete the missing source record from the target table. Does this provide enough information? – WRKM Aug 12 '16 at 22:05
  • I mean from the pentaho kettle side. I have neither but am willing to try something new – Drew Aug 12 '16 at 22:07
  • As an aside, there are always events to clean things up if that is an option for you. I have 3 event-ish links hanging off my profile page – Drew Aug 12 '16 at 22:08
  • Thanks. I can actually do this fairly easily, as a one-off in SQL, I need something a bit more dynamic, so looking to leverage the ETL tool set. In terms of what needs to be done, you have to download/install Kettle/Spoon, create two schemas (source and target schema) in your MySQL server. Each schema should have the same table . . . now simply delete one record from your source table (you can just do this in Workbench or something) . . . go to Kettle and setup a transformation that will delete the corresponding record in your target table. – WRKM Aug 12 '16 at 22:15
  • Delete step has conditional clause to check if exists such row in table. Or exists clause is then same as conditional clause of Delete step. – simar Aug 15 '16 at 14:24
  • Define first of all what exists means to u. How u can identify existing rows. – simar Aug 15 '16 at 14:25
  • @simar/et al. , I'm now using Merge Rows (diff) with success (except for the net_write_timeout error! – WRKM Aug 17 '16 at 21:50
  • There are several pattern to implement u needs. This is another pattern to find non existing row. Do outer join data stream with database and extract for example id from database as marker of existence row. Then just use filter to look for id value and then make decision what to do with divided stream. Good that u can join to Log Step and make test to ensure that flow works as expected. – simar Aug 18 '16 at 07:25
  • Anyway good that u able to solve problem. – simar Aug 18 '16 at 07:26

0 Answers0