0

How to delete first matching row in a file using a second one ?

I use Talend DI 7.2 and I need to delete some rows in one delimited file using a second one containing the rows to delete. My first file contains multiple rows matching the second one but for each row in my second file I need to delete only the first row matching in the first file.

For example :

File A :                      File B :

Code | Amount                 Code | Amount
   1 | 45                        1 | 45
   1 | 45                        3 | 70
   2 | 50                        3 | 70
   2 | 60
   3 | 70
   3 | 70
   3 | 70
   3 | 70

At the end, I need to obtain :

File A :

Code | Amount
   1 | 45
   2 | 50
   2 | 60
   3 | 70
   3 | 70

Only the first match in file A for each row in file B is missing.

I tried with tMap and tFilterRow but it matches all rows not only the first one.

Example edited : I can have many times the same couple code-amount in file B and I need to remove this same number of rows from file A

K4tn1x
  • 133
  • 3
  • 14

2 Answers2

1

You can do this by using Variables within the Tmap. I created 3:

  • v_match - return "match" if code and amount are in lookup file b.
  • v_count - add to the count if it's a repeating value. otherwise reset to 0
  • v_last_row - set the value of v_match to this before comparing again. this way we can compare current row to last row and get counts

Then add an Expression filter to remove any first match.

tmap

This will give the desired results:

workflow results

stickabee
  • 145
  • 8
  • Thanks, but what if I want to remove two same rows in file A ? For example if I add an other row "3 | 70" in file B, I have two same rows to remove in file A but this solution only works if I have a different code for each row I want to remove – K4tn1x Aug 21 '19 at 12:59
  • for that I would use a tAggregateRow on lookup file and then loop file A through a java component flagging rows to remove. Then filter on the flag. – stickabee Aug 22 '19 at 16:00
1

You can't delete rows from a file, so you'll have to generate a new file containing only the rows you want.
Here's a simple solution. enter image description here

First, join your files using a left join between A as a main flow, and B as a lookup.
In the tMap, using an output filter, you only write to the output file the rows from A that don't match anything in B (row2.code == null) or those which have a match, but not a first match.
The trick is to use a Numeric.sequence, with the code as an id of the sequence; if the sequence returns a value other than 1, you know you've already had that line previously. If it's the first occurence of the code, the sequence would start at 1 and return 1, so the row is filtered out. enter image description here

Ibrahim Mezouar
  • 3,981
  • 1
  • 18
  • 22
  • Thanks, but what if I want to remove two same rows in file A ? For example if I add an other row "3 | 70" in file B, I have two same rows to remove in file A but this solution only works if I have a different code for each row I want to remove – K4tn1x Aug 21 '19 at 12:50