0

I've got this question:

I've got two sources in A and B and a Merge Join Step (in INNER option). the image shows what I am facing.

enter image description here

I am getting the right identifiers but with the value of the lastest row repeted n times for each one of them.

I need to get all the identifiers from B that are present in A.

I know there are also these options: Database Join y Database Lookup, but they could be kind of slow given that I have a lot of data to check

What component should I use to the get the expected result in Pentaho.

Regards.

d2907
  • 798
  • 3
  • 15
  • 45

3 Answers3

3

I could not replicate the issue.

The more probable errors are

  1. the input flow are not sorted,
  2. the first step (master) and second step (follower) are switched
  3. the key are not correct (a click on a drop box happens quickly)

Now, I think your goal it to filter out from B all the rows with an identifier not in A. I suggest to reverse the flow: for each row of B you lookup the identifier in A, and then filter out the identifier not found in A.

As a general rule, prefer the LookUp step. It is super fast and nearer the human way of thinking than the SQL joins.

If you need to grab more than one records for each input row, then use the Merge Join (and sort the input flows).

Avoid if you can database Join and Lookup for performance reasons.

AlainD
  • 6,187
  • 3
  • 17
  • 31
  • Hi @AlainD thanks for your replay. What I need is to get from B only those registers present in A. And the data from A is already sorted – d2907 Feb 02 '18 at 18:02
  • A picture was added to the response. For each row coming from B, you look in the step named `From A` a row with the `Identifer = Identifier` and return the field named `Identifier` which you rename `From A` (do not forget to specify its type). Then filter out the rows with `From A = null`. – AlainD Feb 03 '18 at 20:04
  • Of course it works. Usually the question with the kettle is not to know if it can do something, but will you mange to make it do what you want it to do. And, by the way, if it works, accept the answer. – AlainD Feb 08 '18 at 20:41
2

You are using INNER JOIN when you should instead use LEFT OUTER, i made a KTR feeding 2 data Grids exactly the way you described and i got you desired result.

I did use the sort rows step, sorting on the ID on both streams before the Merge Join step. enter image description here

Cristian Curti
  • 1,004
  • 1
  • 7
  • 13
-1

I think you haven't sorted the incoming data before Merge Join. Here is a video explaining what happens if you don't sort the incoming data before Merge Join in Pentaho.

Link: https://youtu.be/DBjqjXth-5E

  • Welcome to stackoverflow. Short answers with an external URL are considered low quality as the content of the external URL can change or break. Best practice is to include the key details of the link in your answer. Please see [answer] and update your post. – Simon.S.A. Apr 04 '20 at 08:01