0

Using a Joiner transformation takes too much time. We have table A and flat file B. Table A has the fields NAME, DEPT, and SALARY. File B has the fields NAME and DEPT. We have to match NAME between table A and file B, and update the DEPT field in file B based on the value of DEPT in table A.

Table A
NAME    DEPT   SALARY
John    WSS    10000 
Micheal LSS    50000

Flat File B
NAME   DEPT
JOHN     
JOHN   
Micheal
Micheal

Output (after update) Table B
NAME    DEPT
JOHN    WSS
JOHN    WSS
Micheal LSS
Micheal LSS
Marek Grzenkowicz
  • You could use the Lookup transformation, but I think you should first determine why your current approach is so slow. How big are the objects? – Marek Grzenkowicz Feb 02 '16 at 10:41
  • There are 4 lakh (400,000) records in the table. With the Joiner transformation these rows become four times as many, 16 lakh, because file B has multiple records for each NAME. That's why it takes so long. – hadoop_geek Feb 02 '16 at 11:00
  • That's not a lot. What RDBMS? Is the `NAME` column indexed? – Marek Grzenkowicz Feb 02 '16 at 11:38

2 Answers

0

There are some ways to improve performance in your case:

  1. If both of your sources are located in the same database, implement the join inside the Source Qualifier. That is the most effective way.

  2. If you want to use the Joiner transformation, verify that the smallest input (the smallest table) is marked as Master. It is also worth sorting the input and checking the "Sorted Input" option in the Joiner transformation.
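To illustrate why sorted input can help, here is a rough Python sketch of a merge join over two inputs that are already sorted by the join key. It is only a conceptual model, not how Informatica implements the Joiner internally. The function name `merge_join` and the sample rows are hypothetical, and the names are upper-cased on the assumption that the match should ignore case.

```python
def merge_join(master, detail):
    """Join two lists of (key, value) rows that are both sorted by key.

    Assumes master keys are unique; detail may repeat a key.
    """
    joined = []
    i = 0  # cursor into the master rows; it only ever moves forward
    for key, detail_value in detail:
        # Advance the master cursor until it reaches or passes the detail key.
        while i < len(master) and master[i][0] < key:
            i += 1
        if i < len(master) and master[i][0] == key:
            joined.append((key, master[i][1], detail_value))
    return joined


# Hypothetical sample rows, upper-cased so that the keys compare equal.
table_a = sorted([("JOHN", "WSS"), ("MICHEAL", "LSS")])
file_b = sorted([("JOHN", ""), ("JOHN", ""), ("MICHEAL", ""), ("MICHEAL", "")])
rows = merge_join(table_a, file_b)
```

Because both inputs are sorted, each side is read once from start to end and no full cache of either side has to be kept in memory.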

Lev
0

First, import flat file B as a source.

Flat File B
NAME   DEPT
JOHN     
JOHN   
Micheal
Micheal

Then use a Lookup transformation on table A.

Table A
NAME    DEPT   SALARY
John    WSS    10000 
Micheal LSS    50000

Drag the NAME column from the source into the Lookup transformation and set the lookup condition to match table A's NAME against the flat file's NAME (NAME = NAME). Then drag NAME and DEPT into an Expression transformation and connect it to the target.
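The data flow described above can be sketched in Python as a rough model of what a cached lookup does: table A is read once into an in-memory cache keyed by NAME, and each row of file B is then resolved against it. The helper names `build_lookup` and `fill_dept` are hypothetical, and the case-insensitive match is an assumption based on the sample data, where file B has `JOHN` and table A has `John`.

```python
def build_lookup(table_rows):
    """Read table A once and cache DEPT by normalized NAME."""
    cache = {}
    for name, dept, _salary in table_rows:
        # Assumption: names should match regardless of case or padding.
        cache[name.strip().upper()] = dept
    return cache


def fill_dept(file_rows, cache):
    """Return file B rows with DEPT taken from the cache (None if no match)."""
    return [(name, cache.get(name.strip().upper())) for name, _dept in file_rows]


# Sample rows from the question.
table_a = [("John", "WSS", 10000), ("Micheal", "LSS", 50000)]
file_b = [("JOHN", ""), ("JOHN", ""), ("Micheal", ""), ("Micheal", "")]
updated = fill_dept(file_b, build_lookup(table_a))
```

Each file B row costs a single cache probe, so repeated names in file B do not multiply the work done against table A.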

jack