
I have two CSV files. One file has 10 rows, and the other contains a list of reference data. I want to take the value of one field in the first CSV and compare it with the data in the second CSV file. How can I achieve this? Any help would be great.

AlainD
Abhishek

1 Answer


The step you are looking for is the Stream Lookup step.

Read your CSV and the reference file, connect the two flows to a Stream Lookup step, and set it up as follows:
a) Lookup step = the step that reads the reference file.
b) Keys / Field = the name of the field in the CSV that identifies the row in the reference file.
c) Keys / Lookup field = the name of the matching field in the reference file.
d) Field to retrieve = the name of the field in the reference file to return (it may be the identifier or any other field you need).
e) Field to retrieve / Type = do not forget to set it!

This adds a column from the reference file to the 10 rows of the CSV file. You can then filter out the rows for which the lookup found no match by keeping only the rows where the new column is not null.
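For readers who want to see the logic outside PDI, here is a minimal Python sketch of what the lookup and the not-null filter do. It is not PDI code: the function `stream_lookup`, its parameters, and the sample data are hypothetical, and the field names `country`, `shortName` and `originalName` are borrowed from the comments on this answer.

```python
import csv
import io


def stream_lookup(rows, reference, key, lookup_key, retrieve, new_name):
    """Add reference[retrieve] to each row under new_name (None if no match)."""
    # Index the reference rows by their lookup field.
    # Assumption: if a key appears twice in the reference, the last row wins.
    index = {ref[lookup_key]: ref[retrieve] for ref in reference}
    return [{**row, new_name: index.get(row[key])} for row in rows]


# Hypothetical sample data standing in for the two CSV files.
main_csv = "id,country\n1,FR\n2,DE\n3,XX\n"
reference_csv = "shortName,originalName\nFR,France\nDE,Germany\n"

rows = list(csv.DictReader(io.StringIO(main_csv)))
reference = list(csv.DictReader(io.StringIO(reference_csv)))

# Lookup: rows.country == reference.shortName, retrieve originalName.
looked_up = stream_lookup(
    rows, reference, "country", "shortName", "originalName", "originalName"
)

# Filter: keep only the rows for which the lookup found a match.
matched = [row for row in looked_up if row["originalName"] is not None]
```

In this sketch the row with country `XX` has no entry in the reference data, so its new column is `None` and the filter drops it.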

Since PDI guides all of the above setup with drop-down lists, it should take you about 2 minutes.

enter image description here

AlainD
  • The fields are different in the two CSVs. The first CSV file has a list of columns, including a shortName for the country. The second CSV file has two columns, "shortName" and "OriginalName". What I want to do is compare the country column from the first CSV with the second CSV's shortName and, if it matches, replace it with the originalName in the first CSV. – Abhishek Apr 25 '18 at 09:29
  • With the setup as in the (updated) image: for each row of the CSV, it will look up the row in the Reference where CSV.country = Reference.shortName. If found, it will add Reference.originalName in a new column CSV.originalName. You can change the name of the new column. – AlainD Apr 26 '18 at 13:08