
I am loading CSV files from a folder using Pentaho. Once a file is loaded, I insert its filename into a table.

Before loading a file, I need to check whether it has already been loaded. To do that, I want to take the filename and compare it with the names in the table that holds the already-loaded files. Since I am new to Pentaho, I am struggling to design this.

Please suggest how I should go about this, or whether there is a totally different approach.

Yurets
  • Can you post what you have tried so far? (code) – KeyMaker00 Jun 21 '18 at 07:29
  • Hey @KeyMaker00, as I mentioned, I am thinking of using a GetFileName to read the filename into a variable, and then calling a stored procedure from SQL that compares this filename with the file entries in the log table and returns an output parameter: 1 means the file already exists, 0 means the file is new and can be loaded into a table. I was wondering if there is a better approach. I am not sure I can add any code as such for this! Cheers – sachin vaidya Jun 21 '18 at 09:33

1 Answer


Your approach is valid. Keep a record of the processed filenames in a database (you could also use a CSV file for that).

The difficulty with this approach is that the filename may not be available as a field. So you have to write a master job that uses Add filenames to result and then hands over to a transformation that loads the CSV (press Ctrl+Space in the box and pick your variable from the drop-down), checks the filename against the database with a Stream lookup, and uses Filter rows to keep only the filenames that have no match, i.e. the files not loaded yet. After the load, you 'Update' the bookkeeping table.
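To make the bookkeeping logic concrete, here is a minimal sketch of the same idea outside Pentaho, in plain Python with SQLite standing in for the database. It is not a Pentaho job; the table name `loaded_files` and the functions `load_csv`, `already_loaded` and `load_new_files` are made up for illustration. It loops over the folder, skips any filename already recorded, and records each file after it has been loaded.

```python
import csv
import sqlite3
from pathlib import Path


def load_csv(path: Path) -> int:
    """Hypothetical stand-in for the real load step; returns the row count."""
    with path.open(newline="") as handle:
        return sum(1 for _ in csv.reader(handle))


def already_loaded(conn: sqlite3.Connection, filename: str) -> bool:
    """Return True if the filename is present in the bookkeeping table."""
    row = conn.execute(
        "SELECT 1 FROM loaded_files WHERE filename = ?", (filename,)
    ).fetchone()
    return row is not None


def load_new_files(conn: sqlite3.Connection, folder: Path) -> list:
    """Load every CSV in the folder that is not yet recorded, one by one."""
    # Assumed bookkeeping table: one row per file that was loaded.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS loaded_files (filename TEXT PRIMARY KEY)"
    )
    loaded = []
    for path in sorted(folder.glob("*.csv")):
        if already_loaded(conn, path.name):
            continue  # this file was handled in an earlier run
        load_csv(path)
        # Record the file only after the load has finished without error.
        conn.execute(
            "INSERT INTO loaded_files (filename) VALUES (?)", (path.name,)
        )
        conn.commit()
        loaded.append(path.name)
    return loaded
```

Running `load_new_files` twice on the same folder should load nothing the second time. Recording the filename after the load, rather than before, means a failed load gets retried on the next run.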

Another approach we used successfully in the past was to load the files from one directory and move each processed file into another directory. This made it easy to drop new files into the input directory and to retrieve processed files in case of problems.
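The move-after-processing variant can be sketched the same way. Again this is plain Python rather than a Pentaho job, and the function name `process_and_archive`, the `load` callback and the directory arguments are assumptions for illustration only.

```python
import shutil
from pathlib import Path
from typing import Callable


def process_and_archive(
    inbox: Path, processed: Path, load: Callable[[Path], None]
) -> list:
    """Load each CSV in the inbox, then move it to the processed directory."""
    processed.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(inbox.glob("*.csv")):
        load(path)  # hypothetical load step supplied by the caller
        # Move only after a successful load, so failed files stay in the inbox.
        shutil.move(str(path), str(processed / path.name))
        moved.append(path.name)
    return moved
```

With this layout no bookkeeping table is needed: whatever is still in the inbox has not been loaded yet.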

This could be a start:

  • The job [image not available]
  • The transformation [image not available]
AlainD
  • Hi AlainD, thanks for your input. I was able to load a single file successfully; however, my problem is that there could be multiple files in my source location. I have to pick up each file one by one, perform a certain set of operations on it, then pick up the next file and repeat the same operations, and so on. Please suggest how to do this. I apologize if I was not clear in my first attempt at asking the question. – sachin vaidya Jun 25 '18 at 11:42
  • My suggestion remains: use a job that loops to execute your transformation. Ask for more info if you need it. – AlainD Jun 25 '18 at 12:21
  • Thanks mate!! Using a job in a loop... that's what was not clicking for me earlier. I got my process working. Thank you once again – sachin vaidya Jun 28 '18 at 11:43