
I am building a transformation in Pentaho Data Integration, and I have a list of files in a directory on my SFTP server. The files are named in the format FILE_YYYYMMDDHHIISS.txt, and my directory looks like this:

  • mydirectory
    • FILE_20130701090000.txt
    • FILE_20130701170000.txt
    • FILE_20130702090000.txt
    • FILE_20130702170000.txt
    • FILE_20130703090000.txt
    • FILE_20130703170000.txt

My problem is that I need to get the last file in this list according to its creation date, so that I can pass it to another transformation step.

How can I do this in Pentaho Data Integration?

  • Hi, welcome to StackOverflow. I've given your question a close vote because it doesn't fit well with the SO format. In particular, you haven't shown us what you've tried. If you don't even know what to try, you probably need more basic help than SO is designed to provide with its question and answer format. See http://stackoverflow.com/help/asking – Gordon Seidoh Worley Jul 17 '13 at 18:52
  • I am starting to get tired of these close proposals on Kettle questions. I am a Kettle user and I think this question is perfectly fit to be answered. I understand the problem well enough. – jacktrade Jul 18 '13 at 17:33

1 Answer


In fact, this is quite simple: your file names embed a fixed-width timestamp ordered from year down to second, so they sort textually in chronological order, and the maximum of the sorted list is your most recent file.
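
To see why the textual maximum is enough, here is a minimal Python sketch (not PDI itself) that compares the plain string maximum with the maximum by parsed timestamp. The helper name `timestamp_of` is hypothetical, and the sketch assumes every name follows the FILE_ prefix, 14-digit timestamp, .txt pattern from the question.

```python
from datetime import datetime

# File names taken from the directory listing in the question.
file_names = [
    "FILE_20130701090000.txt",
    "FILE_20130701170000.txt",
    "FILE_20130702090000.txt",
    "FILE_20130702170000.txt",
    "FILE_20130703090000.txt",
    "FILE_20130703170000.txt",
]


def timestamp_of(name):
    """Parse the 14-digit timestamp between 'FILE_' and '.txt' (hypothetical helper)."""
    stamp = name[len("FILE_"):-len(".txt")]
    return datetime.strptime(stamp, "%Y%m%d%H%M%S")


# Plain string comparison, which is what a textual sort or max uses.
latest_by_text = max(file_names)

# Comparison on the parsed timestamp, as a cross-check.
latest_by_time = max(file_names, key=timestamp_of)

print(latest_by_text)
print(latest_by_text == latest_by_time)
```

Both approaches pick the same file because the timestamp is zero-padded and ordered from the most significant unit to the least. If the names ever mixed prefixes or dropped the padding, the textual shortcut would no longer be safe.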

Since a list of files is likely to be short, you can use a Memory Group by step. A grouping step needs a separate column by which to aggregate. If you only have one column and you want to find the max of the entire set, you can add a grouping column with an Add Constants step, configured to add a column containing, say, the integer 1 in every row.

Configure your Memory Group by step to group on the column of 1s, and use the file name column as the subject. Then simply select the Maximum aggregate type. This will produce a single row containing your grouping column and an aggregate column holding your max file name; the original file name field is removed.
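
As a rough illustration of the row flow through these two steps, here is a small Python sketch rather than an actual PDI configuration. The field names `filename`, `group_key`, and `max_filename` are assumptions made for the example; in PDI you would pick your own names in the step dialogs.

```python
# Input rows, one per file, as they might arrive from a file-listing step.
rows = [
    {"filename": "FILE_20130701090000.txt"},
    {"filename": "FILE_20130703170000.txt"},
    {"filename": "FILE_20130702090000.txt"},
]

# "Add Constants": append a constant column so every row falls in one group.
rows_with_key = [{**row, "group_key": 1} for row in rows]

# "Memory Group by": group on group_key and keep the maximum filename per group.
max_per_group = {}
for row in rows_with_key:
    key = row["group_key"]
    current = max_per_group.get(key)
    if current is None or row["filename"] > current:
        max_per_group[key] = row["filename"]

# One output row per group: the grouping column plus the aggregate column.
result = [
    {"group_key": key, "max_filename": name}
    for key, name in max_per_group.items()
]

print(result)
```

Because every row carries the same constant, there is exactly one group, so the output is a single row whose aggregate column holds the most recent file name.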

Brian.D.Myers