I am new to Pentaho Spoon. I have about 100 text files in a folder, none of which have file extensions. I have found that a job can move one file at a time and rename it in the process, adding a .txt extension to the end. What I'd like is a job that goes through the folder and renames every file, adding the .txt extension. I've tried using a regex, but I can't get it to work because there's no file extension to match on. Any help would be greatly appreciated.

3 Answers

It's a pretty straightforward solution, but you need to use a Transformation, because Job steps won't do it.

You need the following steps:

Get File Names: just add your folder and the RegExp ".*" (without the double quotes), so every file is listed. Use the "Show filename(s)..." button to check that the list looks right.

Modified Java Script Value: declare a new_filename var concatenating the desired extension. Remember to click "Get Variables" after adding the script to output the new field.

var new_filename = filename + '.txt';

Process Files: select Operation = Move and filename/new_filename as your source/target filenames.

That's it!
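If some files in the folder might already carry an extension, the script can guard against appending .txt twice. The following is only a sketch in plain JavaScript: addTxtExtension is a hypothetical helper name, not something Kettle provides, and it assumes the filename field from Get File Names holds the full path.

```javascript
// Hypothetical helper (not a Kettle built-in): append '.txt' only when the
// last path segment has no extension yet.
function addTxtExtension(filename) {
  // Look only at the part after the last '/' or '\' so that dots in
  // folder names are ignored.
  var baseName = filename.split(/[\\\/]/).pop();
  // A dot at index 0 marks a hidden file such as '.profile', which is
  // not treated as having an extension here.
  var hasExtension = baseName.lastIndexOf('.') > 0;
  return hasExtension ? filename : filename + '.txt';
}

// Inside the Modified Java Script Value step this would replace the one-liner:
// var new_filename = addTxtExtension(filename);
```

With this guard the ".*" RegExp in Get File Names can stay as it is, since files that already have an extension keep their name.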

jfneis

Renaming a group of files is one thing I wouldn't use Kettle for. Why not let the shell do what the shell does best?

rem example for Windows CMD shell
ren absolute-path-to-folder\*. *.txt

This can be done using a Shell job entry, if you find reason to do it in Kettle at all.

marabu

I've seen "just use a shell script" answers for this before. That works great if you can guarantee your Kettle server is on the same OS as the developer workstation. I'm in an environment where the Dev/Spoon instance is Windows, but the Prod/Kettle environment is Linux, so you can't write one script file to rule them all.

As for "Why on earth would you do this?", my scenario is an integration scenario. We're using Pentaho for Data Integration, but a different tool for Enterprise Integration. I want a Pentaho Job to produce an output file, and I want my Enterprise Integration tool to pick up the file and do something with it, but not before Pentaho is done writing the file. Renaming helps avoid a race condition when the Enterprise Integration solution recognizes the file is there, but Pentaho isn't done writing it yet.

If I could rename a set of files, for example from test.*.csv.processing to test.*.csv, then Pentaho would create each file initially with the .processing extension and remove that extension once it's done writing. The Enterprise Integration solution that's looking for test.*.csv won't start processing the file until Pentaho renames it. Bingo, no race condition.
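To make the write-then-rename idea concrete, here is a rough sketch in Node.js rather than Kettle. The publishFile name, the temp folder and the sample file name are all made up for illustration, and it assumes the consumer only picks up names matching test.*.csv.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Marker suffix from the scenario above; the consumer is assumed to ignore it.
const PROCESSING_SUFFIX = '.processing';

// Hypothetical helper: write under "<target>.processing" first, then rename
// to the final name once the content is completely written.
function publishFile(targetPath, contents) {
  const tempPath = targetPath + PROCESSING_SUFFIX;
  fs.writeFileSync(tempPath, contents); // still invisible to a test.*.csv matcher
  fs.renameSync(tempPath, targetPath);  // final name appears only after writing is done
  return targetPath;
}

// Usage with a throwaway folder and an illustrative file name.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'rename-demo-'));
const published = publishFile(path.join(demoDir, 'test.001.csv'), 'id,value\n1,42\n');
console.log('published:', published);
```

The rename stays within one folder on purpose. A rename on the same filesystem normally shows up as a single step, whereas a move across filesystems may turn into a copy and reintroduce the partial-file problem.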

Bob Blackard