If you only need to process a few files just-in-time as they arrive, you can use the Wait For File stage in a sequence and schedule it in advance. If it's okay to process the files at larger intervals, you can simply schedule a job to run at a fixed interval (once a day, every hour, or every minute) and then process all files in the folder.
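For the interval approach, a cron entry that kicks off the job via the dsjob command line is usually enough. Just a rough sketch; the project name, job name, install path and log file are made up and would have to match your environment (you may also need to source dsenv first):

  # run the file-processing job at the top of every hour
  0 * * * * /opt/IBM/InformationServer/Server/DSEngine/bin/dsjob -run -jobstatus MYPROJECT ProcessIncomingFiles >> /tmp/ProcessIncomingFiles.log 2>&1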
You mentioned that you have to deal with many different file names and extensions. I assume they also have different structures. Beware of trying to build jobs that can handle anything and everything.
Depending on the frequency, type and number of files you expect to process, you have several options to get the best performance: either loop over a few files one by one in a sequence and do complex things with each file, or read many files at once in a parallel job. Looping over hundreds of files in a sequence with several jobs inside the loop can end up in very long coffee breaks.
If the task is just to move the files, a shell script (-> command stage) may be your friend.
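A minimal sketch of such a move script, assuming the landing and processing directories shown here are just placeholders you would replace (or pass in as parameters from the command stage):

  #!/bin/sh
  # move everything from the landing folder to the processing folder
  SRC=/data/landing
  DST=/data/processing
  for f in "$SRC"/*; do
    [ -f "$f" ] && mv "$f" "$DST"/
  done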
But if you have tons of files (no matter what the names are) with the same structure (like CSV files) and you need their content in a database, then you can read them all at once in a parallel job using the Sequential File stage and save them directly into a DataSet. That stage allows you to select the files by pattern (meaning that * is your friend in this case), and it can output the filename to a new field. So you'd end up with a DataSet containing your data and the corresponding filenames.
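As a rough sketch of the relevant Sequential File stage settings (the pattern and column name are only examples, and the exact property labels can vary slightly between DataStage versions):

  Read Method      = File Pattern
  File Pattern     = /data/landing/*.csv
  File Name Column = source_file

The source_file column then travels along with the data rows into the DataSet, so you always know which file each row came from.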
Even if the files do not have the same structure, you can output the whole file content into one LOB column and still do all the reading in one job.
If you name the DataSets dynamically, you can schedule another, independent job that works through the queue of DataSets in parallel for further processing.
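One simple way to make the DataSet name dynamic is to build a timestamped name in a small script and hand it to the load job as a parameter. A sketch with made-up project, job and parameter names:

  #!/bin/sh
  # build a unique DataSet name and pass it to the load job as a parameter
  DSNAME=/data/queue/incoming_$(date +%Y%m%d_%H%M%S).ds
  dsjob -run -param TargetDataSet="$DSNAME" -jobstatus MYPROJECT LoadFilesToDataSet

The downstream job then only needs to pick up whatever *.ds files are sitting in the queue folder.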