Running a job with a different file without reloading the file

Question

I created a job that should be reusable for new files. All the activities in the job, the maps, and everything else will remain the same except for the file name. I tried it once, but it seems that I need to re-"load" the file and remap everything again, which is inefficient. Is there any way to pass a different file to a job without remapping, reconfiguring, or reloading anything?

I'm not sure, @MichaelTiefenbacher, but I have found that I don't need to reload and remap it, since the file I pass in will have the same map and everything else – random student, Sep 03 '20 at 04:15

score 1 · Accepted Answer · answered Sep 02 '20 at 15:08

You have multiple options for allowing a DataStage parallel job to use a different filename for input on each job run:

1. When using either the Sequential File stage or the File Connector stage, instead of typing the actual filename, you can enter the name of a job parameter that has been defined on the Parameters tab of the job properties dialog. For example, if you define a string parameter myFile, then in the filename field of the input stage you would enter `#myFile#`, and at job run time that is replaced by the current value of the myFile parameter. If you run the job manually from the Director/Designer clients, the job run dialog lets you specify values for the job parameters. If you start the job via the dsjob command, there are options to pass job parameters on the command line. You can also use parameter set files that you modify prior to the job run.
2. Another option is to use a file location and pattern instead of a specific file name. Both the Sequential File stage and the File Connector stage let you specify a pattern, for example `/data/my_input_files/*.txt`. Each time you run the job, it reads any files at that location matching the pattern, so it can process multiple files. However, to prevent re-processing files from prior job runs, you will want to clean up the files at that location after the job completes. Then, when you have new files to process, just put them in that directory and re-run the job.
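As a rough illustration of both options, here is a minimal Python sketch. It assumes the job defines a string parameter named myFile as described above; the project name `MyProject`, the job name `LoadFileJob`, and the directory paths are hypothetical placeholders. The first function only builds a `dsjob -run -param name=value project job` command line (it does not execute it), and the second moves already-processed files into an archive directory, which is one possible way to do the cleanup mentioned in the second option.

```python
import shlex
import shutil
from pathlib import Path


def build_dsjob_command(project, job, file_path, param_name="myFile"):
    """Build the dsjob argument list that runs `job` with the file parameter set."""
    return [
        "dsjob", "-run",
        "-param", f"{param_name}={file_path}",  # value substituted for #myFile# in the stage
        "-jobstatus",                           # wait for the job; exit code reflects job status
        project, job,
    ]


def archive_processed(input_dir, archive_dir, pattern="*.txt"):
    """Move files matching `pattern` out of input_dir so the next run skips them."""
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(Path(input_dir).glob(pattern)):
        if path.is_file():
            shutil.move(str(path), str(archive / path.name))
            moved.append(path.name)
    return moved


if __name__ == "__main__":
    # Hypothetical project, job, and file names, for illustration only.
    cmd = build_dsjob_command("MyProject", "LoadFileJob", "/data/my_input_files/new.txt")
    print(shlex.join(cmd))
```

The command list could be handed to `subprocess.run` on a machine where dsjob is on the PATH, and `archive_processed` would be called only after the job has finished successfully.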

What if I use an unstructured file? Can I still apply this solution? And for point 2: my folder will contain 16 files in .xlsx format, and I made 16 jobs, one for each of them, so how do I specify which file goes to which job? Thanks a lot, Brian – random student, Sep 03 '20 at 04:20

score 1 · Answer 2 · answered Sep 03 '20 at 05:56

If all the files share a similar data structure, you only need to implement one parallel job. If the file names also follow a similar pattern, such as 1234ab.xls, 1234vd.xls, 1234gd.xls, ..., you could pass the file name as 1234??.xls in the file name parameter of the sequence job that runs this parallel job, and use that parameter as the file name in the parallel job.

VSK


Can I use it for unstructured data? – random student Sep 03 '20 at 06:12

I think it should work if you read the file name from a parameter defined as 'path/1234??.XLS' for the unstructured data. Give it a try! BTW, [this](http://www.dsxchange.com/viewtopic.php?p=444239) should help you – VSK Sep 03 '20 at 08:21
What if the first part of the name is fixed but the rest varies? For example, sales_february (this would be sales_????????) and sales_may (this would be sales_???). From your explanation, the number of `?` characters has to match the number of characters in the file name. How do I determine the number of `?` if it varies too? – random student Sep 04 '20 at 08:47

In that case you could use `sales_*.XLS`, since the month names are not of a fixed length – VSK Sep 04 '20 at 08:53
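The difference between `?` and `*` discussed in these comments can be checked with a small Python sketch. It uses Python's `fnmatch` module, which follows shell-style wildcard rules; that the DataStage stage applies the same rules is an assumption here, and the file names are made-up examples based on the ones mentioned in the thread.

```python
from fnmatch import fnmatchcase

# Made-up file names modelled on the examples in the thread.
names = [
    "1234ab.xls",
    "1234vd.xls",
    "1234abc.xls",
    "sales_february.XLS",
    "sales_may.XLS",
    "costs_may.XLS",
]


def matching(pattern, candidates):
    """Return the candidates that match a shell-style pattern (case-sensitive)."""
    return [name for name in candidates if fnmatchcase(name, pattern)]


# Each '?' stands for exactly one character, so the count has to fit.
print(matching("1234??.xls", names))     # ['1234ab.xls', '1234vd.xls']
print(matching("sales_???.XLS", names))  # ['sales_may.XLS']

# '*' stands for any number of characters, so month names of any length match.
print(matching("sales_*.XLS", names))    # ['sales_february.XLS', 'sales_may.XLS']
```

Trying a pattern against a list of real file names this way before putting it into the job parameter can catch a wrong `?` count early.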
