I have a requirement to split my input file into 2 output files based on a filter condition. My output directory should look like this:
/hdfs/base/dir/matched/YYYY/MM/DD
/hdfs/base/dir/notmatched/YYYY/MM/DD
I am using the MultipleOutputs class to split the data in my map function.
In my driver class I set the output path like this:
FileOutputFormat.setOutputPath(job, new Path("/hdfs/base/dir"));
and in the Mapper I write the records like this:
mos.write(key, value, fileName); // fileName is generated based on the filter criteria
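To make the fileName part concrete, here is a minimal sketch of how such a name might be built, assuming the third argument of mos.write is treated as a path relative to the job output directory (so it may contain "/"). The class name, method name, the "part" suffix and the sample date are hypothetical and only for illustration; they are not taken from the actual job.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: builds the relative base output path passed to mos.write(...).
class OutputPathSketch {
    // Formats a date as the YYYY/MM/DD directory layout described above.
    private static final DateTimeFormatter DAY_DIR = DateTimeFormatter.ofPattern("yyyy/MM/dd");

    // Returns e.g. "matched/2016/03/07/part" or "notmatched/2016/03/07/part".
    // The trailing "part" is an assumed file-name prefix, not something from the original job.
    static String baseOutputPath(boolean matched, LocalDate day) {
        String prefix = matched ? "matched" : "notmatched";
        return prefix + "/" + day.format(DAY_DIR) + "/part";
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2016, 3, 7); // sample date, chosen arbitrarily
        System.out.println(baseOutputPath(true, day));  // prints matched/2016/03/07/part
        System.out.println(baseOutputPath(false, day)); // prints notmatched/2016/03/07/part
    }
}
```

In the Mapper, the result of the filter check would decide the boolean, and the returned string would be used as fileName.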
This program works fine for a single day, but on the second day it fails with:
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://nameservice1/hdfs/base/dir already exists
I cannot use a different base directory for the second day.
How can I handle this situation?
Note: I don't want to read the input twice to create 2 separate files.