How can I handle data that grows daily in Hadoop?
For example:
On the 1st day I may have 1 million files in some input folder (e.g. hadoop/demo).
On the 2nd day the same folder holds the existing 1 million files plus another 1 million new files, so 2 million in total.
The same happens on the 3rd day, the 4th day, and so on.
My constraint is: the 1st day's files should not be processed again on the next day.
(i.e.) Files that have already been processed should not be processed again when new files are added alongside them. More specifically, only the newly added files should be processed, and the older files should be ignored.
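To make the constraint concrete, here is a minimal Python sketch of the behaviour I am after. It is not a Hadoop implementation: the function names (`select_new_files`, `run_daily`), the in-memory `processed` set, and the file names are all hypothetical, and in practice the set of processed paths would need to be persisted somewhere between runs.

```python
from typing import Iterable, List, Set


def select_new_files(current_listing: Iterable[str], processed: Set[str]) -> List[str]:
    """Return the paths in the listing that have not been processed yet."""
    return sorted(path for path in current_listing if path not in processed)


def run_daily(current_listing: Iterable[str], processed: Set[str]) -> List[str]:
    """Pick only the new files, then record them as processed."""
    new_files = select_new_files(current_listing, processed)
    # The actual processing job would be run on new_files here (assumption).
    processed.update(new_files)
    return new_files


# Hypothetical example: the folder grows from day 1 to day 2.
processed: Set[str] = set()

day1_listing = ["hadoop/demo/a.txt", "hadoop/demo/b.txt"]
day1_batch = run_daily(day1_listing, processed)

day2_listing = day1_listing + ["hadoop/demo/c.txt"]
day2_batch = run_daily(day2_listing, processed)
```

On day 2, `day2_batch` should contain only `hadoop/demo/c.txt`, even though the folder listing still includes the day 1 files.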
How can I solve this?
If the constraint is still unclear, please tell me which part, so that I can elaborate on it.