Issue -
I am running a series of MapReduce jobs wrapped in an Oozie workflow. The input data consists of a bunch of text files, most of which are fairly small (KBs), but every now and then I get files of 1-2 MB or more, and those cause my jobs to fail. I am seeing two failure modes: one, inside one or two of the MR jobs the file is parsed into an in-memory graph, and for a larger file the task runs out of memory; and two, the jobs are timing out.
Questions -
1) I believe I can just disable the timeout by setting mapreduce.task.timeout to 0, but I am not able to find any documentation that mentions the risks of doing this.
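For reference, this is what I was planning to drop into the job's configuration (the property name is my assumption based on the Hadoop 2.x naming; the pre-YARN equivalent was mapred.task.timeout):

```xml
<!-- Planned change, untested: disable the task timeout entirely.
     0 means a task is never killed for failing to report progress. -->
<property>
  <name>mapreduce.task.timeout</name>
  <value>0</value>
</property>
```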
2) For the OOM error, what are the various configs I can mess around with? Any links to potential solutions and their risks would be very helpful.
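The only knobs I have found so far are the per-task container and heap sizes; I was going to try something like the snippet below. The values are guesses for my workload, and my understanding is that the JVM heap (-Xmx) should be set to roughly 80% of the container size to leave headroom for non-heap memory:

```xml
<!-- Guessed values: a 4 GB map container with a ~3.2 GB JVM heap.
     The reduce-side analogs would be mapreduce.reduce.memory.mb
     and mapreduce.reduce.java.opts. -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx3276m</value>
</property>
```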
3) I see a lot of "container preempted by scheduler" messages before the job finally fails with the OOM. Is this a separate issue, or related to the memory problem? How do I get around it?
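I do not administer the cluster, but from what I can tell preemption is a scheduler-level setting, so if this is the Fair Scheduler at work I assume it would have to be changed cluster-side in yarn-site.xml, along these lines:

```xml
<!-- Cluster-side change (my assumption, I cannot test this myself):
     turn off Fair Scheduler preemption in yarn-site.xml. -->
<property>
  <name>yarn.scheduler.fair.preemption</name>
  <value>false</value>
</property>
```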
Thanks in advance.