I am running a Spark job on YARN, and my code is written in Java. When a worker has finished its part of the job, I want to execute a function on that worker to collect some resources.
I tried mapPartitions(), but a single worker runs many partitions, so the function gets executed several times on the same worker.
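For reference, this is roughly what my mapPartitions() attempt looked like (a minimal sketch, assuming the Spark 2.x Java API; the real per-element work is elided):

import java.util.ArrayList;
import java.util.List;

JavaRDD<String> mapped = sourceRDD.mapPartitions(partition -> {
    List<String> out = new ArrayList<>();
    while (partition.hasNext()) {
        out.add(partition.next()); // real per-element work goes here
    }
    doResourceCollect(); // runs once per PARTITION, so it fires many times on the same worker
    return out.iterator();
});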
Can I implement this, and if so, how?
Code updated:

JavaRDD<String> sourceRDD = context.textFile(inputPath);
sourceRDD.map(doSomething()); // every worker has its own env; I want to execute a function on every worker when map() ends
doResourceCollect(); // this only runs on the final worker, so I can't get each worker's env
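To make the intent clearer, here is a sketch of the kind of once-per-worker behaviour I am after, using a static guard plus a JVM shutdown hook on the worker side. WorkerCleanup is a hypothetical helper I made up for this question, and doSomething(line) stands in for my real per-record work; I am not sure a shutdown hook is even the right mechanism (it fires when the executor JVM exits, not when the map stage ends), which is exactly what I am asking about:

import java.util.concurrent.atomic.AtomicBoolean;

class WorkerCleanup {
    // guard so the hook is registered at most once per executor JVM
    private static final AtomicBoolean registered = new AtomicBoolean(false);

    static void ensureHook() {
        if (registered.compareAndSet(false, true)) {
            // doResourceCollect() would then run once, when this worker's JVM shuts down
            Runtime.getRuntime().addShutdownHook(new Thread(() -> doResourceCollect()));
        }
    }
}

// inside the job:
sourceRDD.map(line -> {
    WorkerCleanup.ensureHook(); // lazily arm the per-worker cleanup
    return doSomething(line);   // placeholder for my real per-record work
});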