Currently I am running my Spark cluster in standalone mode. I read data from flat files or Cassandra (depending on the job) and write the processed data back to Cassandra.
I was wondering: if I switch to Hadoop and start using a resource manager like YARN or Mesos, would it give me any additional performance advantage, such as shorter execution time and better resource management?
Currently, when I process a huge chunk of data, I sometimes run into stage failures during shuffling. If I migrate to YARN, can the resource manager address this issue?