I am trying to run a script with spark-submit as follows:
spark-submit -v \
--master yarn \
--num-executors 80 \
--driver-memory 10g \
--executor-memory 10g \
--executor-cores 5 \
--class cosineSimillarity jobs-1.0.jar
This script implements the DIMSUM algorithm on 60K records.
Unfortunately, it is still running after 3 hours. I tried it with 1K records and it completed successfully within 2 minutes.
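For a sense of scale, here is a rough back-of-envelope sketch in Python. It assumes the job compares every pair of records and that runtime grows in proportion to the number of pairs. Both are assumptions, since the job's code is not shown and DIMSUM samples pairs according to its threshold, so the real cost may be lower. The helper name `pair_count` is made up for illustration.

```python
def pair_count(n: int) -> int:
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


small_n, large_n = 1_000, 60_000
small_minutes = 2  # observed runtime of the 1K run

# How much more pairwise work the 60K run has compared to the 1K run.
ratio = pair_count(large_n) / pair_count(small_n)

# Naive extrapolation, ASSUMING runtime scales with the number of pairs.
estimated_hours = small_minutes * ratio / 60

print(f"pairs at 1K:    {pair_count(small_n):,}")
print(f"pairs at 60K:   {pair_count(large_n):,}")
print(f"work ratio:     {ratio:.0f}x")
print(f"naive estimate: {estimated_hours:.0f} hours")
```

If that quadratic assumption roughly holds, a 3-hour run at 60K would not be surprising, and the similarity threshold and partitioning may matter more than the spark-submit resource flags.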
Can anyone recommend changes to the spark-submit parameters to make it faster?