I read an RDBMS table from a PostgreSQL database as follows:
val dataDF = spark.read.format("jdbc")
  .option("url", connectionUrl)
  .option("dbtable", s"(${execQuery}) as year2017")
  .option("user", devUserName)
  .option("password", devPassword)
  .option("numPartitions", 10)
  .load()
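As a side note, for the JDBC source numPartitions only actually splits the read when partitionColumn, lowerBound and upperBound are supplied as well; without them Spark issues a single query on one partition. A minimal sketch of the fully partitioned form (the column name "id" and the bound values are illustrative assumptions, not from my actual job):

```scala
// Sketch: numPartitions takes effect on JDBC reads only together with a
// partition column and its bounds. "id" and the bounds are assumptions.
val partitionedDF = spark.read.format("jdbc")
  .option("url", connectionUrl)
  .option("dbtable", s"(${execQuery}) as year2017")
  .option("user", devUserName)
  .option("password", devPassword)
  .option("partitionColumn", "id") // hypothetical numeric column in the query
  .option("lowerBound", "1")       // smallest id value expected
  .option("upperBound", "1000000") // largest id value expected
  .option("numPartitions", 10)     // Spark generates 10 range queries on id
  .load()
```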
The numPartitions option specifies the number of partitions the data is split into, so that each partition can be processed in parallel; in this case it is 10. I thought this was a great option in Spark until I came across the spark-submit flags --num-executors, --executor-cores, and --executor-memory. I read about the concept behind these three spark-submit parameters from this link: here
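For reference, this is roughly how those three flags appear on the command line (a sketch only; the class name, jar name, and numbers are placeholders, not my actual job):

```shell
# --num-executors:  how many executor JVMs to allocate for the job
# --executor-cores: concurrent task slots per executor
# --executor-memory: heap memory given to each executor
spark-submit \
  --class com.example.ReadYear2017 \
  --num-executors 4 \
  --executor-cores 5 \
  --executor-memory 8g \
  read-year2017.jar
```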
What I don't understand is: if both are used for parallel processing, how do they differ from each other?
Could anyone explain the difference between the options mentioned above?