I have Spark set up in standalone mode on a single node with 2 cores and 16 GB of RAM for some rough POC work.
I want to load data from a SQL source using `val df = spark.read.format("jdbc")...option("numPartitions", n).load()`. When I measured the time taken to read a table for different `numPartitions` values by calling `df.rdd.count`, the time was the same regardless of the value I gave.
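Concretely, the read looks roughly like this; the URL, table name, and credentials below are placeholders for my actual values (the JDBC driver is on the classpath):

```scala
// Minimal sketch of my read; connection details are placeholders.
val n = 4 // varied this value across runs
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://localhost:3306/mydb") // placeholder
  .option("dbtable", "my_table")                     // placeholder
  .option("user", "user")                            // placeholder
  .option("password", "password")                    // placeholder
  .option("numPartitions", n.toString)
  .load()

df.rdd.count() // the action I timed
```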
I also noticed on the Spark web UI that the number of active executors was 1, even though I set `SPARK_WORKER_INSTANCES=2` and `SPARK_WORKER_CORES=1` in my `spark-env.sh` file.
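For completeness, these are the relevant lines in my `spark-env.sh` (everything else is left at defaults):

```sh
# conf/spark-env.sh
SPARK_WORKER_INSTANCES=2  # intended: two worker processes on this node
SPARK_WORKER_CORES=1      # intended: one core per worker
```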
I have two questions:

1. Does the number of partitions actually created by `numPartitions` depend on the number of executors?
2. How do I start `spark-shell` with multiple executors in my current setup? (My current launch command is below.)
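For reference, I currently launch the shell like this; the master URL is a placeholder for whatever my standalone master actually prints, and the second command is only my guess at forcing two single-core executors:

```sh
# What I run today (placeholder master URL).
$SPARK_HOME/bin/spark-shell --master spark://localhost:7077

# My guess for two single-core executors on standalone; unverified.
$SPARK_HOME/bin/spark-shell --master spark://localhost:7077 \
  --executor-cores 1 \
  --total-executor-cores 2
```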
Thanks!