I have a small cluster with 3 Spark workers (2 executors each), and I have installed Cassandra on the same nodes to achieve data locality. To compare run times (taken from the Spark UI), I ran the same code first with one Spark-Cassandra node, then with two, and then with three, repeating each configuration 3 times. The results are below, but I do not understand why it takes more time with 3 nodes than with 2.
I am not sure what to check. For the times above, spark.sql.shuffle.partitions was set to 96, but I also tried the "3 / 3" case with 18 partitions and the times were still about the same (3min 13s, 3min 5s, 3min 19s).
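For context, here is a rough sketch of how the shuffle partitions divide across executors in each case. It assumes the setup described above (2 executors per worker) and a perfectly even split of tasks. The function name is only illustrative, and it ignores how many cores each executor has, so treat it as back-of-the-envelope arithmetic rather than what Spark actually schedules.

```python
EXECUTORS_PER_WORKER = 2  # from the cluster setup described in the question


def shuffle_tasks_per_executor(workers, shuffle_partitions):
    """Average number of shuffle tasks per executor, assuming an even split."""
    executors = workers * EXECUTORS_PER_WORKER
    return shuffle_partitions / executors


if __name__ == "__main__":
    for workers in (1, 2, 3):
        for partitions in (96, 18):
            avg = shuffle_tasks_per_executor(workers, partitions)
            print(f"{workers} worker(s), {partitions} partitions: "
                  f"{avg:.1f} tasks per executor")
```

With 3 workers this gives 16 tasks per executor at 96 partitions and 3 at 18, so the partition count alone does not obviously explain the slowdown.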
What could be happening and why? Please, let me know if you need more information.
Edit1
The only difference between the first 2 cases and the 3rd is the replication factor in Cassandra: it is 1 for the first 2 cases and 3 for the 3rd. Could that be the reason, for example because of extra network traffic and latency?
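To make the replication difference concrete, here is a small sketch of the share of the data that each node stores in the three cases. It assumes the token ranges are spread evenly across the nodes, which I have not measured, and the helper name is made up for illustration.

```python
def data_share_per_node(replication_factor, nodes):
    """Fraction of the full dataset stored on each node, assuming even token distribution."""
    # A row cannot have more replicas than there are nodes.
    effective_rf = min(replication_factor, nodes)
    return effective_rf / nodes


if __name__ == "__main__":
    # (nodes, replication factor) for the three cases in the question
    cases = [(1, 1), (2, 1), (3, 3)]
    for nodes, rf in cases:
        share = data_share_per_node(rf, nodes)
        print(f"{nodes} node(s), RF={rf}: each node stores {share:.0%} of the data")
```

Under that assumption, each node holds half of the data in the 2-node case but a full copy in the 3-node case, so the 3-node setup is not simply the 2-node setup with one more machine.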
Edit2
Below are some pictures from the Stages Tab of SparkUI with 3 spark-cassandra nodes (3rd case).