
I am new to Spark. I am wondering how well it performs when scaled down to a single node, and how much overhead it carries compared to regular non-distributed parallel approaches, so I can evaluate whether it is a good choice to write a non-distributed parallel program in Spark and scale it to multiple nodes later when needed.

So can Spark be used efficiently for local single-machine parallel computing? If yes, how does its performance compare to that of regular Scala parallel collections or Java 8 parallel streams? Is the overhead significant?
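To make the comparison concrete, here is a minimal sketch of the non-distributed side of the question: summing squares with a Java 8 parallel stream. The Spark-local-mode counterpart is shown only in a comment, since it needs the `spark-core`/`spark-sql` dependencies on the classpath; the class name `LocalParallelSum` and the workload are just illustrative choices, not from any benchmark.

```java
import java.util.stream.LongStream;

public class LocalParallelSum {
    // The non-distributed baseline: a Java 8 parallel stream runs on the
    // common ForkJoinPool with essentially no startup or scheduling cost.
    static long parallelSumOfSquares(long n) {
        return LongStream.rangeClosed(1, n)
                         .parallel()
                         .map(x -> x * x)
                         .sum();
    }

    // A roughly equivalent Spark job in local mode (not compiled here;
    // requires the Spark dependencies) might look like:
    //
    //   SparkSession spark = SparkSession.builder()
    //       .master("local[*]")   // use all cores of this one machine
    //       .appName("local-sum")
    //       .getOrCreate();
    //   long sum = spark.range(1, n + 1)
    //       .map((MapFunction<Long, Long>) x -> x * x, Encoders.LONG())
    //       .reduce((ReduceFunction<Long>) Long::sum);
    //
    // Unlike the stream, Spark pays for JVM/session startup, task
    // scheduling, and serialization between stages even on one node,
    // which is where the overhead in question would come from.

    public static void main(String[] args) {
        long n = 1_000;
        // Closed form n(n+1)(2n+1)/6 as a correctness check.
        long expected = n * (n + 1) * (2 * n + 1) / 6;
        long got = parallelSumOfSquares(n);
        if (got != expected) {
            throw new AssertionError(got + " != " + expected);
        }
        System.out.println("sum of squares up to " + n + " = " + got);
    }
}
```

A fair timing comparison would run both versions on the same data size after JVM warm-up, and exclude `SparkSession` startup if only steady-state throughput is of interest.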

Additionally, and specifically for graphs: how does the performance of GraphX compare to that of Graph for Scala or JGraphT?

Shreck Ye
    See this benchmark: https://databricks.com/blog/2018/05/03/benchmarking-apache-spark-on-a-single-node-machine.html – QuickSilver Apr 27 '20 at 04:07
  • @QuickSilver Is there a benchmark comparing against Scala or Java approaches? Python is known to have more performance overhead than the JVM, and that may account for part of the performance difference. – Shreck Ye Apr 27 '20 at 12:22

0 Answers