
I am new to Spark. I am wondering how well it performs when scaled down to a single node, and how much overhead it carries compared to regular non-distributed parallel approaches, so I can evaluate whether it is a good choice to write a non-distributed parallel program in Spark and scale it to multiple nodes later when needed.

So can Spark be used efficiently for local single-machine parallel computing? If yes, how does its performance compare to that of regular Scala parallel collections or Java 8 parallel streams? Is the overhead significant?
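To make the comparison concrete, here is a minimal sketch of the non-distributed side of the question: summing squares with a Java 8 parallel stream. The Spark-local-mode counterpart is shown only in a comment, since it needs the `spark-core`/`spark-sql` dependencies on the classpath; the class name `LocalParallelSum` and the workload are just illustrative choices, not from any benchmark.

```java
import java.util.stream.LongStream;

public class LocalParallelSum {
    // The non-distributed baseline: a Java 8 parallel stream runs on the
    // common ForkJoinPool with essentially no startup or scheduling cost.
    static long parallelSumOfSquares(long n) {
        return LongStream.rangeClosed(1, n)
                         .parallel()
                         .map(x -> x * x)
                         .sum();
    }

    // A roughly equivalent Spark job in local mode (not compiled here;
    // requires the Spark dependencies) might look like:
    //
    //   SparkSession spark = SparkSession.builder()
    //       .master("local[*]")   // use all cores of this one machine
    //       .appName("local-sum")
    //       .getOrCreate();
    //   long sum = spark.range(1, n + 1)
    //       .map((MapFunction<Long, Long>) x -> x * x, Encoders.LONG())
    //       .reduce((ReduceFunction<Long>) Long::sum);
    //
    // Unlike the stream, Spark pays for JVM/session startup, task
    // scheduling, and serialization between stages even on one node,
    // which is where the overhead in question would come from.

    public static void main(String[] args) {
        long n = 1_000;
        // Closed form n(n+1)(2n+1)/6 as a correctness check.
        long expected = n * (n + 1) * (2 * n + 1) / 6;
        long got = parallelSumOfSquares(n);
        if (got != expected) {
            throw new AssertionError(got + " != " + expected);
        }
        System.out.println("sum of squares up to " + n + " = " + got);
    }
}
```

A fair timing comparison would run both versions on the same data size after JVM warm-up, and exclude `SparkSession` startup if only steady-state throughput is of interest.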

Additionally, and specifically for graphs: how does the performance of GraphX compare to that of Graph for Scala or JGraphT?

Shreck Ye
    See this benchmark: https://databricks.com/blog/2018/05/03/benchmarking-apache-spark-on-a-single-node-machine.html – QuickSilver Apr 27 '20 at 04:07
  • @QuickSilver Is there a benchmark comparing against Scala or Java approaches? Python is known to have more performance overhead than the JVM, and that may account for part of the performance difference. – Shreck Ye Apr 27 '20 at 12:22

0 Answers