
I am using KMeans() in an environment I have no control over and will abandon in <1 month. Spark 1.6.2 is installed.

Should I pay the price of urging 'them' to upgrade to Spark 2.0.0 before I leave? In other words, does Spark 2.0.0 introduce any significant improvements to Spark MLlib KMeans()?

In my case, quality is a more important factor than speed.

gsamaras
  _Highly opinionated and untested answer_: Highly unlikely. There is nothing in 2.0 that could significantly improve KMeans performance, and the implementation didn't change. – zero323 Aug 25 '16 at 20:06

1 Answer


It is rather unlikely.

Spark 2.0.0 doesn't introduce any significant improvements to the core RDD API, and the KMeans implementation hasn't changed much since 1.6; the only relatively significant changes were introduced by SPARK-15322, SPARK-16696 and SPARK-16694.

If you use the ML API, there may also be some improvements related to SPARK-14850, but overall I don't see any game changers here.
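Since quality matters more than speed here, note that the factor that usually dominates k-means quality is initialization (MLlib defaults to k-means|| seeding), not the Spark version: both 1.6 and 2.0 run the same Lloyd-style iterations. As a language-agnostic illustration of those iterations, here is a minimal plain-Python sketch (this is NOT Spark's code; the naive first-k initialization and the toy data are my own for demonstration):

```python
# Minimal Lloyd-style k-means sketch (plain Python, not Spark's implementation).
# Shows the two steps that are identical in Spark 1.6 and 2.0:
# assign each point to its nearest center, then move centers to cluster means.

def kmeans(points, k, iters=20):
    # Naive "first k points" initialization, purely for illustration;
    # MLlib's default k-means|| picks spread-out seeds instead.
    centers = list(points[:k])
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        # Update step: move each non-empty center to its cluster mean.
        for i, cl in enumerate(clusters):
            if cl:
                centers[i] = tuple(sum(xs) / len(cl) for xs in zip(*cl))
    return centers

def cost(points, centers):
    # Within-set sum of squared errors -- the same quality metric
    # that KMeansModel.computeCost() reports in MLlib.
    return sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
               for p in points)

# Two well-separated blobs: even a poor initialization recovers them.
pts = [(0.0, 0.0), (0.1, 0.2), (-0.1, 0.1),
       (10.0, 10.0), (10.2, 9.9), (9.8, 10.1)]
centers = kmeans(pts, k=2)
print(sorted(round(c[0]) for c in centers))  # → [0, 10]
```

If quality is the concern, the practical knob in either version is the initialization mode and number of init steps (plus running with several seeds and keeping the lowest cost), rather than the upgrade itself.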

zero323