
Thanks in advance for your input; I am a newbie to ML. I've developed an R model (using RStudio on my local machine) and want to deploy it on a Hadoop cluster that has RStudio installed. I want to use SparkR to leverage high-performance computing, and I just want to understand the role of SparkR here.

Will SparkR enable the R model to run the algorithm within Spark ML on the Hadoop Cluster?

OR

Will SparkR enable only the data processing and still the ML algorithm will run within the context of R on the Hadoop Cluster?

Appreciate your input.

1 Answer


These are general questions, but they actually have a very simple & straightforward answer: no (to both); SparkR will do neither.

From the Overview section of the SparkR docs:

SparkR is an R package that provides a light-weight frontend to use Apache Spark from R.

SparkR cannot even read native R models.

The idea behind using SparkR for ML tasks is that you develop your model specifically in SparkR, using the algorithms that Spark MLlib exposes (and if you try, you'll also discover that this selection is much more limited than the plethora of models available in R through the various packages).
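To make that concrete, here is a minimal sketch of what "developing in SparkR" looks like; spark.glm is one of the MLlib algorithms SparkR exposes, while the iris data, formula, and app name are just placeholders, and a working Spark installation reachable from your R session is assumed:

    library(SparkR)

    # Start a Spark session on the cluster (assumes SPARK_HOME is configured)
    sparkR.session(appName = "sparkr-ml-sketch")

    # Convert a local R data frame into a distributed Spark DataFrame
    # (SparkR replaces the dots in the iris column names with underscores)
    df <- createDataFrame(iris)

    # Fit a Gaussian GLM -- this trains inside Spark MLlib, not in base R
    model <- spark.glm(df, Sepal_Length ~ Sepal_Width + Petal_Length,
                       family = "gaussian")
    summary(model)

    # Predictions also come back as a Spark DataFrame
    preds <- predict(model, df)
    head(preds)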

Even conveniences like, say, confusionMatrix from the caret package are not available, since such functions operate on R data frames and not on Spark ones (see this question & answer).
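If you do need such a utility, the usual workaround is to collect the (typically small) prediction results back to the driver as a local R data frame first. A rough sketch, where the preds Spark DataFrame and its label/prediction columns are just assumptions carried over from the example above:

    # collect() pulls a Spark DataFrame to the driver as a plain R data.frame;
    # only do this with results small enough to fit in the driver's memory
    local_preds <- collect(preds)

    library(caret)
    confusionMatrix(factor(local_preds$prediction),
                    factor(local_preds$label))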

desertnaut