I was wondering how Spark handles a Java program that calls a machine learning algorithm provided by MLlib. Do I need to download the Spark Project ML Library? Also, where is the source code of MLlib's Java API? I can't find it in its repository.
1 Answer
Spark MLlib is written in Scala, and you can use it directly from Java. In the repository, it's here.
You could also check some examples here.

Yuan JI
- So what does MLlib do for the Java or Python API? Does it implement those machine learning algorithms in Java or Python, or does it only call the corresponding Scala implementation? – Hereme Jun 15 '16 at 07:59
- For a Java program, it calls the Scala implementation directly. For Python, Spark has PySpark, which also has its own mllib package; check it out [here](https://github.com/apache/spark/tree/master/python/pyspark/mllib). – Yuan JI Jun 15 '16 at 08:02
- But what about the algorithms in module ml (not MLlib)? I can't find the details of their implementation. – Hereme Jun 15 '16 at 08:09
- The same goes for ml: the Java API uses the Scala implementation, see [here](https://github.com/apache/spark/tree/master/mllib/src/main/scala/org/apache/spark/ml), and PySpark has its own ml package, see [here](https://github.com/apache/spark/tree/master/python/pyspark/ml). – Yuan JI Jun 15 '16 at 08:12
- Yes, I've found it. But unlike the algorithms in module mllib, the algorithms in module ml extend classes such as JavaEstimator, and I can't find any code that actually trains them. Can you explain the mechanism of module ml? Thank you. – Hereme Jun 15 '16 at 08:29
- The ml package is a `DataFrame`-based machine learning package, while the mllib package is `RDD`-based. ml helps users create and tune practical machine learning pipelines, so what matters in ml is the pipeline mechanism. For details, check the [ml documentation](http://spark.apache.org/docs/latest/ml-guide.html), which explains how pipelines work in ml; you can learn a lot from it. – Yuan JI Jun 15 '16 at 08:39
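
To make the pipeline idea in the last comment a bit more concrete, here is a toy sketch in plain Java. It does not use Spark at all: `Stage`, `Transformer`, `Estimator` and `Pipeline` here are simplified stand-ins that work on `double[]` instead of a `DataFrame`, and `ToyCenterer` and `ToyScaler` are hypothetical stages invented for illustration. It only shows the general pattern (an estimator is fitted to data and yields a transformer, and a pipeline fits its stages in order), so treat it as an assumption-laden illustration rather than Spark's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PipelineSketch {

    /** Common base type for pipeline stages (toy stand-in, not Spark's class). */
    interface Stage {}

    /** A stage that maps data to data. */
    interface Transformer extends Stage {
        double[] transform(double[] data);
    }

    /** A stage that learns from data and returns a fitted Transformer. */
    interface Estimator extends Stage {
        Transformer fit(double[] data);
    }

    /** Hypothetical estimator: learns the mean; the fitted model subtracts it. */
    static class ToyCenterer implements Estimator {
        @Override
        public Transformer fit(double[] data) {
            final double mean = Arrays.stream(data).average().orElse(0.0);
            return d -> Arrays.stream(d).map(x -> x - mean).toArray();
        }
    }

    /** Hypothetical estimator: learns the largest absolute value; the fitted model divides by it. */
    static class ToyScaler implements Estimator {
        @Override
        public Transformer fit(double[] data) {
            double maxAbs = Arrays.stream(data).map(Math::abs).max().orElse(0.0);
            final double scale = (maxAbs == 0.0) ? 1.0 : maxAbs; // avoid dividing by zero
            return d -> Arrays.stream(d).map(x -> x / scale).toArray();
        }
    }

    /** A pipeline is itself an estimator: fitting it fits every stage in order. */
    static class Pipeline implements Estimator {
        private final List<Stage> stages;

        Pipeline(Stage... stages) {
            this.stages = Arrays.asList(stages);
        }

        @Override
        public Transformer fit(double[] data) {
            final List<Transformer> fitted = new ArrayList<>();
            double[] current = data;
            for (Stage stage : stages) {
                // Estimators are fitted on the data produced so far; transformers are used as-is.
                Transformer t = (stage instanceof Estimator)
                        ? ((Estimator) stage).fit(current)
                        : (Transformer) stage;
                fitted.add(t);
                current = t.transform(current);
            }
            // The fitted pipeline applies all fitted stages in sequence.
            return d -> {
                double[] out = d;
                for (Transformer t : fitted) {
                    out = t.transform(out);
                }
                return out;
            };
        }
    }

    /** Fits the toy pipeline on training data, then transforms new data. */
    public static double[] demo() {
        double[] training = {1.0, 2.0, 3.0, 6.0};
        Transformer model = new Pipeline(new ToyCenterer(), new ToyScaler()).fit(training);
        return model.transform(new double[] {3.0, 9.0});
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo()));
    }
}
```

With these toy numbers the learned mean is 3 and the learned scale is 3, so running `main` prints `[0.0, 2.0]`. In Spark's ml package the same roles are played by its own `Estimator`, `Transformer` and `Pipeline` classes operating on `DataFrame`s; the ml guide linked above describes them in detail.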