
I want to implement some machine learning algorithms using the Spark MLlib library for my Java project. I have tried several tutorials without success.

I am used to working with Eclipse and was surprised that it was so difficult to set up. My assumption was that I just needed to download the library from here and add the JAR to my build path, but apparently it is more difficult than that.


1 Answer


Create a Maven project and add the following dependencies (for the latest Spark, 2.0.0). You can start by running a simple program like JavaALSExample.java in Eclipse.

https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/ml/JavaALSExample.java

There are more Java samples available in the Spark GitHub repository that you can refer to. Hope this helps.

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.0.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-mllib_2.11</artifactId>
        <version>2.0.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.0.0</version>
    </dependency>
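
Once these dependencies resolve, a minimal sketch like the one below can be used to confirm that spark-core, spark-sql, and spark-mllib are all on the classpath. The class name SparkSetupCheck and the tiny vector are my own illustration, not part of the answer; the .master("local[2]") setting is the one suggested in the comments below, so the program runs in-process without a cluster.

    import org.apache.spark.ml.linalg.Vector;
    import org.apache.spark.ml.linalg.Vectors;
    import org.apache.spark.sql.SparkSession;

    public class SparkSetupCheck {
        public static void main(String[] args) {
            // "local[2]" runs Spark inside this JVM with two worker threads,
            // so no external master URL or cluster is required.
            SparkSession spark = SparkSession.builder()
                    .appName("SparkSetupCheck")
                    .master("local[2]")
                    .getOrCreate();

            // Prints the resolved Spark version, e.g. 2.0.0.
            System.out.println("Spark version: " + spark.version());

            // Touch an MLlib class to verify that spark-mllib resolved as well.
            Vector v = Vectors.dense(1.0, 2.0, 3.0);
            System.out.println("Dense vector from spark-mllib: " + v);

            spark.stop();
        }
    }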
  • Thank you for your answer. Could you tell me where I have to unzip the Spark library, and is it enough to add the lines above to the pom.xml? – A.Dumas Aug 13 '16 at 07:12
  • When you add the above dependencies, all the JAR files required to run the Spark MLlib examples will be downloaded into your local Maven repository. Only if you are not using a Maven project do you have to manually download the required JAR files and add them to the Eclipse build path. Using Maven is recommended. – abaghel Aug 13 '16 at 07:58
  • Thank you so much. I followed your advice and created a Maven project. I think I almost have it. I just get this *A master URL must be set in your configuration* exception and I do not know where to set it. – A.Dumas Aug 13 '16 at 08:44
  • Use this: SparkSession spark = SparkSession.builder().appName("JavaALSExample").config("spark.sql.warehouse.dir", "/file:C:/temp").master("local[2]").getOrCreate(); – abaghel Aug 13 '16 at 09:13
  • Great, now it compiles without an exception. I guess I can start testing now. Thank you so much. – A.Dumas Aug 13 '16 at 09:21
  • Old question, but I validated that this works with version 2.4.0 and Java 8: org.apache.spark:spark-core_2.11:2.4.0, org.apache.spark:spark-sql_2.11:2.4.0, and org.apache.spark:spark-mllib_2.11:2.4.0 (compile scope), using Java 8 and Eclipse 2019-03 (4.11.0). – Gabriel Hernandez Apr 23 '19 at 23:26
  • I can confirm that this method also downloads com.github.fommil.netlib core-1.1.2.jar, which is used to perform low-level linear algebra routines with classes like ARPACK, BLAS, F2jARPACK, F2jBLAS, F2jLAPACK, and LAPACK in your project, along with their dependencies (arpack_combined_all-0.1.jar, spark-mllib-local_2.11-2.4.0.jar). – Gabriel Hernandez Apr 23 '19 at 23:34