
Basically I am a Java developer, and now I have got a chance to work on Spark. I have gone through the basics of the Spark API (SparkConf, SparkContext, RDD, SQLContext, DataFrame, Dataset) and I am able to perform some simple transformations using RDDs and Spark SQL. But when I try to work out a sample GraphFrames application in Java, I cannot succeed. I have gone through many YouTube tutorials, forums, and Stack Overflow threads, but nowhere have I found a direct suggestion or solution. The issue appears when I try to create a GraphFrame object. I have downloaded the respective jar (graphframes-0.2.0-spark2.0-s_2.11.jar) too, but I am still facing the issue. I want to share my analysis as far as I have got; being very new to Spark I can't move further on my own, so any help would be really useful. Thanks in advance. The error I am facing is: The constructor GraphFrame(DataFrame, DataFrame) is undefined.

import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;
import org.apache.spark.storage.StorageLevel;
import org.graphframes.GraphFrame;

import com.fasterxml.jackson.core.JsonParseException;
import com.fasterxml.jackson.databind.JsonMappingException;

public class SparkJavaGraphFrameOne {

    public static void main(String[] args) throws JsonParseException, JsonMappingException, IOException{

        SparkConf conf = new SparkConf().setAppName("test").setMaster("local");

        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);

        JavaRDD<Row> verRow = sc.parallelize(Arrays.asList(RowFactory.create(1,"A"),RowFactory.create(2,"B")));
        JavaRDD<Row> edgRow = sc.parallelize(Arrays.asList(RowFactory.create(1,2,"Edge")));     

        List<StructField> verFields = new ArrayList<StructField>();
        verFields.add(DataTypes.createStructField("id",DataTypes.IntegerType, true));
        verFields.add(DataTypes.createStructField("name",DataTypes.StringType, true));

        List<StructField> EdgFields = new ArrayList<StructField>();
        EdgFields.add(DataTypes.createStructField("fromId",DataTypes.IntegerType, true));
        EdgFields.add(DataTypes.createStructField("toId",DataTypes.IntegerType, true));
        EdgFields.add(DataTypes.createStructField("name",DataTypes.StringType, true));

        StructType verSchema = DataTypes.createStructType(verFields);
        StructType edgSchema = DataTypes.createStructType(EdgFields);

        DataFrame verDF = sqlContext.createDataFrame(verRow, verSchema);
        DataFrame edgDF = sqlContext.createDataFrame(edgRow, edgSchema);

        GraphFrame g = new GraphFrame(verDF,edgDF);
        g.vertices().show();
        g.edges().show();
        g.persist(StorageLevel.MEMORY_AND_DISK());
    }

}

4 Answers


I have written a sample program in Java using Spark 2.0.0 and GraphFrames 0.2.0. It is based on the sample program given at http://graphframes.github.io/quick-start.html#start-using-graphframes. Hope this helps.

pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.abaghel.examples.spark</groupId>
<artifactId>spark-graphframe</artifactId>
<version>1.0.0-SNAPSHOT</version>

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.0.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-graphx_2.11</artifactId>
        <version>2.0.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.0.0</version>
    </dependency>
    <dependency>
        <groupId>graphframes</groupId>
        <artifactId>graphframes</artifactId>
        <version>0.2.0-spark2.0-s_2.11</version>
    </dependency>
</dependencies>

<repositories>
    <!-- list of other repositories -->
    <repository>
        <id>SparkPackagesRepo</id>
        <url>http://dl.bintray.com/spark-packages/maven</url>
    </repository>
 </repositories>
 <build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.1</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
    </plugins>
  </build>
</project>

SparkGraphFrameSample.java

package com.abaghel.examples.spark.graphframe;

import java.util.ArrayList;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.graphframes.GraphFrame;
import org.graphframes.lib.PageRank;
/**
 * Sample application shows how to create a GraphFrame, query it, and run the PageRank algorithm.
 * 
 * @author abaghel
 *
 */
public class SparkGraphFrameSample {

 public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
            .appName("SparkGraphFrameSample")
            .config("spark.sql.warehouse.dir", "/file:C:/temp")
            .master("local[2]")
            .getOrCreate();

    //Create a Vertex DataFrame with unique ID column "id"
    List<User> uList = new ArrayList<User>() {
        {
            add(new User("a", "Alice", 34));
            add(new User("b", "Bob", 36));
            add(new User("c", "Charlie", 30));
        }
    };

    Dataset<Row> verDF = spark.createDataFrame(uList, User.class);

    //Create an Edge DataFrame with "src" and "dst" columns
    List<Relation> rList = new ArrayList<Relation>() {
        {
            add(new Relation("a", "b", "friend"));
            add(new Relation("b", "c", "follow"));
            add(new Relation("c", "b", "follow"));
        }
    };

    Dataset<Row> edgDF = spark.createDataFrame(rList, Relation.class);

    //Create a GraphFrame
    GraphFrame gFrame = new GraphFrame(verDF, edgDF);
    //Get in-degree of each vertex.
    gFrame.inDegrees().show();
    //Count the number of "follow" connections in the graph.
    long count = gFrame.edges().filter("relationship = 'follow'").count();
    //Run PageRank algorithm, and show results.
    PageRank pRank = gFrame.pageRank().resetProbability(0.01).maxIter(5);
    pRank.run().vertices().select("id", "pagerank").show();

    //stop
    spark.stop();
  }

}

User.java

package com.abaghel.examples.spark.graphframe;
/**
 * User class
 * 
 * @author abaghel
 *
 */
public class User {
private String id;
private String name;
private int age;

public User(){      
}

public User(String id, String name, int age) {
    super();
    this.id = id;
    this.name = name;
    this.age = age;
}

public String getId() {
    return id;
}
public void setId(String id) {
    this.id = id;
}
public String getName() {
    return name;
}
public void setName(String name) {
    this.name = name;
}
public int getAge() {
    return age;
}
public void setAge(int age) {
    this.age = age;
 }
}

Relation.java

package com.abaghel.examples.spark.graphframe;
/**
 * Relation class
 * 
 * @author abaghel
 *
 */
public class Relation {

private String src;
private String dst;
private String relationship;

public Relation(){

}

public Relation(String src, String dst, String relationship) {
    super();
    this.src = src;
    this.dst = dst;
    this.relationship = relationship;
}

public String getSrc() {
    return src;
}

public void setSrc(String src) {
    this.src = src;
}

public String getDst() {
    return dst;
}

public void setDst(String dst) {
    this.dst = dst;
}

public String getRelationship() {
    return relationship;
}

public void setRelationship(String relationship) {
    this.relationship = relationship;
  }

}

Console output

16/08/27 22:34:45 INFO DAGScheduler: Job 10 finished: show at    SparkGraphFrameSample.java:56, took 0.938910 s
16/08/27 22:34:45 INFO CodeGenerator: Code generated in 6.599005 ms
+---+-------------------+
| id|           pagerank|
+---+-------------------+
|  a|               0.01|
|  b|0.08763274109799998|
|  c|     0.077926810699|
+---+-------------------+
  • Thanks a lot abaghel. Will try this and let you know by the end of the day. Once again, thank you so much; I hope this really helps me move further. – Venkaiah Yepuri Aug 26 '16 at 13:10
  • Your original issue with the constructor new GraphFrame(verDF, edgDF) will be solved by using the files I have provided in this post. I am expecting an upvote and the answer accepted if this Java sample really helped you. – abaghel Aug 27 '16 at 07:06
  • abaghel - I will definitely upvote, but I am facing an issue while downloading this dependency (jar 0.2.0-spark2.0-s_2.11); it won't download. Could it be a Java version issue? I am using 1.7. – Venkaiah Yepuri Aug 27 '16 at 15:21
  • I am using Java 1.8. I didn't get any issue downloading the jar, since the dependency and the repository URL are defined in the pom.xml. I am using Eclipse Mars and built with the maven install command, which downloaded the jar files. – abaghel Aug 27 '16 at 16:16
  • Anyhow, I have downloaded that jar separately and added it to the build path. Now, while running the application, I am getting a runtime exception. I googled it and the suggestions say some changes are required in an SBT file, but I am not sure where that file resides. The exception is: Exception in thread "main" java.lang.NoClassDefFoundError: com/typesafe/scalalogging/slf4j/LazyLogging – Venkaiah Yepuri Aug 27 '16 at 16:41
  • abaghel - can you please post an output screenshot of the application? Does it show a graph, or just display the relations among User objects as text output in the console tab? – Venkaiah Yepuri Aug 27 '16 at 16:48
  • Are you using Scala and an SBT build? I suggest you take the sample code and pom.xml file I have posted here and run SparkGraphFrameSample.java in Eclipse. You can then add your Java class to that project and try to run it. I have added the console output above. – abaghel Aug 27 '16 at 16:54
  • I did the same, but while running the main class I get this exception. Please have a look at this text file for the full stack trace https://tuarmor.com/t/518bf9e1e7e498b2bcce0dce7c40f98c47fb8d4c and this is my complete code https://github.com/venkatrohith/sparkgraphframes – Venkaiah Yepuri Aug 27 '16 at 17:02
  • Venki, I cloned your project from GitHub and built it from the Maven command line. Then I ran it using the command mvn exec:java -Dexec.mainClass="com.spark.mygraphframe.SparkGraphFrameSample". It is working fine without any issue and I am getting the result. I am using Java 1.8; try with Java 1.8. Try cleaning up your environment, or try from another machine with the Maven command line. – abaghel Aug 27 '16 at 17:30
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/122011/discussion-between-venki-and-abaghel). – Venkaiah Yepuri Aug 27 '16 at 18:00

I don't know whether you have been able to solve your problem or not; I have just seen it. For the Exception in thread "main" java.lang.NoClassDefFoundError: com/typesafe/scalalogging/slf4j/LazyLogging, I think you need to add the following dependency to your pom.xml:

        <dependency>
            <groupId>com.typesafe.scala-logging</groupId>
            <artifactId>scala-logging-slf4j_2.10</artifactId>
            <version>2.1.2</version>
        </dependency>

I encountered the same issue, and adding this dependency resolved it for me.

  • Hi Treena, thanks for your suggestion. But this was actually an issue with the Java version; after updating to Java 8, the issue got resolved. I hope you recognise who I am? – Venkaiah Yepuri Sep 17 '16 at 04:09

I was able to reproduce the issue (the application keeps running continuously) with graphframes 0.5.0-spark2.1-s_2.11; the same code works fine with 0.4.0-spark2.1-s_2.11.
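
If you hit the same behaviour, one option is to pin the dependency to the release that worked. A minimal sketch of the corresponding pom.xml entry, assuming the same Spark Packages coordinates used in the accepted answer above:

    <dependency>
        <groupId>graphframes</groupId>
        <artifactId>graphframes</artifactId>
        <!-- 0.5.0-spark2.1-s_2.11 showed the issue; 0.4.0-spark2.1-s_2.11 did not -->
        <version>0.4.0-spark2.1-s_2.11</version>
    </dependency>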


To fix the GraphFrame constructor issue, try:

GraphFrame gf = GraphFrame.apply(verDF, edgeDF);
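
For context, a minimal sketch of how this call could sit in a complete Spark 2.x Java program. The class name GraphFrameApplySketch is made up for illustration, and it assumes the User and Relation beans from the accepted answer above are on the classpath:

    import java.util.Arrays;
    import java.util.List;

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.graphframes.GraphFrame;

    public class GraphFrameApplySketch {

        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("GraphFrameApplySketch")
                    .master("local[2]")
                    .getOrCreate();

            // Reuse the User and Relation beans defined in the accepted answer.
            List<User> users = Arrays.asList(new User("a", "Alice", 34), new User("b", "Bob", 36));
            List<Relation> relations = Arrays.asList(new Relation("a", "b", "friend"));

            Dataset<Row> verDF = spark.createDataFrame(users, User.class);
            Dataset<Row> edgDF = spark.createDataFrame(relations, Relation.class);

            // The Scala companion object's apply method is reachable from Java as a
            // static method, so no public two-argument constructor is needed.
            GraphFrame gf = GraphFrame.apply(verDF, edgDF);
            gf.vertices().show();
            gf.edges().show();

            spark.stop();
        }
    }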
