
I have created RDDs from two input files, an edges file and a nodes file. When I use the Graph.fromEdges() method to create a graph, I get an error. Could someone please help me? The inputEdgesTextFile and inputNodesTextFile RDDs read the input text datasets. The error is raised on the very last line of the code; a screenshot of the error is attached.

public static void main(String[] args) {

    SparkConf conf = new SparkConf().setMaster("local").setAppName("GraphFileReadClass");
    JavaSparkContext javaSparkContext = new JavaSparkContext(conf);
    ClassTag<String> stringTag = scala.reflect.ClassTag$.MODULE$.apply(String.class);
    ClassTag<String> intTag = scala.reflect.ClassTag$.MODULE$.apply(Integer.class);

    $eq$colon$eq<String, String> tpEquals = scala.Predef.$eq$colon$eq$.MODULE$.tpEquals();
    // Load the external text files in Apache Spark
    // Each text file contains a number of lines, and each line has the following structure:
    //SFEdge contains: | Edge_id integer | Source_Id integer | Destination_id integer | EdgeLength double |
    //SFNodes contains: | Node_id integer | Longitude double | Latitude double |
    
    
    JavaRDD<String> inputEdgesTextFile = javaSparkContext.textFile("./SFEdges.txt");
    JavaRDD<String> inputNodesTextFile = javaSparkContext.textFile("./SFNodes.txt");
    ArrayList<Tuple2<Integer, Integer>> nodes = new ArrayList<>();
    ArrayList<Edge<Double>> edges = new ArrayList<>();

    JavaRDD<NodesClass> nodesPart = inputNodesTextFile.mapPartitions(p -> {
        ArrayList<NodesClass> nodeList = new ArrayList<NodesClass>();
        int counter = 0;
        while (p.hasNext()) {
            String[] parts = p.next().split(" ");
            NodesClass node = new NodesClass();
            node.setNode_Id(Integer.parseInt(parts[0]));
            node.setLongitude(Double.parseDouble(parts[1]));
            node.setLatitude(Double.parseDouble(parts[2]));
            nodes.add(new Tuple2<Integer, Integer>(counter, Integer.parseInt(parts[0])));
            nodeList.add(node);
            counter++;

        }
        return nodeList.iterator();
    });
    JavaRDD<Tuple2<Integer, Integer>> nodesRDD = javaSparkContext.parallelize(nodes);
    nodesRDD.foreach(data -> System.out.print("Node details: " + data._1() + " " + data._2()));

    JavaRDD<EdgeNetwork> edgesPart = inputEdgesTextFile.mapPartitions(p -> {
        ArrayList<EdgeNetwork> edgeList = new ArrayList<EdgeNetwork>();
        while (p.hasNext()) {

            String[] parts = p.next().split(" ");
            EdgeNetwork edgeNet = new EdgeNetwork();
            edgeNet.setEdge_id(Integer.parseInt(parts[0]));
            edgeNet.setSource_id(Integer.parseInt(parts[1]));
            edgeNet.setDestination_id(Integer.parseInt(parts[2]));
            edgeNet.setEdge_length(Double.parseDouble(parts[3]));
            edges.add(new Edge<Double>(Long.parseLong(parts[1]), Long.parseLong(parts[2]),
                    Double.parseDouble(parts[3])));
            edgeList.add(edgeNet);

        }
        return edgeList.iterator();
    });
    JavaRDD<Edge<Double>> edgesRDD = javaSparkContext.parallelize(edges);

    Graph<String, Double> graph = Graph.fromEdges(edgesRDD.rdd(), " ", StorageLevel.MEMORY_ONLY(),
            StorageLevel.MEMORY_ONLY(), stringTag, stringTag);
    //The error is reported on the line above, for Graph<String, Double>
    //Maybe the RDD that I have created has some errors. Please advise.
}

1 Answer


Graph.fromEdges looks like this in Scala:

def fromEdges[VD: ClassTag, ED: ClassTag](
    edges: RDD[Edge[ED]],
    defaultValue: VD,
    edgeStorageLevel: StorageLevel = StorageLevel.MEMORY_ONLY,
    vertexStorageLevel: StorageLevel = StorageLevel.MEMORY_ONLY): Graph[VD, ED] = {
...

The two class tags are translated in the Java API into additional parameters at the end of the method. Here VD is a String and ED is a Double, so the Java call should reflect these types; in particular, the second class tag should be a Double tag:

ClassTag<Double> doubleTag = scala.reflect.ClassTag$.MODULE$.apply(Double.class);

Graph<String, Double> graph = Graph.fromEdges(edgesRDD.rdd(), " ", StorageLevel.MEMORY_ONLY(),
        StorageLevel.MEMORY_ONLY(), stringTag, doubleTag);
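
One way to verify that the graph was actually built (as done in the comments below) is to collect its vertices to the driver and print them; note that collect() pulls all vertex data to the driver, so this is only suitable for small graphs:

// Collect the vertex RDD to the driver and print each (vertexId, attribute) pair.
graph.vertices().toJavaRDD().collect().forEach(System.out::println);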
werner
  • I replaced my code with the following; however, I cannot see any printout on the console. `Graph graph = Graph.fromEdges(edgesRDD.rdd(), " ", StorageLevel.MEMORY_ONLY(), StorageLevel.MEMORY_ONLY(), stringTag, doubleTag); graph.vertices().toJavaRDD().collect().forEach(System.out::println);` – Aavash Bhandari Sep 29 '21 at 06:31
  • Good to hear that the compiler problem is solved! The empty output is probably another issue, but that's difficult to say without seeing the data. – werner Sep 29 '21 at 16:40
  • Actually, I am trying to use the road network dataset of San Francisco that is freely available at this link: https://www.cs.utah.edu/~lifeifei/SpatialDataset.htm I want to partition the road network map across a cluster of 3 machines (for now). However, the graph that I created from the edge RDD is not printing anything. Could you suggest how I can run a balanced graph partitioning for 3 machines? @werner – Aavash Bhandari Oct 07 '21 at 01:38
  • I managed to use the GraphX Edge class to create an edge list by reading the text file and adding each entry to the list, and now I can see that the graph has been created. But I still can't use "PartitionStrategy.RandomVertexCut$.MODULE$"; it shows the error "RandomVertexCut$ cannot be resolved or is not a field". `Graph graph = Graph.fromEdges(edgeRDD.rdd(), "", StorageLevel.MEMORY_ONLY(), StorageLevel.MEMORY_ONLY(), stringTag, doubleTag); graph.partitionBy(PartitionStrategy.RandomVertexCut$.MODULE$);` – Aavash Bhandari Oct 07 '21 at 09:03
  • @AavashBhandari I would suggest you create a new question, as this looks like a different problem to me. With a new question you would have a higher chance to get responses: questions are read far more often than comments so more people would see your question. – werner Oct 07 '21 at 17:55
  • I have created a new post. Could you take a look? https://stackoverflow.com/questions/69490581/apache-graphx-partiton-strategy-is-generating-error – Aavash Bhandari Oct 08 '21 at 04:42