I want to compute PageRank from a CSV file of edges formatted as follows:

12,13,1.0
12,14,1.0
12,15,1.0
12,16,1.0
12,17,1.0
...

My code:

var filename = "<filename>.csv"

val graph = Graph.fromCsvReader[Long,Double,Double]( 
                   env = env, 
                   pathEdges = filename, 
                   readVertices = false, 
                   hasEdgeValues = true, 
                   vertexValueInitializer = new MapFunction[Long, Double] { 
                           def map(id: Long): Double = 0.0 } )

val ranks = new PageRank[Long](0.85, 20).run(graph)

I get the following error from the Flink Scala Shell:

error: type mismatch;
 found   : org.apache.flink.graph.scala.Graph[Long,_23,_24] where type _24 >: Double with _22, type _23 >: Double with _21
 required: org.apache.flink.graph.Graph[Long,Double,Double]
            val ranks = new PageRank[Long](0.85, 20).run(graph)
                                                         ^

What am I doing wrong?

(Also, are the initial values of 0.0 for every vertex and 1.0 for every edge correct?)

Till Rohrmann
lary

1 Answer

The problem is that you're passing the Scala org.apache.flink.graph.scala.Graph to PageRank.run, which expects the Java org.apache.flink.graph.Graph.

To run a GraphAlgorithm on a Scala Graph object, call the run method of the Scala Graph and pass it the GraphAlgorithm:

graph.run(new PageRank[Long](0.85, 20))

Update

In the case of the PageRank algorithm it is important to note that the algorithm expects an instance of type Graph[K, java.lang.Double, java.lang.Double]. Since Java's Double type is different from Scala's Double type (in terms of type checking), this has to be accounted for.

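For the example code this means:

// Declare vertex and edge values as java.lang.Double, which PageRank expects.
val graph = Graph.fromCsvReader[Long, java.lang.Double, java.lang.Double](
  env = env,
  pathEdges = filename,
  readVertices = false,
  hasEdgeValues = true,
  // Every vertex starts with the value 0.0 (boxed to java.lang.Double).
  vertexValueInitializer = new MapFunction[Long, java.lang.Double] {
    def map(id: Long): java.lang.Double = 0.0 })
  // Temporary workaround: the Scala compiler does not infer the proper Graph type here.
  .asInstanceOf[Graph[Long, java.lang.Double, java.lang.Double]]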
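
The distinction between Scala's Double and java.lang.Double described above can be reproduced without Flink. The following is only a minimal sketch: `Box`, `DoubleTypes`, and `needsJavaDouble` are hypothetical names invented for illustration and are not part of Flink or Gelly. `Box` stands in for a generic class such as `Graph`, and `needsJavaDouble` stands in for a Java algorithm that is fixed to `java.lang.Double`.

```scala
// Hypothetical stand-in for a Java-defined generic class such as Gelly's Graph.
final class Box[T](val value: T)

object DoubleTypes {
  // Mimics a Java algorithm that is fixed to java.lang.Double values.
  def needsJavaDouble(box: Box[java.lang.Double]): Double =
    box.value.doubleValue()

  val scalaBox = new Box[Double](0.85)

  // The literal is a scala.Double; Predef's implicit double2Double boxes it.
  val javaBox = new Box[java.lang.Double](0.85)

  // needsJavaDouble(scalaBox) // does not compile: type mismatch

  val viaJava: Double = needsJavaDouble(javaBox)
}
```

Single values are converted implicitly, but a type parameter is not: `Box[Double]` and `Box[java.lang.Double]` are unrelated types to the compiler. That is why the graph itself has to be declared with `java.lang.Double` rather than relying on a conversion at the call site.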
Till Rohrmann
  • with `graph.run(new PageRank[Long](0.85, 20))` I get the error `error: type mismatch; found : org.apache.flink.graph.library.PageRank[Long] required: org.apache.flink.graph.GraphAlgorithm[Long,_23,_24,?] graph.run(new PageRank[Long](0.85, 20)) ^ ` – lary Nov 16 '15 at 12:06
  • with `graph.run(new PageRank[Long,Double,Double,DataSet[Vertex[Long,Double]]](0.85, 20) )` I get the error `error: wrong number of type arguments for org.apache.flink.graph.library.PageRank, should be 1` – lary Nov 16 '15 at 12:17
  • Is `PageRank` actually implemented for Scala? As far as I could find, there is only a Java class in the documentation / on GitHub. In flink-gelly-scala there is no `PageRank` (?) – lary Nov 16 '15 at 12:21
  • `PageRank` only takes a single type argument. Thus, `new PageRank[Long](0.85, 20)` is the right instantiation. `PageRank` is only implemented using Flink's Java API. But you can use these algorithms from the Scala API via the `run` method. – Till Rohrmann Nov 16 '15 at 12:30
  • So how can I fix the error with `graph.run(new PageRank[Long](0.85, 20))` then? There is one argument for `PageRank` and I am using the `run` Method. What is wrong here? – lary Nov 16 '15 at 12:32
  • The problem is that the value type of the vertices and the edges has to be `java.lang.Double` instead of a `scala.Double`. That's because `PageRank` was implemented using Java. I've updated my answer accordingly. – Till Rohrmann Nov 16 '15 at 12:59
  • Well, unfortunately I get the same error with `run()` with `java.lang.Double` and even with `run()` with `java.lang.Long` and `java.lang.Double` .. – lary Nov 16 '15 at 13:17
  • Sorry, I meant, I get the same error when instantiating the graph with `[Long,java.lang.Double,java.lang.Double]` and also with `[java.lang.Long,java.lang.Double,java.lang.Double]` (and the `MapFunction` accordingly) – lary Nov 16 '15 at 14:18
  • It seems to be a bug in the `Graph.fromCsvReader` method that Scala cannot retrieve the proper type of the `Graph`. As a temporary workaround you have to add the following cast operation `.asInstanceOf[Graph[Long, java.lang.Double, java.lang.Double]]` to the `fromCsvReader`. I've also updated my answer accordingly. I'll try to fix the underlying problem. – Till Rohrmann Nov 16 '15 at 17:50
  • Ok, with `asInstanceOf[...]` it worked. Thank you for your help! – lary Nov 16 '15 at 18:26
  • The casting problem with `asInstanceOf[...]` will be fixed once https://github.com/apache/flink/pull/1370 is merged. – Till Rohrmann Nov 17 '15 at 17:31