
I am using Scala on Spark 2.1.0 GraphX. I have an array as shown below:

scala> TEMP1Vertex.take(5)
res46: Array[org.apache.spark.graphx.VertexId] = Array(-1895512637, -1745667420, -1448961741, -1352361520, -1286348803)

If I had to filter the edge table for a single value, let's say for source ID -1895512637:

val TEMP1Edge = graph.edges.filter { case Edge(src, dst, prop) => src == -1895512637}

scala> TEMP1Edge.take(5)
res52: Array[org.apache.spark.graphx.Edge[Int]] = Array(Edge(-1895512637,-2105158920,89), Edge(-1895512637,-2020727043,3), Edge(-1895512637,-1963423298,449), Edge(-1895512637,-1855207100,214), Edge(-1895512637,-1852287689,339))

scala> TEMP1Edge.count
17/04/03 10:20:31 WARN Executor: 1 block locks were not released by TID = 1436:[rdd_36_2]
res53: Long = 126

But when I compare against an array that contains a set of unique source IDs, the code runs successfully but doesn't return any values, as shown below:

scala> val TEMP1Edge = graph.edges.filter { case Edge(src, dst, prop) => src == TEMP1Vertex}
TEMP1Edge: org.apache.spark.rdd.RDD[org.apache.spark.graphx.Edge[Int]] = MapPartitionsRDD[929] at filter at <console>:56

scala> TEMP1Edge.take(5)
17/04/03 10:29:07 WARN Executor: 1 block locks were not released by TID = 1471:
[rdd_36_5]
res60: Array[org.apache.spark.graphx.Edge[Int]] = Array()

scala> TEMP1Edge.count
17/04/03 10:29:10 WARN Executor: 1 block locks were not released by TID = 1477:
[rdd_36_5]
res61: Long = 0
SoakingHummer
  • I don't know anything about graphX, but your predicate probably always returns `false`, since the type of `src` and `TEMP1Vertex` are different. You should probably do something like `Temp1Vertex.contains(src)` (although I don't know if such a method exists) – Cyrille Corpet Apr 03 '17 at 05:57
  • I tried `src == Traversable(TEMP1Vertex)` and `src == Iterable(TEMP1Vertex)` and neither worked, although execution was successful. – SoakingHummer Apr 03 '17 at 07:00
  • `==` is not strongly typed, mainly for interoperability with java, so it will always compile. However, if you compare objects of different types, it will always return false (unless there is a specific `equals` method defined) – Cyrille Corpet Apr 03 '17 at 07:13
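
To make the comments' point concrete, here is a minimal sketch in plain Scala (the values are placeholders; `VertexId` is just a type alias for `Long`): comparing a single id to a whole collection with `==` always yields `false`, while a membership test does what the filter intends.

val srcId: Long = -1895512637L                                  // a single VertexId
val vertexIds: Array[Long] = Array(-1895512637L, -1745667420L)  // the candidate source ids

srcId == vertexIds          // always false: a Long never equals an Array[Long]
vertexIds.contains(srcId)   // true: membership test, which is what the filter needs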

1 Answer


I suppose that TEMP1Vertex is of type Array[VertexId], so I think your code should look like this:

val TEMP1Edge = graph.edges.filter { 
  case Edge(src, _, _) => TEMP1Vertex.contains(src) 
}
Federico Pellegatta
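
As a possible refinement (a sketch only, assuming the spark-shell `SparkContext` is available as `sc`): if `TEMP1Vertex` holds many ids, converting it to a `Set` and broadcasting it keeps the per-edge lookup cheap and avoids shipping the whole array in every task closure.

val vertexIdSet = sc.broadcast(TEMP1Vertex.toSet)   // Set[VertexId], shipped once per executor

val TEMP1Edge = graph.edges.filter {
  case Edge(src, _, _) => vertexIdSet.value.contains(src)   // O(1) membership test per edge
}
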
  • I get the following error `:57: error: value contains is not a member of org.apache.spark.rdd.RDD[org.apache.spark.graphx.VertexId]` – SoakingHummer Apr 03 '17 at 07:20
  • Is it required for `TEMP1Vertex` to be of type `RDD[VertexId]`? Is `Array[VertexId]` fine too? – Federico Pellegatta Apr 03 '17 at 07:31
  • `TEMP1Vertex` follows the syntax as shown in the question. It can be modified to any type as long as the correct output is returned, but I don't know how it can be converted from `RDD[VertexId]` to `Array[VertexId]` – SoakingHummer Apr 03 '17 at 07:35
  • I changed `val TEMP1Vertex = TEMPEdge.map(_.srcId)` to `val TEMP1Vertex = TEMPEdge.map(_.srcId).collect()`, which turned it into `Array[Long] = Array(-1895512637, -1745667420, -1448961741, -1352361520, -1286348803)`, and it worked! Thanks! – SoakingHummer Apr 03 '17 at 08:56
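
Pulling the thread together, a minimal end-to-end sketch of the approach that reportedly worked, assuming `TEMPEdge` is the edge RDD the source ids were taken from (as in the comment above):

import org.apache.spark.graphx.Edge

// Collect the source ids to the driver: RDD[VertexId] -> Array[VertexId]
val TEMP1Vertex = TEMPEdge.map(_.srcId).collect()

// Keep only the edges whose source id appears in that array
val TEMP1Edge = graph.edges.filter {
  case Edge(src, _, _) => TEMP1Vertex.contains(src)
}

TEMP1Edge.count()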