Questions tagged [spark-graphx]

GraphX is a component in Apache Spark for graphs and graph-parallel computation.

At a high level, GraphX extends the Spark RDD by introducing a new Graph abstraction: a directed multigraph with properties attached to each vertex and edge.

To support graph computation, GraphX exposes a set of fundamental operators (e.g., subgraph, joinVertices, and aggregateMessages) as well as an optimized variant of the Pregel API.

In addition, GraphX includes a growing collection of graph algorithms and builders to simplify graph analytics tasks.
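
As a rough illustration of the property graph and operators described above, here is a minimal sketch. The object name `GraphXSketch`, the method `run`, and the sample vertices and edges are all made up for illustration; only `Graph`, `Edge`, `subgraph`, `aggregateMessages`, and `joinVertices` come from GraphX itself.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Hypothetical helper object; the names and sample data are assumptions.
object GraphXSketch {
  def run(sc: SparkContext): (Long, Map[VertexId, Int]) = {
    // Vertex property: a name. Edge property: an integer weight.
    val vertices: RDD[(VertexId, String)] =
      sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c")))
    // Two parallel edges from 1 to 2: a multigraph keeps both.
    val edges: RDD[Edge[Int]] =
      sc.parallelize(Seq(Edge(1L, 2L, 5), Edge(1L, 2L, 1), Edge(2L, 3L, 7)))
    val graph: Graph[String, Int] = Graph(vertices, edges)

    // subgraph: keep only edges whose weight exceeds 2.
    val heavy = graph.subgraph(epred = triplet => triplet.attr > 2)

    // aggregateMessages: sum the weights of incoming edges per vertex.
    // Vertices that receive no message are absent from the result.
    val inWeight = graph.aggregateMessages[Int](
      ctx => ctx.sendToDst(ctx.attr),
      _ + _
    )

    // joinVertices: fold the aggregated value back into the vertex property.
    val labelled = graph.joinVertices(inWeight)((_, name, w) => s"$name:$w")
    labelled.vertices.collect().foreach(println)

    (heavy.numEdges, inWeight.collect().toMap)
  }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption for running the sketch on one machine.
    val conf = new SparkConf().setAppName("graphx-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try println(run(sc)) finally sc.stop()
  }
}
```

Running this requires the `spark-core` and `spark-graphx` artifacts on the classpath. The vertex with no incoming edges keeps its original property after `joinVertices`, since that operator only updates vertices that have a matching entry in the joined table.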

487 questions
0
votes
1 answer

Iterating through files in Scala to create values based on the file names

I think there may be a simple solution to this. I was wondering if anybody knows how to iterate over a set of files and output a value based on each file's name. My problem is that I want to read in a set of graph edges for each month, and then create a…
ALs
0
votes
1 answer

Dropping unconnected components of a subgraph GraphX

I have the following graph: // Vertices val usersTest: RDD[(VertexId, (String))] = sc.parallelize(Array((1L, ("AAA")), (2L, ("BBB")), (3L, ("CCC")))) // Edges val relationshipsTest: RDD[Edge[Int]] = sc.parallelize(Array(Edge(1L, 3L, 1),Edge(1L, 3L,…
ulrich
0
votes
1 answer

How to compute vertex similarity to neighbors in GraphX

Suppose we have a simple graph like: val users = sc.parallelize(Array( (1L, Seq("M", 2014, 40376, null, "N", 1, "Rajastan")), (2L, Seq("M", 2009, 20231, null, "N", 1, "Rajastan")), (3L, Seq("F",…
user299791
0
votes
1 answer

How to create pair RDD with elements that share keys in source RDD?

I have a key-value RDD in pyspark and would like to return an RDD of pairs that have the same key in the source RDD. #input rdd of id and user rdd1 = sc.parallelize([(1, "user1"), (1, "user2"), (2, "user1"), (2, "user3"), (3,"user2"), (3,"user4"),…
Jared
0
votes
1 answer

How to export Spark GraphX graph to Gephi using Scala

I have a graph in Spark collected from different data sources. Is there a simple way to export a Spark GraphX graph to Gephi for visualization using Scala? Are there any common data formats?
szu
0
votes
1 answer

How to attach properties to vertices in GraphX and retrieve the neighbourhood

I am rather new to Spark and Scala. I have a graph: Graph[Int, String], and I'd like to attach some properties I have in a DataFrame to its vertices. What I need to do is, for each vertex, find the average value in the neighbourhood for each…
user299791
0
votes
1 answer

Apache Zeppelin not showing Spark output

I am testing Zeppelin with Spark using the following data sample: import org.apache.spark.graphx._ import org.apache.spark.rdd.RDD val vertexArray = Array( (1L, ("Alice", 28)), (2L, ("Bob", 27)), (3L, ("Charlie", 65)), (4L, ("David", 42)), (5L,…
ulrich
0
votes
0 answers

Why graph.degree returns fewer values than the number of nodes

I have a graph in Spark using GraphX like: val graph = Graph(users, rels) println("number of vertices is " + graph.numVertices + " and number of edges is " + graph.numEdges) and the result is: number of vertices is 253 and number of edges is 228…
user299791
0
votes
1 answer

How to create links between vertices in RDD[(Long, Vertex)] based on a property?

I have a users: RDD[(Long, Vertex)] collection of users. I want to create links between my Vertex objects. The rule is: if two Vertex objects have the same value in a selected property (call it prop1), then a link exists. My problem is how to check for…
user299791
0
votes
1 answer

Apache Spark create vertices from String

Given a string val s = "My-Spark-App", how can vertices be created in the following way with Spark? "My-", "y-S", "-Sp", "Spa", "par", "ark", "rk-", "k-A", "-Ap", "App" Can that problem be parallelized?
Al Jenssen
0
votes
1 answer

Filtering collection containing distinct case classes

I have successfully created the following graph: trait VertexProperty case class ShopperProperty(memberID: String) extends VertexProperty case class BasketProperty(basketID: String, epochDate: Long) extends VertexProperty val vertices:…
Christopher Mills
0
votes
0 answers

Create VertexId from VertexProperty

I am new to Spark. What I want to do is create a graph connecting a seller and a device. The device is a String, and when I create an Edge I have to give a VertexId. How can I generate a VertexId from a VertexProperty in Spark?
tintin
0
votes
1 answer

Scala and GraphX in Spark

Any idea why we get these errors? ubuntu@group-3-vm1:~/software/sbt/bin$ ./sbt package [info] Set current project to hello (in build file:/home/ubuntu/software/sbt/bin/) [info] Compiling 1 Scala source to…
Mona Jalal
0
votes
2 answers

Does GraphX support subgraph queries?

I have loaded a big graph and a small graph (which is to be my query) using the GraphX API, and what I want to do is check whether the big graph contains the query graph. I searched on the web about subgraph/graph queries with GraphX and I can't…
Iva
0
votes
0 answers

"Dropped from memory" error with GraphX query

Using the GraphX API of Apache Spark, I wrote the following code, which generates the graph successfully. But when I try to query this graph, a memory-related error occurs: val RDDorg = sc.textFile("output.txt") val RDDstart = RDDorg.map(line =>…
haroop