Questions tagged [spark-graphx]

GraphX is a component in Apache Spark for graphs and graph-parallel computation.

At a high level, GraphX extends the Spark RDD by introducing a new Graph abstraction: a directed multigraph with properties attached to each vertex and edge.

To support graph computation, GraphX exposes a set of fundamental operators (e.g., subgraph, joinVertices, and aggregateMessages) as well as an optimized variant of the Pregel API.
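As a rough, hedged sketch of how these operators fit together (the local SparkContext setup, the toy vertex/edge data, and the weighted in-degree computation below are illustrative assumptions, not taken from any question on this page):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

object GraphXOperatorsSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[*]", "graphx-operators-sketch")

    // Vertex property: a user name; edge property: an Int weight.
    val users: RDD[(VertexId, String)] =
      sc.parallelize(Seq((1L, "alice"), (2L, "bob"), (3L, "carol")))
    val follows: RDD[Edge[Int]] =
      sc.parallelize(Seq(Edge(1L, 2L, 5), Edge(2L, 3L, 2), Edge(1L, 3L, 7)))

    // The property graph: a directed multigraph with attributes on vertices and edges.
    val graph: Graph[String, Int] = Graph(users, follows)

    // subgraph: keep only edges whose weight is greater than 3.
    val heavy = graph.subgraph(epred = triplet => triplet.attr > 3)
    println(s"edges with weight > 3: ${heavy.edges.count}")

    // aggregateMessages: each edge sends its weight to its destination vertex,
    // and the messages are summed per vertex (a weighted in-degree).
    val inWeight: VertexRDD[Int] = graph.aggregateMessages[Int](
      ctx => ctx.sendToDst(ctx.attr),
      _ + _
    )

    // joinVertices: fold the aggregated weights back into the vertex property.
    val labelled = graph.joinVertices(inWeight) {
      (id, name, w) => s"$name (in-weight $w)"
    }
    labelled.vertices.collect.foreach(println)

    sc.stop()
  }
}
```

Here each edge sends one message to its destination and the per-vertex messages are merged with `+`, which is the send/merge pattern that `aggregateMessages` (and, more generally, the Pregel variant) is built around.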

In addition, GraphX includes a growing collection of graph algorithms and builders to simplify graph analytics tasks.
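A similarly hedged sketch of the builders and built-in algorithms; the edge-list file `edges.txt` and the local-mode SparkContext are hypothetical:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx._

object GraphXAlgorithmsSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[*]", "graphx-algorithms-sketch")

    // GraphLoader is one of the graph builders: it parses a whitespace-separated
    // edge list (one "srcId dstId" pair per line) into a Graph[Int, Int].
    val graph: Graph[Int, Int] = GraphLoader.edgeListFile(sc, "edges.txt")

    // connectedComponents labels every vertex with the lowest VertexId
    // in its connected component.
    val components: VertexRDD[VertexId] = graph.connectedComponents().vertices

    // pageRank(tol) iterates until the ranks change by less than tol;
    // the resulting graph's vertex attribute is the PageRank score.
    val ranks: VertexRDD[Double] = graph.pageRank(0.001).vertices

    components.take(5).foreach(println)
    ranks.take(5).foreach(println)

    sc.stop()
  }
}
```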

487 questions
4 votes, 2 answers

How to do this transformation in SQL/Spark/GraphFrames

I've a table containing the following two columns: Device-Id, Account-Id, with rows d1 a1, d2 a1, d1 a2, d2 a3, d3 a4, d3 a5, d4 a6, d1 a4. Device-Id is the unique Id of the device…
Aman Gill
4 votes, 0 answers

How to use GraphX with Spark Streaming?

I have a Spark Structured Streaming application which receives Kafka messages. For each such message it retrieves initial data from a DB and performs calculations. I want to use GraphX (or GraphFrames) to build a graph for each message and perform…
Igorock
4 votes, 1 answer

Spark GraphX: add multiple edge weights

I am new to GraphX and have a Spark DataFrame with four columns (src_ip, dst_ip, flow_count, sum_bytes), e.g. 8.8.8.8 1.2.3.4 435 1137. Basically I want to map both src_ip and…
ELI
4 votes, 0 answers

Performance of Pregel Spark

I am pretty new to Spark and am running it in local mode from Eclipse on a Windows 10 machine with 8 GB of RAM. I was running the Pregel algorithm to sum data at each node, as per the following link: Aggregation Summation at each…
4 votes, 1 answer

How to check if an edge exists in a Spark GraphX graph

I have a Spark GraphX graph, and I want to check whether an edge exists between two vertices or not. What is the preferred method for doing this in Spark GraphX? More specifically I would like to count all the edges between all vertices in one list…
joakimj
4 votes, 0 answers

Spark GraphFrames aggregate messages over multiple iterations

The Spark GraphFrames documentation has a nice example of how to apply the aggregate messages function. To me, it seems to only calculate the friends/connections of the single, first vertex and not iterate deeper into the graph as GraphX's Pregel…
Georg Heiler
4 votes, 1 answer

Spark Scala GraphX: Shortest path between two vertices

I have a directed graph G in Spark GraphX (Scala). I would like to find the number of edges that must be crossed starting from a known vertex v1 to arrive at another vertex v2. In other words, I need the shortest path from the vertex v1 to the…
mt88
4 votes, 1 answer

Spark: GraphX fails to find connected components in graphs with few edges and long paths

I'm new to Spark and GraphX and did some experiments with its algorithm to find connected components. I noticed that the structure of the graph seems to have a strong impact on the performance. It was able to compute graphs with millions of vertices…
Philipp Claßen
4 votes, 1 answer

Update edge weight in GraphX

I'm playing around with GraphX. I've built a graph and I'm trying to update the weight of a relation: import org.apache.spark.rdd.RDD import org.apache.spark.graphx._ def pageHash(title: String) = title.toLowerCase.replace("…
tourist
4 votes, 2 answers

What is the difference between Titan and Spark-GraphX, and which one is preferred?

I am looking for the difference between Titan and Spark-GraphX and which one is best to use. I googled it but didn't find an article on this. Could someone provide a pointer on this?
4 votes, 1 answer

Problems running Spark GraphX algorithms on generated graphs

I have created a graph in Spark GraphX using the following code. (See my question and solution.) import scala.math.random import org.apache.spark._ import org.apache.spark.graphx._ import org.apache.spark.rdd.RDD import scala.util.Random import…
max
4 votes, 0 answers

How to execute Pregel shortest path using each vertex as the source vertex once, on a Spark cluster

We have an assignment to find the shortest path using the Pregel API for 300,000 (3 lakh) vertices. We are supposed to make each vertex the source vertex once and identify the shortest path among all these executions. My code looks like below: def shortestPath(sc:…
Sarala Hegde
4 votes, 4 answers

How to create a graph from a CSV file using Graph.fromEdgeTuples in Spark Scala

I'm new to Spark and Scala, and I'm trying to carry out a simple task of creating a graph from data in a text file. From the documentation…
4 votes, 1 answer

Finding maximum edge weight in Spark GraphX

Let's say I have a graph with double values for edge attributes and I want to find the maximum edge weight of my graph. If I do this: val max = sc.accumulator(0.0) //max holds the maximum edge weight g.edges.distinct.collect.foreach{ e => if (e.attr…
Al Jenssen
4 votes, 3 answers

Subtracting an RDD from another RDD doesn't work correctly

I want to subtract an RDD from another RDD. I looked into the documentation and I found that subtract can do that. Actually, when I tested subtract, the final RDD remained the same and the values were not removed! Is there any other function to do…
Ronald Segan