Questions tagged [spark-graphx]

GraphX is a component in Apache Spark for graphs and graph-parallel computation.

At a high level, GraphX extends the Spark RDD by introducing a new Graph abstraction: a directed multigraph with properties attached to each vertex and edge.
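As a rough illustration of the property-graph abstraction described above, the following sketch builds a tiny directed multigraph with a String on each vertex and a Double on each edge. The vertex names, weights, and the `PropertyGraphSketch` object are made up for this example, and it assumes Spark with the GraphX module on the classpath, running in local mode.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Hypothetical example object; names and data are illustrative only.
object PropertyGraphSketch {
  def build(sc: SparkContext): Graph[String, Double] = {
    // Vertex properties: each vertex id carries a name.
    val vertices: RDD[(VertexId, String)] = sc.parallelize(Seq(
      (1L, "alice"), (2L, "bob"), (3L, "carol")))

    // Edge properties: each directed edge carries a weight.
    // The two parallel edges 1 -> 2 are legal because the graph is a multigraph.
    val edges: RDD[Edge[Double]] = sc.parallelize(Seq(
      Edge(1L, 2L, 1.0), Edge(1L, 2L, 2.5), Edge(2L, 3L, 0.5)))

    Graph(vertices, edges)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("property-graph-sketch")
    val sc = new SparkContext(conf)
    try {
      // A triplet joins an edge with the properties of both of its endpoints.
      build(sc).triplets
        .map(t => s"${t.srcAttr} -> ${t.dstAttr} (weight ${t.attr})")
        .collect()
        .foreach(println)
    } finally sc.stop()
  }
}
```

The graph is just a pair of RDDs under the hood (`graph.vertices` and `graph.edges`), so ordinary RDD operations remain available on both.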

To support graph computation, GraphX exposes a set of fundamental operators (e.g., subgraph, joinVertices, and aggregateMessages) as well as an optimized variant of the Pregel API.
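The operators named above can be sketched on a toy graph as follows. This is a minimal example under stated assumptions, not a reference implementation: the `OperatorSketch` object, its helper names, and the four-vertex graph are hypothetical, and the Pregel part follows the usual single-source shortest-path pattern of relaxing edge weights until no distance improves.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, Graph, VertexId, VertexRDD}

// Hypothetical example object; the toy data is illustrative only.
object OperatorSketch {
  def toyGraph(sc: SparkContext): Graph[String, Double] = {
    val vertices = sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c"), (4L, "d")))
    val edges = sc.parallelize(Seq(
      Edge(1L, 2L, 1.0), Edge(2L, 3L, 2.0), Edge(1L, 3L, 5.0), Edge(3L, 4L, 1.0)))
    Graph(vertices, edges)
  }

  // subgraph: keep only the edges whose weight is below maxWeight.
  def lightEdges(g: Graph[String, Double], maxWeight: Double): Graph[String, Double] =
    g.subgraph(epred = t => t.attr < maxWeight)

  // aggregateMessages: every edge sends its weight to its destination,
  // and messages arriving at the same vertex are summed.
  def incomingWeight(g: Graph[String, Double]): VertexRDD[Double] =
    g.aggregateMessages[Double](ctx => ctx.sendToDst(ctx.attr), _ + _)

  // joinVertices: fold the aggregated value back into the vertex property.
  // Vertices without a matching entry keep their original property.
  def labelWithIncoming(g: Graph[String, Double]): Graph[String, Double] =
    g.joinVertices(incomingWeight(g))((_, name, w) => s"$name:$w")

  // Pregel: single-source shortest paths over the edge weights.
  def shortestPaths(g: Graph[String, Double], source: VertexId): Graph[Double, Double] = {
    val init = g.mapVertices((id, _) => if (id == source) 0.0 else Double.PositiveInfinity)
    init.pregel(Double.PositiveInfinity)(
      (_, dist, newDist) => math.min(dist, newDist), // vertex program: keep the best distance
      triplet =>                                     // send a message only if it improves dst
        if (triplet.srcAttr + triplet.attr < triplet.dstAttr)
          Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))
        else Iterator.empty,
      (a, b) => math.min(a, b))                      // merge: keep the smaller candidate
  }
}
```

With a `SparkContext` named `sc`, `OperatorSketch.shortestPaths(OperatorSketch.toyGraph(sc), 1L).vertices.collect()` returns the distance from vertex 1 to every vertex; unreachable vertices stay at `Double.PositiveInfinity`.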

In addition, GraphX includes a growing collection of graph algorithms and builders to simplify graph analytics tasks.
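To give a feel for the bundled builders and algorithms, the sketch below builds a graph from bare (source, destination) pairs with `Graph.fromEdgeTuples` and then runs two of the built-in algorithms, `connectedComponents` and `pageRank`. The `AlgorithmSketch` object, the edge list, and the tolerance value are assumptions chosen only for illustration.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Graph, VertexId}

// Hypothetical example object; the edge list is illustrative only.
object AlgorithmSketch {
  // Builder: create a graph from bare (src, dst) pairs.
  // Every vertex gets the default property 1; every edge gets the attribute 1.
  def fromPairs(sc: SparkContext): Graph[Int, Int] = {
    val pairs = sc.parallelize(Seq((1L, 2L), (2L, 3L), (4L, 5L)))
    Graph.fromEdgeTuples(pairs, defaultValue = 1)
  }

  // connectedComponents labels each vertex with the lowest vertex id in its component.
  def components(g: Graph[Int, Int]): Map[VertexId, VertexId] =
    g.connectedComponents().vertices.collect().toMap

  // pageRank iterates until the ranks change by less than the given tolerance.
  def ranks(g: Graph[Int, Int]): Map[VertexId, Double] =
    g.pageRank(tol = 0.001).vertices.collect().toMap
}
```

Other builders, such as `GraphLoader.edgeListFile` for edge-list files and the `GraphGenerators` utilities for synthetic graphs, follow the same pattern of returning a `Graph` that the algorithms can consume directly.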

487 questions
3
votes
1 answer

Spark one-to-one shortest path using Pregel in GraphX

I tried to find the shortest path from a single source to n vertices using the code from link: val graph: Graph[Long, Double] = GraphGenerators.logNormalGraph(sc, numVertices = 100).mapEdges(e => e.attr.toDouble) val sourceId: VertexId = 42 val initialGraph…
Sugimiyanto
  • 320
  • 3
  • 18
3
votes
1 answer

GraphX Graph.apply constructor method - edge partitioning

I have a weighted graph with ~340k vertices and ~772k edges. I build the edge and vertex RDDs from a file on HDFS. val verticesRDD : RDD[(VertexId, Long)] val edgesRDD : RDD[Edge[Double]] From these RDDs I create a graph using the .apply…
LearningSlowly
  • 8,641
  • 19
  • 55
  • 78
3
votes
1 answer

Creating a Graph in GraphX using a pair RDD

I have a pair RDD and want to construct a GraphX Graph from it. I want weighted edges, i.e., if an edge appears 3 times in the pair RDD, I want its weight to be 3. take(1) from the RDD looks like this: res2: Array[(String, String)] =…
Saygın Doğu
  • 305
  • 1
  • 4
  • 17
3
votes
1 answer

Spark Pregel is not working with Java

I'm working with GraphX and Pregel through the Java API. I'm trying to implement a MaxValue algorithm (given a weighted graph, the output is the maximum weight), but my implementation is not working: public class Main { public static void main(String[]…
Vitali D.
  • 149
  • 2
  • 14
3
votes
1 answer

Does GraphX support different types of vertices in the same graph?

I'd like to know whether I can model a GraphX graph with different types of vertices. Say I have the following entities: product, buyer, seller. I want to form a graph structure with these entities as vertices. (e.g., show graphically a product being sold by…
Mayuri M.
  • 121
  • 1
  • 2
3
votes
1 answer

GraphX - Retrieving all nodes from a path

In GraphX, is there a way to retrieve all the nodes and edges that lie on paths of a certain length? More specifically, I would like to get all the 10-step paths from A to B. For each path, I would like to get the list of nodes and…
Inbal
  • 281
  • 2
  • 13
3
votes
1 answer

Spark Scala GraphX: Creating a Weighted Directed Graph

I have a DataFrame dfMaster which has three columns: vertex1, vertex2, weight. I'm trying to create a directed, weighted GraphX graph which has vertices from V1 and V2 and edges between them with their corresponding weights. I can create the edge and…
mt88
  • 2,855
  • 8
  • 24
  • 42
3
votes
1 answer

How to Parallelize Prim's Algorithm in GraphX

So I'm trying to write a parallel version of Prim's algorithm, but I can't quite figure out how to do it using Spark GraphX. I've looked pretty hard for resources, but there aren't a lot of examples of implementing shortest path algorithms in GraphX.…
3
votes
1 answer

Creating array per Executor in Spark and combine into RDD

I am moving from MPI-based systems to Apache Spark. I need to do the following in Spark. Suppose I have n vertices. I want to create an edge list from these n vertices. An edge is just a tuple of two integers (u,v), no attributes are…
max
  • 1,692
  • 5
  • 28
  • 40
3
votes
1 answer

Finding cliques or strongly connected components in Apache Spark using GraphX

A clique, C, in an undirected graph G = (V, E) is a subset of the vertices, C ⊆ V, such that every two distinct vertices are adjacent. This is equivalent to the condition that the subgraph of G induced by C is complete. In some cases, the term…
John Lui
  • 1,434
  • 3
  • 23
  • 37
3
votes
1 answer

Spark GraphX Runtime Query

Is it possible to query GraphX at runtime, or must these queries be compiled and deployed? If so, is there anything out there that would be the equivalent of Cypher for GraphX? Thank you.
user2612462
  • 143
  • 1
  • 6
3
votes
3 answers

How to filter a mixed-node graph on neighbor vertex types

This question is about Spark GraphX. I want to compute a subgraph by removing nodes that are neighbors of certain other nodes. Example [Task] Retain A nodes and B nodes that are not neighbors of C2 nodes. Input graph: ┌────┐ …
3
votes
1 answer

Spark GraphX: how to insert just a node to a graph

I know that in GraphX we can merge two graphs in order to update an existing network, for example. However, a common operation for updating a network is to insert a single node into it. How could one perform such an update in GraphX…
Momog
  • 567
  • 7
  • 27
3
votes
1 answer

GraphX does not work with relatively big graphs

I cannot process a graph with 230M edges. I cloned apache.spark, built it, and then tried it on a cluster. I use a Spark Standalone Cluster: -5 machines (each has 12 cores/32GB RAM) -'spark.executor.memory' == 25g -'spark.driver.memory' == 3g Graph has…
Hlib
  • 2,944
  • 6
  • 29
  • 33
3
votes
2 answers

How does Apache Spark caching work with regard to uncached file sources with non-linear DAGs?

Consider the following example: val rdd1 = sc.textFile(...) val rdd2 = sc.textFile(...) val a = rdd1.doSomeTransformation val b = rdd1.doAnotherTransformation val c = rdd2.doSomeTransformation val d = rdd2.doAnotherTransformation //nonsense…
Eran Medan
  • 44,555
  • 61
  • 184
  • 276