Questions tagged [spark-graphx]

GraphX is a component in Apache Spark for graphs and graph-parallel computation.

At a high level, GraphX extends the Spark RDD by introducing a new Graph abstraction: a directed multigraph with properties attached to each vertex and edge.

To support graph computation, GraphX exposes a set of fundamental operators (e.g., subgraph, joinVertices, and aggregateMessages) as well as an optimized variant of the Pregel API.

In addition, GraphX includes a growing collection of graph algorithms and builders to simplify graph analytics tasks.
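A minimal sketch of the property-graph abstraction and the operators named above (assumes Spark with the GraphX module on the classpath; the vertex names and edge labels are invented for illustration):

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.sql.SparkSession

object GraphXIntroSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("graphx-intro").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // A tiny directed property graph: vertices carry a name, edges carry a label.
    val vertices = sc.parallelize(Seq((1L, "alice"), (2L, "bob"), (3L, "carol")))
    val edges    = sc.parallelize(Seq(Edge(1L, 2L, "follows"), Edge(2L, 3L, "follows")))
    val graph: Graph[String, String] = Graph(vertices, edges)

    // aggregateMessages: compute in-degrees by sending 1 along each edge.
    val inDegrees = graph.aggregateMessages[Int](ctx => ctx.sendToDst(1), _ + _)
    inDegrees.collect().foreach(println)

    // subgraph: restrict the graph to "follows" edges only.
    val follows = graph.subgraph(epred = _.attr == "follows")
    println(follows.edges.count())

    spark.stop()
  }
}
```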

487 questions
3 votes, 1 answer

GraphX - Best way to store and compute over 3 billion vertices

I am new to Spark and GraphX. So far I have been using Titan DB (HBase storage) and Giraph for processing. I have a requirement for a graph with ~3 billion vertices and ~5 billion edges. What would be the best way to store the graph (create the…
Ashok Krishnamoorthy
3 votes, 1 answer

How does GraphX internally traverse the Graph?

I want to know how GraphX traverses a graph internally. Is it vertex- and edge-based traversal, or sequential traversal of RDDs? For example, given a vertex of the graph, I want to fetch only its neighbors, not the neighbors of all the vertices. How…
mas
3 votes, 1 answer

reduceByKey processing each flatMap output without aggregating values by key in GraphX

I have a problem running GraphX:

val adjGraph = adjGraph_CC.vertices
  .flatMap { case (id, (compID, adjSet)) =>
    mapMsgGen(id, compID, adjSet) // mapMsgGen generates a list of msgs, each of the form K->V
  }
  .reduceByKey((fst,…
2 votes, 2 answers

Is GraphX available in PySpark for Spark 3.0+?

I was wondering if the GraphX API is available in PySpark for Spark 3.0+. I'm not finding anything of that sort in the official documentation; all the examples are developed in Scala. Also, where can I get more updates about it? Thanks, Darshan
Darshan Parab
2 votes, 1 answer

Convert a JavaRDD> into a Spark Dataset in Java

In Java (not Scala!) Spark 3.0.1, I have a JavaRDD instance neighborIdsRDD whose type is JavaRDD>. Part of my code related to the generation of the JavaRDD is the following: GraphOps graphOps = new…
shogitai
2 votes, 1 answer

How can I load weighted graphs in Scala?

It seems that there is no built-in way in GraphX to load weighted graphs properly. I have a file whose columns represent the edges of a graph:

# source_id target_id weight
0 1 1
1 2 2
2 3 3
3 4 4
4 5 5
5 0 6

How can I load it…
Nourless
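One common workaround for the weighted-graph question above (a sketch, not the questioner's code; the file path and the Int weight type are assumptions): GraphLoader.edgeListFile discards the weight column, so the file can be parsed manually and the graph built with Graph.fromEdges.

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.sql.SparkSession

object WeightedGraphLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    val edges = sc.textFile("weighted_edges.txt")       // hypothetical path
      .filter(l => l.nonEmpty && !l.startsWith("#"))    // skip the header comment
      .map { line =>
        val Array(src, dst, w) = line.trim.split("\\s+")
        Edge(src.toLong, dst.toLong, w.toInt)           // store the weight as the edge attribute
      }

    // Vertices are inferred from the edge endpoints; 0 is an arbitrary default attribute.
    val graph: Graph[Int, Int] = Graph.fromEdges(edges, defaultValue = 0)
    println(graph.edges.count())
    spark.stop()
  }
}
```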
2 votes, 1 answer

Gremlin traversal queries on spark graph

I have built a property graph (60 million nodes, 40 million edges) from S3 using the Apache Spark GraphX framework. I want to run traversal queries on this graph. My queries will be…
AbhiK
2 votes, 2 answers

In GraphX, how can I partition a graph with a custom PartitionStrategy that makes use of its topology?

I want to add a new PartitionStrategy that makes use of graph topology information. However, I find that PartitionStrategy only has the function below, and I cannot find any function that can receive graph data: override def getPartition(src: VertexId,…
DrowFish19
2 votes, 0 answers

GraphX create edges and vertices from csv

I have a csv file with flight information:

10397,ATL,GA,10135,ABE,PA,692,188
10397,ATL,GA,10135,ABE,PA,692,142
10434,AVP,PA,10135,ABE,PA,50,65
...

Columns are as follows:…
ZsoltF
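A sketch for the CSV question above, assuming the first three columns describe the origin airport (id, code, state), the next three the destination, and treating the seventh column as the edge attribute. The actual column meanings are truncated in the question, so the indices here are guesses to be adjusted:

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.sql.SparkSession

object FlightsCsvToGraph {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    val lines = sc.textFile("flights.csv") // hypothetical path

    // One vertex per airport, keyed by the numeric airport id, attributed with its code.
    val vertices = lines.flatMap { l =>
      val c = l.split(",")
      Seq((c(0).toLong, c(1)), (c(3).toLong, c(4)))
    }.distinct()

    // One edge per row; the seventh column (assumed numeric) becomes the edge attribute.
    val edges = lines.map { l =>
      val c = l.split(",")
      Edge(c(0).toLong, c(3).toLong, c(6).toInt)
    }

    val graph = Graph(vertices, edges)
    println(s"${graph.vertices.count()} vertices, ${graph.edges.count()} edges")
    spark.stop()
  }
}
```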
2 votes, 1 answer

How to understand maxIterations in the Pregel implementation of Apache GraphX

The official explanation is that maxIterations is meant for non-convergent algorithms. My question is: if I don't know whether my algorithm converges, how should I set the value of maxIterations? And, if there is a convergent algorithm, so that…
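For context on the maxIterations question above: Pregel terminates as soon as no vertex receives a message, so for a convergent algorithm the default Int.MaxValue simply lets convergence decide, and maxIterations acts only as a safety bound for algorithms that might never quiesce. A sketch with single-source shortest paths (the graph contents and source id are invented):

```scala
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.sql.SparkSession

object PregelSsspSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    val edges = sc.parallelize(Seq(Edge(1L, 2L, 1.0), Edge(2L, 3L, 2.0), Edge(1L, 3L, 5.0)))
    val graph = Graph.fromEdges(edges, 0.0)

    val sourceId: VertexId = 1L
    val init = graph.mapVertices((id, _) => if (id == sourceId) 0.0 else Double.PositiveInfinity)

    // Pregel stops when no messages are sent, or after maxIterations, whichever comes first.
    val sssp = init.pregel(Double.PositiveInfinity, maxIterations = Int.MaxValue)(
      (_, dist, msg) => math.min(dist, msg),                    // vertex program
      t => if (t.srcAttr + t.attr < t.dstAttr)                  // send message along shorter paths
             Iterator((t.dstId, t.srcAttr + t.attr))
           else Iterator.empty,
      (a, b) => math.min(a, b)                                  // merge messages
    )
    sssp.vertices.collect().foreach(println)
    spark.stop()
  }
}
```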
2 votes, 1 answer

Spark graphX make Edge/Vertex RDD from dataframe

I have two large dataframes, edge and vertex, and I know that they need to be in the special Vertex and Edge RDD types, but every tutorial I have found specifies the Edge and Vertex RDDs as arrays of 3 to 10 items. I need to convert them directly…
Joe S
2 votes, 1 answer

How to convert RDD[(String, Iterable[VertexId])] to DataFrame?

I have created an RDD from GraphX which looks like this:

val graph = GraphLoader.edgeListFile(spark.sparkContext, fileName)
var s: VertexRDD[VertexId] = graph.connectedComponents().vertices
val nodeGraph: RDD[(String, Iterable[VertexId])] =…
Aamir
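For the RDD-to-DataFrame question above, one common approach (a sketch; the column names and grouping key are invented) is to convert the Iterable to a Seq, which Spark can encode, and then call toDF:

```scala
import org.apache.spark.graphx.{GraphLoader, VertexId}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object ComponentsToDataFrame {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").getOrCreate()
    import spark.implicits._

    val graph = GraphLoader.edgeListFile(spark.sparkContext, "edges.txt") // hypothetical path

    // Group vertices by their connected-component id, keyed as a String.
    val byComponent: RDD[(String, Iterable[VertexId])] =
      graph.connectedComponents().vertices
        .map { case (v, comp) => (comp.toString, v) }
        .groupByKey()

    // Iterable has no Encoder, so convert it to a Seq before calling toDF.
    val df = byComponent
      .map { case (comp, ids) => (comp, ids.toSeq) }
      .toDF("component", "vertexIds")
    df.show()
    spark.stop()
  }
}
```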
2 votes, 0 answers

Failed to get broadcast_22_piece0 of broadcast_22

When I run a Scala application on a Spark cluster in YARN mode (Spark version 2.2.0), the application uses the Pregel model and each vertex in the data graph sends messages. The exception information is as follows: Exception in thread "main"…
2 votes, 1 answer

Spark: No space left on device when working on extremely large data

The following is my Scala Spark code:

val vertex = graph.vertices
val edges = graph.edges.map(v => (v.srcId, v.dstId)).toDF("key", "value")
var FMvertex = vertex.map(v => (v._1, HLLCounter.encode(v._1)))
var encodedVertex = FMvertex.toDF("keyR",…
2 votes, 1 answer

How to use combiner in aggregateMessages in GraphX

In the GraphX aggregateMessages API:

class Graph[VD, ED] {
  def aggregateMessages[Msg: ClassTag](
      sendMsg: EdgeContext[VD, ED, Msg] => Unit,
      mergeMsg: (Msg, Msg) => Msg,
      tripletFields: TripletFields = TripletFields.All)
    :…
Litchy
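On the combiner question above: GraphX exposes no separate combiner hook because mergeMsg already plays that role — it pre-aggregates messages within each edge partition before any data moves, provided it is commutative and associative. A sketch computing each vertex's maximum neighbor id (graph contents invented):

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.sql.SparkSession

object AggregateWithCombiner {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    val graph = Graph.fromEdges(sc.parallelize(Seq(Edge(1L, 2L, ()), Edge(3L, 2L, ()))), 0)

    // mergeMsg doubles as the combiner: applied map-side per partition, then reduce-side.
    val maxNeighbor = graph.aggregateMessages[Long](
      ctx => { ctx.sendToDst(ctx.srcId); ctx.sendToSrc(ctx.dstId) },
      (a, b) => math.max(a, b)
    )
    maxNeighbor.collect().foreach(println)
    spark.stop()
  }
}
```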