Questions tagged [spark-graphx]

GraphX is a component in Apache Spark for graphs and graph-parallel computation.

At a high level, GraphX extends the Spark RDD by introducing a new Graph abstraction: a directed multigraph with properties attached to each vertex and edge.

To support graph computation, GraphX exposes a set of fundamental operators (e.g., subgraph, joinVertices, and aggregateMessages) as well as an optimized variant of the Pregel API.

In addition, GraphX includes a growing collection of graph algorithms and builders to simplify graph analytics tasks.
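A minimal sketch of the property-graph abstraction and the operators named above (assumes Spark with the GraphX module on the classpath; the vertex names and edge labels are invented for illustration):

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.sql.SparkSession

object GraphXIntroSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("graphx-intro").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // A tiny directed property graph: vertices carry a name, edges carry a label.
    val vertices = sc.parallelize(Seq((1L, "alice"), (2L, "bob"), (3L, "carol")))
    val edges    = sc.parallelize(Seq(Edge(1L, 2L, "follows"), Edge(2L, 3L, "follows")))
    val graph: Graph[String, String] = Graph(vertices, edges)

    // aggregateMessages: compute in-degrees by sending 1 along each edge.
    val inDegrees = graph.aggregateMessages[Int](ctx => ctx.sendToDst(1), _ + _)
    inDegrees.collect().foreach(println)

    // subgraph: restrict the graph to "follows" edges only.
    val follows = graph.subgraph(epred = _.attr == "follows")
    println(follows.edges.count())

    spark.stop()
  }
}
```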

487 questions
3 votes, 1 answer

GraphX - Best way to store and compute over 3 billion vertices

I am new to Spark and GraphX. So far I have been using Titan DB (HBase storage) and Giraph for processing. I have a requirement for a graph with ~3 billion vertices and ~5 billion edges. What would be the best way to store the graph (create the…
Ashok Krishnamoorthy
3 votes, 1 answer

How does GraphX internally traverse the Graph?

I want to know how GraphX traverses a graph internally. Is it vertex- and edge-based traversal, or sequential traversal of RDDs? For example, given a vertex of the graph, I want to fetch only its neighbors, not the neighbors of all the vertices. How…
mas
3 votes, 1 answer

reduceByKey processing each flatMap output without aggregating values by key in GraphX

I have a problem running GraphX:

val adjGraph = adjGraph_CC.vertices
  .flatMap { case (id, (compID, adjSet)) =>
    mapMsgGen(id, compID, adjSet) // mapMsgGen generates a list of msgs, each of the form K->V
  }
  .reduceByKey((fst,…
2 votes, 2 answers

Is GraphX available in PySpark for Spark 3.0+?

I was wondering if the GraphX API is available in PySpark for Spark 3.0+. I'm not finding anything of that sort in the official documentation; all the examples are developed in Scala. Also, where can I get more updates about it? Thanks, Darshan
Darshan Parab
2 votes, 1 answer

Convert a JavaRDD> into a Spark Dataset in Java

In Java (not Scala!) Spark 3.0.1, I have a JavaRDD instance neighborIdsRDD whose type is JavaRDD>. Part of my code related to the generation of the JavaRDD is the following: GraphOps graphOps = new…
shogitai
2 votes, 1 answer

How can I load weighted graphs in Scala?

It seems that there is no built-in way in GraphX to load weighted graphs properly. I have a file whose columns represent the edges of a graph:

# source_id target_id weight
0 1 1
1 2 2
2 3 3
3 4 4
4 5 5
5 0 6

How can I load it…
Nourless
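One common workaround for the weighted-graph question above (a sketch, not the questioner's code; the file path and the Int weight type are assumptions): GraphLoader.edgeListFile discards the weight column, so the file can be parsed manually and the graph built with Graph.fromEdges.

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.sql.SparkSession

object WeightedGraphLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    val edges = sc.textFile("weighted_edges.txt")       // hypothetical path
      .filter(l => l.nonEmpty && !l.startsWith("#"))    // skip the header comment
      .map { line =>
        val Array(src, dst, w) = line.trim.split("\\s+")
        Edge(src.toLong, dst.toLong, w.toInt)           // store the weight as the edge attribute
      }

    // Vertices are inferred from the edge endpoints; 0 is an arbitrary default attribute.
    val graph: Graph[Int, Int] = Graph.fromEdges(edges, defaultValue = 0)
    println(graph.edges.count())
    spark.stop()
  }
}
```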
2 votes, 1 answer

Gremlin traversal queries on spark graph

I have built a property graph (60 million nodes, 40 million edges) from S3 using the Apache Spark GraphX framework. I want to run traversal queries on this graph. My queries will be…
AbhiK
2 votes, 2 answers

In GraphX, how can I partition a graph with a custom PartitionStrategy that makes use of its topology?

I want to add a new PartitionStrategy that makes use of graph topology information. However, I find that PartitionStrategy only has the function below, and I cannot find any function that can receive graph data: override def getPartition(src: VertexId,…
DrowFish19
2 votes, 0 answers

GraphX create edges and vertices from csv

I have a csv file with flight information:

10397,ATL,GA,10135,ABE,PA,692,188
10397,ATL,GA,10135,ABE,PA,692,142
10434,AVP,PA,10135,ABE,PA,50,65
...

Columns are as follows:…
ZsoltF
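A sketch for the CSV question above, assuming the first three columns describe the origin airport (id, code, state), the next three the destination, and treating the seventh column as the edge attribute. The actual column meanings are truncated in the question, so the indices here are guesses to be adjusted:

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.sql.SparkSession

object FlightsCsvToGraph {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    val lines = sc.textFile("flights.csv") // hypothetical path

    // One vertex per airport, keyed by the numeric airport id, attributed with its code.
    val vertices = lines.flatMap { l =>
      val c = l.split(",")
      Seq((c(0).toLong, c(1)), (c(3).toLong, c(4)))
    }.distinct()

    // One edge per row; the seventh column (assumed numeric) becomes the edge attribute.
    val edges = lines.map { l =>
      val c = l.split(",")
      Edge(c(0).toLong, c(3).toLong, c(6).toInt)
    }

    val graph = Graph(vertices, edges)
    println(s"${graph.vertices.count()} vertices, ${graph.edges.count()} edges")
    spark.stop()
  }
}
```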
2 votes, 1 answer

How to understand maxIterations in the Pregel implementation of Apache GraphX

The official explanation is that maxIterations is meant for non-convergent algorithms. My question is: if I don't know whether my algorithm converges, how should I set the value of maxIterations? And, if there is a convergent algorithm, so that…
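For context on the maxIterations question above: Pregel terminates as soon as no vertex receives a message, so for a convergent algorithm the default Int.MaxValue simply lets convergence decide, and maxIterations acts only as a safety bound for algorithms that might never quiesce. A sketch with single-source shortest paths (the graph contents and source id are invented):

```scala
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.sql.SparkSession

object PregelSsspSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    val edges = sc.parallelize(Seq(Edge(1L, 2L, 1.0), Edge(2L, 3L, 2.0), Edge(1L, 3L, 5.0)))
    val graph = Graph.fromEdges(edges, 0.0)

    val sourceId: VertexId = 1L
    val init = graph.mapVertices((id, _) => if (id == sourceId) 0.0 else Double.PositiveInfinity)

    // Pregel stops when no messages are sent, or after maxIterations, whichever comes first.
    val sssp = init.pregel(Double.PositiveInfinity, maxIterations = Int.MaxValue)(
      (_, dist, msg) => math.min(dist, msg),                    // vertex program
      t => if (t.srcAttr + t.attr < t.dstAttr)                  // send message along shorter paths
             Iterator((t.dstId, t.srcAttr + t.attr))
           else Iterator.empty,
      (a, b) => math.min(a, b)                                  // merge messages
    )
    sssp.vertices.collect().foreach(println)
    spark.stop()
  }
}
```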
2 votes, 1 answer

Spark graphX make Edge/Vertex RDD from dataframe

I have two large dataframes, edge and vertex, and I know that they need to be in the special Vertex and Edge RDD types, but every tutorial I have found specifies the Edge and Vertex RDDs as arrays of 3 to 10 items. I need to convert them directly…
Joe S
2 votes, 1 answer

How to convert RDD[(String, Iterable[VertexId])] to DataFrame?

I have created an RDD from GraphX which looks like this:

val graph = GraphLoader.edgeListFile(spark.sparkContext, fileName)
var s: VertexRDD[VertexId] = graph.connectedComponents().vertices
val nodeGraph: RDD[(String, Iterable[VertexId])] =…
Aamir
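For the RDD-to-DataFrame question above, one common approach (a sketch; the column names and grouping key are invented) is to convert the Iterable to a Seq, which Spark can encode, and then call toDF:

```scala
import org.apache.spark.graphx.{GraphLoader, VertexId}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object ComponentsToDataFrame {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").getOrCreate()
    import spark.implicits._

    val graph = GraphLoader.edgeListFile(spark.sparkContext, "edges.txt") // hypothetical path

    // Group vertices by their connected-component id, keyed as a String.
    val byComponent: RDD[(String, Iterable[VertexId])] =
      graph.connectedComponents().vertices
        .map { case (v, comp) => (comp.toString, v) }
        .groupByKey()

    // Iterable has no Encoder, so convert it to a Seq before calling toDF.
    val df = byComponent
      .map { case (comp, ids) => (comp, ids.toSeq) }
      .toDF("component", "vertexIds")
    df.show()
    spark.stop()
  }
}
```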
2 votes, 0 answers

Failed to get broadcast_22_piece0 of broadcast_22

When I run a Scala application on a Spark cluster in YARN mode (Spark version 2.2.0), the application uses the Pregel model and each vertex in the data graph sends messages. The exception information is as follows: Exception in thread "main"…
2 votes, 1 answer

Spark: No space left on device when working on extremely large data

The following is my Scala Spark code:

val vertex = graph.vertices
val edges = graph.edges.map(v => (v.srcId, v.dstId)).toDF("key", "value")
var FMvertex = vertex.map(v => (v._1, HLLCounter.encode(v._1)))
var encodedVertex = FMvertex.toDF("keyR",…
2 votes, 1 answer

How to use combiner in aggregateMessages in GraphX

In the GraphX aggregateMessages API:

class Graph[VD, ED] {
  def aggregateMessages[Msg: ClassTag](
      sendMsg: EdgeContext[VD, ED, Msg] => Unit,
      mergeMsg: (Msg, Msg) => Msg,
      tripletFields: TripletFields = TripletFields.All)
    :…
Litchy
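On the combiner question above: GraphX exposes no separate combiner hook because mergeMsg already plays that role — it pre-aggregates messages within each edge partition before any data moves, provided it is commutative and associative. A sketch computing each vertex's maximum neighbor id (graph contents invented):

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.sql.SparkSession

object AggregateWithCombiner {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    val graph = Graph.fromEdges(sc.parallelize(Seq(Edge(1L, 2L, ()), Edge(3L, 2L, ()))), 0)

    // mergeMsg doubles as the combiner: applied map-side per partition, then reduce-side.
    val maxNeighbor = graph.aggregateMessages[Long](
      ctx => { ctx.sendToDst(ctx.srcId); ctx.sendToSrc(ctx.dstId) },
      (a, b) => math.max(a, b)
    )
    maxNeighbor.collect().foreach(println)
    spark.stop()
  }
}
```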