Questions tagged [spark-graphx]

GraphX is a component in Apache Spark for graphs and graph-parallel computation.

At a high level, GraphX extends the Spark RDD by introducing a new Graph abstraction: a directed multigraph with properties attached to each vertex and edge.

To support graph computation, GraphX exposes a set of fundamental operators (e.g., subgraph, joinVertices, and aggregateMessages) as well as an optimized variant of the Pregel API.

In addition, GraphX includes a growing collection of graph algorithms and builders to simplify graph analytics tasks.

487 questions
0
votes
1 answer

Spark GraphX: time cost increases steadily and linearly per round

I use the GraphX API in an iterative algorithm. Although I carefully cache/unpersist RDDs and take care of the number of vertex partitions, the time cost still seems to increase linearly with each round. A simplified version of my code is as follows,…
bourneli
0
votes
0 answers

Edge triplets are not being broadcast properly

I created a graph using GraphX and now I need to extract subgraphs from the original graph. In the following code I am trying to broadcast the edge triplets and filter them for each user ID. class VertexProperty(val id:Long) extends Serializable case…
0
votes
1 answer

How to flatten dependency graph?

I am new to Apache Spark. Can I get a snippet showing how to implement 'flattening' for a dependency graph? I.e., let's say I have nodes: A, B, C and edges: (A,B), (B,C); it would result in a new graph with nodes: A, B, C and edges: (A,B), (A,C), (B,C)
David H
0
votes
1 answer

Graph constructed using GraphX is not being broadcast properly

I created a graph using GraphX and now I need to extract subgraphs from the original graph. users_graph is an RDD which has a subgraph indexed to a user. The problem is that these subgraphs are not getting computed. I get a…
0
votes
0 answers

Is SparkContext implicitly created when referring to a val outside a class?

I'm using GraphX on Spark for some experiments, and the current step is to get a subgraph of a generated graph. I've checked that the original graph has been generated successfully; not only does the lazy lineage go well, but when I try…
Mon.Ma
0
votes
1 answer

Spark Scala GraphX: Calling Shortest Path within a map function

I'm having an issue in my code where I'm receiving a null pointer exception at runtime when mapping a function that calls shortest path on a global graph variable. For some reason, even though initializing distance in the terminal regularly…
mt88
0
votes
1 answer

How does the filter operation of Spark work on GraphX edges?

I'm very new to Spark and don't really know the basics; I just jumped into it to solve a problem. The solution to the problem involves making a graph (using GraphX) where edges have a string attribute. A user may wish to query this graph and I…
CMWasiq
0
votes
1 answer

Spark Scala GraphX: inDegrees not returning in-degree for nodes with 0 in-degree

I've looked around the internet at examples of the field inDegrees for graphs in GraphX and they've all said that it returns indegrees for every vertex in the graph. However when I do the following example: val a = sc.parallelize(List(Edge(1L, 2L,…
mt88
0
votes
1 answer

Memory errors in shuffle phase (lost task...) when processing a very big graph with Pregel algorithm

I am executing the Pregel algorithm with Spark GraphX in Scala. My graph contains 1 million nodes and 5 million edges between them. My cluster is very powerful, with several big data servers with 256 GB of memory each. I have a "Java Heap…
Carlos AG
0
votes
1 answer

Difference between the Vertex Program and Merge Message parts of the Pregel API in GraphX

I am new to GraphX and I do not understand the Vertex Program and Merge Message parts of the Pregel API. Don't they do the same thing? For example, what is the difference between the Vertex Program and Merge Message parts in the following Pregel code taken…
Morteza Mashayekhi
0
votes
1 answer

For GraphX, how do I convert an array of objects to an array of Edges?

I have an array of objects like this edges: Array[Array[(Long, Long, String)]] = Array(Array((-209215114,197853780,Investor), (-209215114,-322475625,Investor), ... and I want to convert it to an array of Edge to pass to a Graph builder. Here is…
Eoin Lane
0
votes
1 answer

Getting null attribute in org.apache.spark.graphx.Edge initialization

I am using Spark with Scala, and what I am doing is parsing a JSON file containing Wikidata items, combining it with some extra information, and creating a new JSON file. In doing so, I am creating a set of WikidataItem items where each item…
orestis
0
votes
1 answer

Querying large Hierarchical

An organisation deals with HR data (60 GB+ every day). How can the organisation's hierarchical data be queried efficiently? Suppose we want to query: a) At which level of the organisation tree is a person? b) How many direct reportees and indirect…
Bhavuk Chawla
0
votes
1 answer

Spark GraphX: how to load big data to create a graph

I see a lot of examples that first use an array to create the vertices and then parallelize it to make it an RDD, but if I have huge data, how would I handle it? I don't think I can create an array of, say, 1 million rows of vertices. There is another post, Spark…
Tara
0
votes
2 answers

How to use GraphFrames inside Spark on an HDInsight cluster

I have set up a Spark cluster on HDInsight and am trying to use GraphFrames using this tutorial. I have already used the custom scripts during cluster creation to enable GraphX on the Spark cluster as described here. When I am running…
Kiran