Questions tagged [spark-graphx]

GraphX is a component in Apache Spark for graphs and graph-parallel computation.

At a high level, GraphX extends the Spark RDD by introducing a new Graph abstraction: a directed multigraph with properties attached to each vertex and edge.

To support graph computation, GraphX exposes a set of fundamental operators (e.g., subgraph, joinVertices, and aggregateMessages) as well as an optimized variant of the Pregel API.

In addition, GraphX includes a growing collection of graph algorithms and builders to simplify graph analytics tasks.
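For example, a toy sketch of the property graph and the aggregateMessages operator mentioned above (the data and names here are made up for illustration):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx._

// A minimal sketch: vertices and edges carry arbitrary attributes, and
// aggregateMessages is one of the fundamental GraphX operators.
def example(sc: SparkContext): Unit = {
  val vertices = sc.parallelize(Seq((1L, "alice"), (2L, "bob"), (3L, "carol")))
  val edges    = sc.parallelize(Seq(Edge(1L, 2L, 7), Edge(2L, 3L, 4), Edge(1L, 3L, 2)))
  val graph    = Graph(vertices, edges)

  // Sum of incoming edge weights per vertex, computed with aggregateMessages.
  val weightedInDegree: VertexRDD[Int] =
    graph.aggregateMessages[Int](ctx => ctx.sendToDst(ctx.attr), _ + _)

  weightedInDegree.collect().foreach(println)   // (2,7) and (3,6)
}
```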

487 questions
7 votes · 1 answer

how to get two-hop neighbors in spark-graphx?

I've created a directed graph using GraphX. # src -> dest: a -> b 34, a -> c 23, b -> e 10, c -> d 12, d -> c 12, c -> d 11. I want to get all two-hop neighbors, like this: a -> e 44, a -> d 34. My graph is very large, so I would like to do it…
leslie chu · 71

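For reference, a minimal sketch of one way to approach the question above (not taken from its answer): self-join the edge list on the intermediate vertex. Summing the two edge weights and keeping the minimum per (src, dst) pair is an assumption based on the example output (a -> e 44, a -> d 34); the function name and attribute types are placeholders.

```scala
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

def twoHopNeighbors(graph: Graph[String, Int]): RDD[((VertexId, VertexId), Int)] = {
  // First hop keyed by its destination, second hop keyed by its source.
  val firstHop  = graph.edges.map(e => (e.dstId, (e.srcId, e.attr)))
  val secondHop = graph.edges.map(e => (e.srcId, (e.dstId, e.attr)))

  firstHop.join(secondHop)                                   // (mid, ((src, w1), (dst, w2)))
    .filter { case (_, ((src, _), (dst, _))) => src != dst } // drop a -> b -> a round trips
    .map    { case (_, ((src, w1), (dst, w2))) => ((src, dst), w1 + w2) }
    .reduceByKey((a, b) => math.min(a, b))                   // keep the cheapest two-hop path
}
```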
7 votes · 1 answer

Inspecting GraphX Graph Object

Spark version 1.6.1. Creating edge and vertex RDDs: val vertices_raw = sqlContext.read.json("vertices.json.gz"); val vertices = vertices_raw.rdd.map(row => ((row.getAs[String]("toid").stripPrefix("osgb").toLong), row.getAs[String]("index"))); val…
LearningSlowly · 8,641

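As a reference point, a minimal sketch of common ways to inspect a GraphX graph once it is built; `graph` stands in for the Graph built from the question's vertex and edge RDDs, and the attribute types here are assumptions.

```scala
import org.apache.spark.graphx._

def inspect(graph: Graph[String, Int]): Unit = {
  println(s"vertices = ${graph.numVertices}, edges = ${graph.numEdges}")
  graph.vertices.take(5).foreach(println)        // a few (VertexId, attr) pairs
  graph.triplets.take(5).foreach { t =>          // srcAttr -[edge attr]-> dstAttr
    println(s"${t.srcAttr} -[${t.attr}]-> ${t.dstAttr}")
  }
  graph.inDegrees.take(5).foreach(println)       // sample in-degrees
}
```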
7 votes · 2 answers

Gremlin - Giraph - GraphX ? On TitanDb

I need some help confirming my choice... and would like to learn whatever information you can give me. My storage database is TitanDB with Cassandra. I have a very large graph. My goal is to use MLlib on the graph later. My first idea: use Titan with…
dede · 91

6 votes · 1 answer

Graphx : Is it possible to execute a program on each vertex without receiving a message?

When I was trying to implement an algorithm in GraphX with Scala, I couldn't find a way to activate all the vertices in the next iteration. How can I send a message to all my graph vertices? In my algorithm, there are some super-steps that…
PhiloJunkie · 1,111

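One possible workaround for the question above, sketched here as an assumption rather than the accepted answer: in GraphX's Pregel every vertex runs the vertex program with the initial message in superstep 0, but afterwards only vertices that received a message stay active, so emitting a message to both endpoints of every edge keeps all connected vertices active (isolated vertices still go inactive). The attribute types and iteration count are placeholders.

```scala
import org.apache.spark.graphx._

def runAllActive(graph: Graph[Long, Int]): Graph[Long, Int] =
  Pregel(graph, initialMsg = 0L, maxIterations = 10)(
    (id, attr, msg) => attr + msg,                 // vprog: per-vertex program
    t => Iterator((t.srcId, 1L), (t.dstId, 1L)),   // sendMsg: wake both endpoints
    (a, b) => a + b                                // mergeMsg
  )
```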
6 votes · 1 answer

"error: type mismatch" in Spark with same found and required datatypes

I am using spark-shell to run my code. In my code, I have defined a function and I call that function with its parameters. The problem is that I get the error below when I call the function: error: type mismatch; found :…
6 votes · 1 answer

Finding connected components of a particular node instead of the whole graph (GraphFrame/GraphX)

I have created a GraphFrame in Spark and the graph currently looks as follows. Basically, there will be a lot of such subgraphs, each disconnected from the others. Given a particular node ID I want to find all the…
sjishan · 3,392

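A GraphX-side sketch of one common approach to the question above (the question also mentions GraphFrames): run connectedComponents once, look up the component id of the query vertex, then keep only the vertices that share it. The function name, `targetId`, and the attribute types are placeholders.

```scala
import org.apache.spark.graphx._

def componentOf(graph: Graph[String, Int], targetId: VertexId) = {
  val cc = graph.connectedComponents().vertices         // (vertexId, componentId)
  val targetComponent = cc.lookup(targetId).head        // component of the query node
  cc.filter { case (_, comp) => comp == targetComponent }.keys
}
```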
6 votes · 1 answer

Many skipped stages for Pregel in Spark UI

I am trying to run connected components on logNormalGraph: val graph: Graph[Long, Int] = GraphGenerators.logNormalGraph(context.spark, numEParts = 10, numVertices = 1000000, mu = 0.01, sigma = 0.01); val minGraph =…
Alexander Ponomarev · 2,598

6 votes · 1 answer

How to create a graph from Array[(Any, Any)] using Graph.fromEdgeTuples

I am very new to Spark, but I want to create a graph from relations that I get from a Hive table. I found a function that is supposed to allow this without defining the vertices, but I can't get it to work. I know this isn't a reproducible example, but…
Stéphanie C · 809

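A minimal sketch of one way to feed such relations to Graph.fromEdgeTuples (not the asker's code): fromEdgeTuples needs RDD[(Long, Long)], so string keys coming from a Hive table are first mapped to Long vertex ids. The function name and input type are placeholders.

```scala
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

def graphFromPairs(pairs: RDD[(String, String)]): Graph[Int, Int] = {
  // Assign a unique Long id to every distinct vertex name.
  val ids: RDD[(String, Long)] =
    pairs.flatMap { case (a, b) => Seq(a, b) }.distinct().zipWithUniqueId()

  // Translate (srcName, dstName) pairs into (srcId, dstId) tuples.
  val edgeTuples: RDD[(VertexId, VertexId)] =
    pairs.join(ids)                                  // (srcName, (dstName, srcId))
      .map { case (_, (dstName, srcId)) => (dstName, srcId) }
      .join(ids)                                     // (dstName, (srcId, dstId))
      .map { case (_, (srcId, dstId)) => (srcId, dstId) }

  Graph.fromEdgeTuples(edgeTuples, defaultValue = 1) // vertex attribute defaults to 1
}
```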
6 votes · 2 answers

Storing a Graph in Spark Graphx with HDFS

I have constructed a graph in Spark's GraphX. This graph is going to have potentially 1 billion nodes and upwards of 10 billion edges, so I don't want to have to build this graph over and over again. I want to have the ability to build it once,…
edenmark · 73

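A minimal sketch of one common approach to the question above (paths, names, and attribute types are placeholders): persist the vertex and edge RDDs to HDFS once, then rebuild the Graph from them on the next run.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx._

def save(graph: Graph[String, Double], path: String): Unit = {
  graph.vertices.saveAsObjectFile(path + "/vertices")
  graph.edges.saveAsObjectFile(path + "/edges")
}

def load(sc: SparkContext, path: String): Graph[String, Double] = {
  val vertices = sc.objectFile[(VertexId, String)](path + "/vertices")
  val edges    = sc.objectFile[Edge[Double]](path + "/edges")
  Graph(vertices, edges)
}
```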
6 votes · 2 answers

Does Spark Graphx have visualization like Gephi

I am new to the graph world and have been assigned to work on graph processing. I know Apache Spark, so I thought of using its GraphX component to process large graphs. Then I came across Gephi, which provides a nice GUI to manipulate graphs. Does GraphX have such tools, or…
Umesh K · 13,436

6 votes · 0 answers

Spark GraphX out of memory error in SparkListenerBus (java.lang.OutOfMemoryError: Java heap space)

I have a problem with running out of memory on Apache Spark (GraphX). The application runs, but after some time it shuts down. I use Spark 1.2.0. The cluster has enough memory and a number of cores. Other applications where I am not using GraphX run without problems.…
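A configuration sketch only, not a diagnosis of this particular job: the SparkListenerBus named in the error runs in the driver, so the driver heap is usually the first setting checked. The values below are placeholders, and spark.driver.memory only takes effect if set before the driver JVM starts (e.g. via spark-submit --driver-memory).

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Placeholder memory settings; tune to the cluster rather than copying these.
val conf = new SparkConf()
  .setAppName("graphx-job")
  .set("spark.executor.memory", "8g")
  .set("spark.driver.memory", "4g")
val sc = new SparkContext(conf)
```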
5 votes · 0 answers

Graphframes: BFS between two lists of vertices in spark graphframes

My aim is to find whether the max path length between two vertices is <= 4. I have a GraphFrame and a test file in the format below. I am trying to get the output column (OP) from the bfs function of GraphFrames. Col1, Col2, OP / a1, a4, …
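A sketch of the GraphFrames bfs call the question refers to; the ids 'a1' and 'a4' come from the excerpt, and treating a non-empty result as "path length <= 4" is the assumption being tested. The function name is a placeholder.

```scala
import org.graphframes.GraphFrame

def within4Hops(g: GraphFrame, from: String, to: String): Boolean = {
  val paths = g.bfs
    .fromExpr(s"id = '$from'")
    .toExpr(s"id = '$to'")
    .maxPathLength(4)          // do not search beyond four edges
    .run()
  paths.take(1).nonEmpty       // any row means a path of length <= 4 exists
}
```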
5 votes · 1 answer

Shortest path performance in Graphx with Spark

I am creating a graph from gz-compressed JSON files of edge and vertex types. I have put the files in a Dropbox folder here. I load and map these JSON records to create the vertex and edge types required by GraphX like this: val vertices_raw =…
LearningSlowly · 8,641

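For reference, a sketch of GraphX's built-in landmark shortest-paths algorithm, which is often the baseline in this kind of performance question. Note that it counts hops (unweighted); weighted distances need a custom Pregel program. The function name, attribute types, and `landmarks` are placeholders.

```scala
import org.apache.spark.graphx._
import org.apache.spark.graphx.lib.ShortestPaths

def hopDistances(graph: Graph[String, Double], landmarks: Seq[VertexId]) =
  ShortestPaths.run(graph, landmarks).vertices   // (vertexId, Map(landmarkId -> hop count))
```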
5 votes · 1 answer

How to process different graph files independently across cluster nodes in Apache Spark?

Let's say I have a large number of graph files and each graph has around 500K edges. I have been processing these graph files on Apache Spark, and I was wondering how to parallelize the entire graph processing job efficiently. Since for now, every…
hsuk · 6,770

5 votes · 2 answers

How to create a VertexId in Apache Spark GraphX using a Long data type?

I'm trying to create a Graph using some Google Web Graph data which can be found here: https://snap.stanford.edu/data/web-Google.html. import org.apache.spark._; import org.apache.spark.graphx._; import org.apache.spark.rdd.RDD; val textFile =…
Romeo Kienzler · 3,373
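A minimal sketch of one way to load that dataset with Long vertex ids (not necessarily the asker's approach): the web-Google file is a whitespace-separated "fromNodeId toNodeId" edge list, so each id parses straight to Long and feeds Graph.fromEdgeTuples. The function name and path are placeholders.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx._

def loadWebGoogle(sc: SparkContext, path: String): Graph[Int, Int] = {
  val edgeTuples = sc.textFile(path)
    .filter(line => !line.startsWith("#"))      // skip comment lines
    .map { line =>
      val Array(src, dst) = line.split("\\s+")
      (src.toLong, dst.toLong)                  // VertexId is just a type alias for Long
    }
  Graph.fromEdgeTuples(edgeTuples, defaultValue = 1)
}

// GraphLoader.edgeListFile(sc, path) does the same parsing in a single call.
```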