Questions tagged [spark-graphx]

GraphX is a component in Apache Spark for graphs and graph-parallel computation.

At a high level, GraphX extends the Spark RDD by introducing a new Graph abstraction: a directed multigraph with properties attached to each vertex and edge.

To support graph computation, GraphX exposes a set of fundamental operators (e.g., subgraph, joinVertices, and aggregateMessages) as well as an optimized variant of the Pregel API.

In addition, GraphX includes a growing collection of graph algorithms and builders to simplify graph analytics tasks.

487 questions
2
votes
1 answer

Spark example won't compile

Trying to run one of Apache Spark's example programs (https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/graphx/AggregateMessagesExample.scala), I get the following compile error: too many arguments for method…
Georg Heiler
  • 16,916
  • 36
  • 162
  • 292
2
votes
1 answer

spark graphx multiple edge types

I started using Spark very recently. I am currently testing on a bipartite graph that has different vertex and edge types. From my research into GraphX, to have different edge types, some with properties, I need to subclass the edges. Here…
kpenza
  • 114
  • 6
2
votes
0 answers

Memory leak in GraphX even if checkpoint is called on the graph

I am facing an OOM error within a Spark Streaming application that uses GraphX. While trying to isolate and reproduce the issue in a simple application, I was able to identify what appear to be two kinds of memory leak. The details of those leaks and how to…
Julien
  • 55
  • 7
2
votes
0 answers

Apache Spark GraphX: GraphLoader minEdgePartitions selection

Is it a good idea to change minEdgePartitions when loading a graph with GraphLoader, and if so, why? I would also like to ask what value to choose based on CPU and memory. Much appreciated.
user3224454
  • 194
  • 2
  • 16
2
votes
0 answers

SparkR has no GraphX functionality yet?

For a project we need to use GraphX with R, and the natural way is to use SparkR. I have checked everywhere, but it seems that SparkR exposes only the machine learning libraries; there is no reference to GraphX. Does anyone know any R bindings for…
cuneyt
  • 336
  • 5
  • 15
2
votes
2 answers

GraphX - Weighted shortest path implementation - java.lang.NoSuchMethodError

Edit - I discovered that the book was written for scala 1.6 but the remainder is 2.11. I am trying to implement a weighted shortest path algorithm from Michael Malak and Robin East's Spark GraphX in Action book. The part in question is Listing 6.4…
LearningSlowly
  • 8,641
  • 19
  • 55
  • 78
2
votes
1 answer

Spark combine DataFrames and GraphX

Is it possible to combine GraphX and DataFrames? I want every node in the graph to have its own DataFrame. I know that GraphX and DataFrame extend RDD, that nested RDDs are not possible, and that SparkContext is not Serializable. But in Spark 2.0.0 I saw that…
Vitali D.
  • 149
  • 2
  • 14
2
votes
1 answer

community detection on edges with weights on spark (louvain)

I would like to use the Spark/GraphX implementation of the Louvain modularity algorithm: https://github.com/Sotera/spark-distributed-louvain-modularity. Is there a way to apply it to a graph with weighted edges? It seems that an input file can contain 2…
Dzmitry Haikov
  • 199
  • 1
  • 2
  • 6
2
votes
1 answer

ArrayIndexOutOfBoundsException when accessing triplets of a Graph

I'm playing a bit with GraphX and got stuck with an Exception I can't explain. My code generates 10 random nodes on a graph (of type Point) and then connects some of them. The logic itself doesn't really matter (and actually doesn't have any…
Zach Moshe
  • 2,782
  • 4
  • 24
  • 40
2
votes
0 answers

How does one use igraph with PySpark?

Can the "regular" python package igraph-python be used with PySpark? I have a use-case where I want the functionality of igraph, but our graph is too big to fit into memory on one machine, so we'd like to use dataframes and PySpark to distribute…
Glenn Strycker
  • 4,816
  • 6
  • 31
  • 51
2
votes
1 answer

Viewing a graph in Spark with GraphX and Zeppelin

I'm currently working on a project using someone else's code. I understand the basic concept of how this code works, but not all of it. To that end, I'm trying to trace a small example through a run. I know I can do this using println but I would…
Dylan Lawrence
  • 1,503
  • 10
  • 32
2
votes
1 answer

How to use the Spark GraphX mask function?

I want to check whether a new graph (called A) is a subgraph of another graph (called B). I wrote a small demo to test this, but it failed. I ran the demo in spark-shell, Spark version 1.6.1: // Build the GraphB val usersB = sc.parallelize(Array( …
2
votes
1 answer

Shortcuts for creating complicated Column structures in Spark

I am porting some Graph.pregel algorithms to GraphFrame.aggregateMessages. I'm finding the GraphFrame APIs a little cumbersome. In the Graph APIs, I can send a case class as my message type. But in the GraphFrame APIs, aggregateMessages.sendToSrc…
David Griffin
  • 13,677
  • 5
  • 47
  • 65
2
votes
1 answer

How to update the weights efficiently according to adjacency matrix?

I have a very large graph with links between the nodes. Each edge initially has weight 1. I have to update the edge weights according to a transformed adjacency matrix, where A is the adjacency matrix. The new weight for nodes (i, j) will be…
Amnesiac
  • 661
  • 1
  • 10
  • 30
2
votes
1 answer

How to find the indirect nodes connected to a particular node in Spark GraphX

I want to find the indirect nodes that are connected to a particular node. I tried using the connected components method of Graph, like below: graph.connectedComponents. However, it returns results for the whole graph, but I want them for a particular node only. I…
Devndra
  • 41
  • 4