Questions tagged [spark-graphx]

GraphX is a component in Apache Spark for graphs and graph-parallel computation.

At a high level, GraphX extends the Spark RDD by introducing a new Graph abstraction: a directed multigraph with properties attached to each vertex and edge.

To support graph computation, GraphX exposes a set of fundamental operators (e.g., subgraph, joinVertices, and aggregateMessages) as well as an optimized variant of the Pregel API.

In addition, GraphX includes a growing collection of graph algorithms and builders to simplify graph analytics tasks.
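For readers new to the tag, here is a minimal sketch of the Graph abstraction and the subgraph operator mentioned above; the vertex names, edge labels, and the `example` wrapper are made up for illustration.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Build a tiny property graph and apply one of the fundamental operators (subgraph).
def example(sc: SparkContext): Unit = {
  val vertices: RDD[(VertexId, String)] =
    sc.parallelize(Seq((1L, "alice"), (2L, "bob"), (3L, "carol")))
  val edges: RDD[Edge[String]] =
    sc.parallelize(Seq(Edge(1L, 2L, "follows"), Edge(2L, 3L, "likes")))

  // Third argument: default attribute for vertices referenced only by edges.
  val graph = Graph(vertices, edges, "unknown")

  // Keep only "follows" edges; all vertices are retained.
  val followsOnly = graph.subgraph(epred = _.attr == "follows")
  println(followsOnly.edges.count()) // 1
}
```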

487 questions
2
votes
1 answer

GraphX: Given one VertexID get all connected Vertices

So basically I have a graph in GraphX and the ID of a specific vertex in it. Given that VertexId, how do I get all vertices directly connected to that vertex (i.e., only one edge away)? Thank you
adrian
  • 2,326
  • 2
  • 32
  • 48
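A minimal sketch of one way to answer this, using collectNeighborIds from GraphOps; the attribute types and the names `graph` and `targetId` are placeholders, not taken from the question.

```scala
import org.apache.spark.graphx.{EdgeDirection, Graph, VertexId}

// Placeholders: a Graph[String, String] and the id of the vertex of interest.
def directNeighbors(graph: Graph[String, String], targetId: VertexId): Array[VertexId] =
  graph
    .collectNeighborIds(EdgeDirection.Either) // VertexRDD[Array[VertexId]]
    .lookup(targetId)                         // entries for the target vertex id
    .headOption
    .getOrElse(Array.empty[VertexId])
```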
2
votes
0 answers

Making clusters from a graph created in GraphX, Spark

I'm using Spark and GraphX to make a graph that represents similar images (image names are used as vertices and there's an edge if two pictures have a label in common). As far as I know, GraphX partitions data to be stored on separate machines, but…
CMWasiq
  • 79
  • 10
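One possible sketch for the clustering part, assuming connected components are an acceptable notion of "cluster"; `imageGraph` stands in for the graph described in the question.

```scala
import org.apache.spark.graphx.{Graph, VertexId}
import org.apache.spark.rdd.RDD

// Treat each connected component as one cluster of similar images.
// Vertex attribute = image name (as in the question); edge type is a placeholder.
def clusters(imageGraph: Graph[String, Int]): RDD[(VertexId, Iterable[String])] = {
  val componentOfVertex = imageGraph.connectedComponents().vertices // (vertexId, componentId)
  imageGraph.vertices
    .join(componentOfVertex)                 // (vertexId, (imageName, componentId))
    .map { case (_, (name, component)) => (component, name) }
    .groupByKey()                            // one group of image names per component
}
```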
2
votes
0 answers

SPARK: java.lang.NegativeArraySizeException while trying to load Big Graph in GraphX

I'm trying to load a big graph (60GB) using GraphX in Spark 1.4.1 in local mode with 16 threads. The driver memory is set to 500GB inside spark-defaults.conf. I work on a machine that has 590341 MB free (shown by the free -m command), which is actually…
P. Str
  • 580
  • 1
  • 5
  • 18
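A hedged mitigation sketch rather than a confirmed diagnosis: a NegativeArraySizeException on very large inputs is often a symptom of a single partition growing too large, so one thing to try is loading the edge list with many more partitions. The path and partition count below are placeholders.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Graph, GraphLoader}

// Load an edge-list file with far more partitions than the default,
// so no single partition's internal arrays grow past JVM limits.
def loadLargeGraph(sc: SparkContext, path: String): Graph[Int, Int] =
  GraphLoader.edgeListFile(
    sc,
    path,
    numEdgePartitions = 512 // placeholder; tune to your data size
  )
```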
2
votes
1 answer

How to convert a VertexRDD to a DataFrame

I have a VertexRDD[DenseVector[Double]] and I want to convert it to a DataFrame. I don't understand how to map the values from the DenseVector to new columns in a DataFrame. I am trying to specify the schema as: val schemaString = "id prop1 prop2…
user299791
  • 2,021
  • 3
  • 31
  • 57
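A minimal sketch of one way to do the conversion, assuming Breeze's DenseVector[Double] with a known, fixed length and a SparkSession named `spark` (both assumptions); the column names follow the schema string in the question.

```scala
import breeze.linalg.DenseVector
import org.apache.spark.graphx.VertexRDD
import org.apache.spark.sql.{DataFrame, SparkSession}

// Pull individual vector components into columns instead of mapping a schema string.
def vertexRddToDF(vertexRdd: VertexRDD[DenseVector[Double]], spark: SparkSession): DataFrame = {
  import spark.implicits._
  vertexRdd
    .map { case (id, vec) => (id, vec(0), vec(1)) } // assumes at least two components
    .toDF("id", "prop1", "prop2")
}
```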
2
votes
1 answer

How to do join/joinVertices or add a field to a tuple in a graph with Spark GraphX

I have an RDF graph (link) with tuples (s, p, o) and I made a property graph from it. My RDF property graph is obtained by the following code (complete code): val propGraph =…
ChikuMiku
  • 509
  • 2
  • 11
  • 22
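A sketch of adding a field to every vertex attribute with outerJoinVertices; `propGraph` is the name from the question, while `extraInfo` and the attribute types are hypothetical.

```scala
import org.apache.spark.graphx.{Graph, VertexId}
import org.apache.spark.rdd.RDD

// Attach an extra Int field to each vertex attribute.
def addField(propGraph: Graph[String, String],
             extraInfo: RDD[(VertexId, Int)]): Graph[(String, Int), String] =
  propGraph.outerJoinVertices(extraInfo) { (_, label, extra) =>
    (label, extra.getOrElse(0)) // default when a vertex has no matching row
  }
```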
2
votes
2 answers

Error while creating graph in GraphX using edge/vertex input files

I am getting an error when running the code below for graph creation in Spark GraphX. I am running it through spark-shell with the following command: ./bin/spark-shell -i ex.scala Input: My vertex file looks like this (each line is a vertex of…
yguw
  • 856
  • 6
  • 12
  • 32
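A generic sketch of building a Graph from separate vertex and edge files; the file paths and the whitespace-separated field layout are assumptions, so the parsing has to be adapted to the real input format.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Assumed layouts: "vertexId name" per line and "srcId dstId label" per line.
def buildGraph(sc: SparkContext): Graph[String, String] = {
  val vertices: RDD[(VertexId, String)] =
    sc.textFile("vertices.txt").map { line =>
      val f = line.split("\\s+")
      (f(0).toLong, f(1))
    }
  val edges: RDD[Edge[String]] =
    sc.textFile("edges.txt").map { line =>
      val f = line.split("\\s+")
      Edge(f(0).toLong, f(1).toLong, f(2))
    }
  Graph(vertices, edges)
}
```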
2
votes
2 answers

Spark Invalid Checkpoint Directory

I have a long-running iteration in my program, and I want to cache and checkpoint every few iterations (this technique is suggested on the web to cut long lineage) so I won't get a StackOverflowError. I do this: for (i <- 2 to 100) { //cache and…
Al Jenssen
  • 655
  • 3
  • 9
  • 25
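A sketch of the cache-and-checkpoint pattern; the key point is that a checkpoint directory must be set before the first checkpoint() call, otherwise Spark reports an invalid checkpoint directory. The iteration body, the attribute types, and the directory path are placeholders.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.Graph

def iterate(sc: SparkContext, initial: Graph[Long, Long]): Graph[Long, Long] = {
  sc.setCheckpointDir("/tmp/spark-checkpoints") // use a reliable path (e.g. HDFS) on a cluster
  var graph = initial
  for (i <- 2 to 100) {
    graph = graph.mapVertices((_, v) => v + 1) // placeholder for the real per-iteration update
    if (i % 10 == 0) {
      graph.cache()
      graph.checkpoint()     // cuts the lineage every few iterations
      graph.vertices.count() // force materialization so the checkpoint actually happens
    }
  }
  graph
}
```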
2
votes
0 answers

Processing Apache Spark GraphX multiple subgraphs

I have a parent Graph that I want to filter into multiple subgraphs, so I can apply a function to each subgraph and extract some data. My code looks like this: val myTerms = val myVertices = ... val…
John
  • 1,167
  • 1
  • 16
  • 33
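A sketch of the filter-into-subgraphs pattern; `terms`, `belongsTo`, and `analyse` are hypothetical stand-ins for the parts elided in the question.

```scala
import org.apache.spark.graphx.Graph

// Derive one subgraph per term and run some per-subgraph analysis.
def processSubgraphs[VD, ED](parent: Graph[VD, ED],
                             terms: Seq[String],
                             belongsTo: (VD, String) => Boolean,
                             analyse: Graph[VD, ED] => Long): Map[String, Long] =
  terms.map { term =>
    val sub = parent.subgraph(vpred = (_, attr) => belongsTo(attr, term))
    term -> analyse(sub)
  }.toMap
```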
2
votes
1 answer

Find edges from existing vertices in Spark GraphX

Is there any operation on vertices that would let my function find edges based on some properties?
avivb
  • 187
  • 11
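A sketch of one way to do this: filter graph.triplets, which exposes both endpoint attributes alongside each edge. The predicate `interesting` and the String vertex attribute are assumptions.

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.rdd.RDD

// Keep edges whose endpoints both satisfy a property predicate.
def edgesByVertexProperty[ED](graph: Graph[String, ED],
                              interesting: String => Boolean): RDD[Edge[ED]] =
  graph.triplets
    .filter(t => interesting(t.srcAttr) && interesting(t.dstAttr))
    .map(t => Edge(t.srcId, t.dstId, t.attr))
```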
2
votes
1 answer

Is there any Spark GraphX constructor with a merge function for duplicate vertices?

I have a graph with many duplicate vertices, but with different attributes (Long). val vertices: RDD[(VertexId, Long)] ... val edges: RDD[Edge[Long]] ... val graph = Graph(vertices, edges, 0L) By default GraphX will merge duplicate…
ponkin
  • 2,363
  • 18
  • 25
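There is no Graph.apply overload taking a merge function that I know of (for duplicate vertex ids it keeps an arbitrary attribute), so a common workaround sketch is to merge the duplicates yourself with reduceByKey before construction; the max-based merge below is just an example.

```scala
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

def graphWithMergedVertices(vertices: RDD[(VertexId, Long)],
                            edges: RDD[Edge[Long]]): Graph[Long, Long] = {
  val merged = vertices.reduceByKey((a, b) => math.max(a, b)) // your own merge function here
  Graph(merged, edges, 0L)
}
```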
2
votes
1 answer

How to create an EdgeRDD in GraphX

I am using Spark 1.4.0 and GraphX, and my graph edges are stored in a file; I use the following lines of code to store them in an RDD. I would like to use EdgeRDD instead of RDD[Edge[String]]: val edges: RDD[Edge[String]] = edge_file.map(line =>…
SanS
  • 385
  • 8
  • 21
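A sketch using the EdgeRDD companion object's fromEdges builder; the "src dst label" file layout and the Long vertex-attribute type parameter are placeholders.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, EdgeRDD}
import org.apache.spark.rdd.RDD

// Parse edges from a text file, then wrap the RDD[Edge[String]] in an EdgeRDD.
def loadEdges(sc: SparkContext, path: String): EdgeRDD[String] = {
  val edges: RDD[Edge[String]] = sc.textFile(path).map { line =>
    val f = line.split("\\s+")
    Edge(f(0).toLong, f(1).toLong, f(2))
  }
  EdgeRDD.fromEdges[String, Long](edges) // Long = placeholder vertex-attribute type
}
```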
2
votes
1 answer

Find mutual edges with Spark and GraphX

I'm really new to Spark and GraphX. My question: if I have a graph with some nodes that have mutual (reciprocal) edges between them, I want to select those edges with good performance. An example: Source Dst. 1 2 1 3 1 …
FrankyK
  • 109
  • 1
  • 10
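One possible sketch: key every edge by its unordered endpoint pair and keep the pairs seen in both directions. This assumes at most one edge per direction between any two vertices.

```scala
import org.apache.spark.graphx.Graph
import org.apache.spark.rdd.RDD

// An edge (a, b) is mutual if (b, a) also exists.
def mutualEdges[VD, ED](graph: Graph[VD, ED]): RDD[(Long, Long)] =
  graph.edges
    .map(e => ((math.min(e.srcId, e.dstId), math.max(e.srcId, e.dstId)), 1))
    .reduceByKey(_ + _)
    .filter { case (_, count) => count >= 2 } // both directions present
    .keys
```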
2
votes
0 answers

Parameter numIter in stronglyConnectedComponents (Spark GraphX)

I am familiarizing myself with Spark's GraphX library using the guide https://spark.apache.org/docs/latest/graphx-programming-guide.html However, reading this (and searching the internet) I couldn't figure out what the input parameter numIter…
mariaza
  • 33
  • 3
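As far as the API goes, numIter appears to be the cap on the number of iterations the SCC algorithm runs, trading completeness for runtime on large graphs. A rough usage sketch (the ClassTag bounds are needed for the GraphOps conversion, and the cap of 10 is arbitrary):

```scala
import scala.reflect.ClassTag
import org.apache.spark.graphx.{Graph, VertexId, VertexRDD}

// Each vertex is labelled with an id identifying its strongly connected component.
def sccLabels[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]): VertexRDD[VertexId] =
  graph.stronglyConnectedComponents(numIter = 10).vertices
```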
2
votes
3 answers

Getting NoSuchMethodError when setting up Spark GraphX graph

I'm getting a similar error to the one encountered here: I can run GraphX using the spark shell, but I get a NoSuchMethodError when I try to use spark-submit on a jar file. This is the line it complains about: val myGraph: Graph[(String,…
John
  • 1,167
  • 1
  • 16
  • 33
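A NoSuchMethodError that appears only under spark-submit is commonly a dependency-version mismatch between the assembled jar and the cluster's Spark. A hedged build.sbt sketch; every version number below is a placeholder to be matched to the actual cluster.

```scala
// build.sbt sketch: keep spark-core and spark-graphx at the cluster's Spark version
// and mark them "provided" so the assembly does not ship conflicting copies.
scalaVersion := "2.11.12" // placeholder; match the Scala version your Spark build uses

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"   % "2.4.8" % "provided", // placeholder version
  "org.apache.spark" %% "spark-graphx" % "2.4.8" % "provided"  // placeholder version
)
```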
1
vote
1 answer

Convert a Spark DataFrame containing an embedded list into an RDD in Scala

I have a DataFrame in the following format: character title Tony Stark ["Iron Man"] James Buchanan Barnes ["Captain America: The First Avenger","Captain America: The Winter Soldier","Captain America: Civil War","Avengers: Infinity…
tbc
  • 37
  • 1
  • 6
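A sketch of one way to flatten this, assuming the columns are named character (a string) and title (an array of strings), as in the excerpt.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.functions.explode

// Explode the title array so each (character, title) pair becomes one RDD element.
def toPairs(df: DataFrame): RDD[(String, String)] =
  df.select(df("character"), explode(df("title")).as("title"))
    .rdd
    .map { case Row(character: String, title: String) => (character, title) }
```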