Questions tagged [spark-graphx]

GraphX is a component in Apache Spark for graphs and graph-parallel computation.

At a high level, GraphX extends the Spark RDD by introducing a new Graph abstraction: a directed multigraph with properties attached to each vertex and edge.

To support graph computation, GraphX exposes a set of fundamental operators (e.g., subgraph, joinVertices, and aggregateMessages) as well as an optimized variant of the Pregel API.

In addition, GraphX includes a growing collection of graph algorithms and builders to simplify graph analytics tasks.
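The operators named above can be illustrated with a minimal sketch. It assumes a local Spark installation with the GraphX module on the classpath; the object name `GraphXOperatorsSketch`, the three-vertex toy graph, and the helper names are made up for illustration and are not part of GraphX itself.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId, VertexRDD}
import org.apache.spark.rdd.RDD

// Hypothetical example object; only the GraphX calls inside are library API.
object GraphXOperatorsSketch {
  // Builds a tiny property graph: vertex attribute = name, edge attribute = weight.
  def buildGraph(sc: SparkContext): Graph[String, Int] = {
    val vertices: RDD[(VertexId, String)] =
      sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c")))
    val edges: RDD[Edge[Int]] =
      sc.parallelize(Seq(Edge(1L, 2L, 5), Edge(1L, 3L, 1), Edge(2L, 3L, 7)))
    Graph(vertices, edges)
  }

  // aggregateMessages: every edge sends 1 to its destination and the
  // messages are summed, which yields the in-degree of each vertex that
  // received at least one message.
  def inDegrees(graph: Graph[String, Int]): VertexRDD[Int] =
    graph.aggregateMessages[Int](ctx => ctx.sendToDst(1), _ + _)

  // subgraph: keep only the edges whose weight exceeds minWeight.
  def heavyEdges(graph: Graph[String, Int], minWeight: Int): Graph[String, Int] =
    graph.subgraph(epred = triplet => triplet.attr > minWeight)

  // joinVertices: rewrite the attribute of every vertex that has an entry
  // in the joined table; vertices without an entry keep their attribute.
  def labelWithInDegree(graph: Graph[String, Int]): Graph[String, Int] =
    graph.joinVertices(inDegrees(graph))((_, name, degree) => s"$name:$degree")

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("graphx-operators-sketch")
    val sc = new SparkContext(conf)
    try {
      val graph = buildGraph(sc)
      inDegrees(graph).collect().foreach(println)
      heavyEdges(graph, minWeight = 4).edges.collect().foreach(println)
      labelWithInDegree(graph).vertices.collect().foreach(println)
    } finally sc.stop()
  }
}
```

Note that `aggregateMessages` only returns vertices that received a message, so a vertex with no incoming edges is absent from the result rather than mapped to 0.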

487 questions
0
votes
0 answers

Can we send Java objects as part of a message in the GraphX Pregel API?

I am sending a complex Java object as a message in the GraphX Pregel API, and I am getting the following error at runtime: java.lang.IllegalArgumentException at com.esotericsoftware.reflectasm.shaded.org.objectweb.asm.ClassReader.<init>(Unknown Source) at…
0
votes
1 answer

Transform Array in Spark

val degrees: VertexRDD[Int] = graph.degrees; val ngb = graph.collectNeighbors(EdgeDirection.Out); val deg2 = degrees.leftOuterJoin(ngb). Now I want a key/value pair RDD where the key is the degree and the value is the neighbor vertex ID. Basically I want to change…
user2871856
  • 227
  • 2
  • 3
  • 11
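One possible reading of the question above is sketched here: flatten the joined RDD so that each record pairs a vertex's degree with the ID of one of its out-neighbors. The excerpt is truncated, so this interpretation is an assumption, and the object and function names are hypothetical.

```scala
import org.apache.spark.graphx.{EdgeDirection, Graph, VertexId}
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Hypothetical helper; the interpretation of "degree -> neighbor id" is assumed.
object DegreeNeighborSketch {
  def degreeToNeighbor[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]): RDD[(Int, VertexId)] = {
    // Same three steps as in the question.
    val degrees = graph.degrees
    val ngb = graph.collectNeighbors(EdgeDirection.Out)
    val deg2 = degrees.leftOuterJoin(ngb)

    // Emit one (degree, neighborId) pair per out-neighbor. Vertices with
    // no neighbor entry contribute nothing.
    deg2.flatMap { case (_, (degree, neighborsOpt)) =>
      neighborsOpt
        .map(_.toSeq)
        .getOrElse(Seq.empty)
        .map { case (neighborId, _) => (degree, neighborId) }
    }
  }
}
```

If only the neighbor IDs are needed, `collectNeighborIds` avoids carrying the neighbor attributes through the join.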
0
votes
1 answer

Combine SparkSQL and GraphX

Can you create a stored procedure in SparkSQL and call the GraphX API from it? Something like this: registerFunction("storedProcedureGraphX", model.storedProcedureGraphX _) and then select * from someTable where storedProcedureGraphX(nodeX, nodeY) > 10
0
votes
1 answer

DataFrames to EdgeRDD (GraphX) using the Scala API for Spark

Is there a nice way of going from a Spark DataFrame to an EdgeRDD without hardcoding types in the Scala code? The examples I've seen use case classes to define the type of the EdgeRDD. Let's assume that our Spark DataFrame has StructField ("dstID",…
0
votes
1 answer

Removing vertices with no edges in Spark GraphX

I was wondering if somebody could help. I'm having a problem with a function written for GraphX in Spark, which keeps giving error messages if I have vertices with no edges. When joining edges and vertices together val graph = Graph(vertices,…
ALs
  • 509
  • 2
  • 4
  • 17
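For the question above, one way to drop vertices that have no edges is sketched below. The error described in the excerpt is truncated, so this only addresses the title; the object and function names are hypothetical.

```scala
import org.apache.spark.graphx.Graph
import scala.reflect.ClassTag

// Hypothetical helper that removes vertices without any incident edge.
object DropIsolatedVerticesSketch {
  def dropIsolated[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]): Graph[VD, ED] = {
    // graph.degrees has no entry for isolated vertices, so a missing
    // degree is treated as 0 here.
    val withDegree = graph.outerJoinVertices(graph.degrees) {
      (_, attr, degreeOpt) => (attr, degreeOpt.getOrElse(0))
    }

    withDegree
      // Keep only vertices with at least one incident edge.
      .subgraph(vpred = (_, attrAndDegree) => attrAndDegree._2 > 0)
      // Restore the original vertex attribute type.
      .mapVertices[VD]((_, attrAndDegree) => attrAndDegree._1)
  }
}
```

When the vertex attributes are not needed, building the graph with `Graph.fromEdges` is a simpler alternative, because it only creates vertices that appear in some edge.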
0
votes
1 answer

Retrieving TriangleCount

I'm trying to retrieve the number of triangles in a graph using GraphX. As I'm new to both Scala and GraphX, I'm currently quite stuck. I'm creating a graph from an edge file with the lines "1 2", "1 3", and "2 3". This should be 1 triangle. Next I'm using the built-in…
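A minimal sketch of counting triangles for the three edges in the question above. It builds the graph in memory rather than from a file (`GraphLoader.edgeListFile` would be the file-based route), and the object and function names are hypothetical.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Graph, PartitionStrategy, VertexId}

// Hypothetical helper around GraphX's built-in triangleCount.
object TriangleCountSketch {
  def totalTriangles(sc: SparkContext, edgePairs: Seq[(VertexId, VertexId)]): Long = {
    // Older Spark releases expect edges with srcId < dstId and a graph
    // partitioned with partitionBy before calling triangleCount; the
    // partitionBy call is kept here so the sketch works in both cases.
    val graph = Graph
      .fromEdgeTuples(sc.parallelize(edgePairs), defaultValue = 1)
      .partitionBy(PartitionStrategy.RandomVertexCut)

    // triangleCount stores, per vertex, the number of triangles that
    // pass through it, so each triangle is counted three times.
    val perVertex = graph.triangleCount().vertices
    perVertex.map { case (_, count) => count.toLong }.reduce(_ + _) / 3
  }
}
```

The division by 3 is the step that is easy to miss: summing the per-vertex counts alone would report 3 for a single triangle.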
0
votes
1 answer

Connecting the first two nodes with an edge from two RDDs in GraphX

I am using GraphX for the first time and I want to build a Graph incrementally. So I need to connect the first two nodes with an edge, knowing that I have 2 RDDs (each one has a single value): firstRDD: RDD[((Int, Array[Int]), ((VertexId, Array[Int]),…
fadhloun anis
  • 525
  • 1
  • 6
  • 13
0
votes
1 answer

Traversing a graph in spark-graphx via edge properties

I was hoping somebody might have some suggestions for the following. I had some really great help on here recently with a similar(ish) problem and wanted to expand on it. I currently have a network built using GraphX which looks like the following…
ALs
  • 509
  • 2
  • 4
  • 17
0
votes
1 answer

How to pull a Neo4j database into Mazerunner Docker

I am using the Mazerunner Docker provided by Kenny Bastani to integrate Neo4j and spark-graphx. I am able to process the Movie graph that is provided. Now I want to pull my own Twitter graph into Mazerunner Docker. Can anyone tell me how to pull a new graph to…
Naren
  • 457
  • 2
  • 10
  • 19
0
votes
1 answer

How to compute edges between nodes v, w that are pointed to by the same node x

This question is about Spark GraphX. Given an arbitrary graph, I want to compute a new graph that adds edges between any two nodes v, w that are both pointed to by some node x. The new edges should contain the pointing node as an attribute. That is,…
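The construction described above could be sketched as follows. Because the excerpt is truncated, two details are assumptions: each unordered pair {v, w} yields a single edge directed from the smaller ID to the larger, and only the new edges are returned rather than merged with the original ones. The object and function names are hypothetical.

```scala
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Hypothetical helper; edge direction and deduplication are assumptions.
object SharedSourceEdgesSketch {
  def sharedSourceEdges[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]): RDD[Edge[VertexId]] =
    graph.edges
      // (x, target) for every edge x -> target
      .map(edge => (edge.srcId, edge.dstId))
      // all targets pointed to by the same x
      .groupByKey()
      .flatMap { case (x, targets) =>
        val distinctTargets = targets.toSeq.distinct
        // One edge per unordered pair, carrying x as the edge attribute.
        for {
          v <- distinctTargets
          w <- distinctTargets
          if v < w
        } yield Edge(v, w, x)
      }
}
```

A graph containing only the new edges can then be built with `Graph(graph.vertices, SharedSourceEdgesSketch.sharedSourceEdges(graph))`. Note that the number of pairs grows quadratically with the out-degree of x, so very high-degree vertices may need special handling.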
0
votes
1 answer

Apache Spark GraphX java.lang.ArrayIndexOutOfBoundsException

I am trying to understand how to work with Spark-GraphX but always have some problems, so maybe somebody could advise me what to read etc. I tried to read Spark documentation and Learning Spark - O'Reilly Media book, but could not find any…
Roman
  • 257
  • 1
  • 2
  • 4
0
votes
1 answer

What exactly is meant by match in Join operators

I'm confused. I am trying to do what seems like a fairly simple join operation but it is not working as I expect. I have two graphs, pGraph and cGraph. Each is built by reading entries from a CSV file and the id values used are generated from one of…
Phasmid
  • 923
  • 7
  • 19
0
votes
0 answers

How to build a correctly working GraphX Spark application to run on EMR?

I have a script written with Spark GraphX (Scala 2.10) and other Spark libraries to process PageRank scores for a Wikipedia dump and retrieve the top results. I am able to get the script to run locally by putting it in the examples folder and…
0
votes
1 answer

Conversion of GUID type String to VertexIDs type Long using Piggybank HashFNV in Pig

I have 2 text files stored in Hadoop that I want to use to create a graph in Apache Spark GraphX: (1) a text file with vertex information, including a GUID of type String identifying each vertex; (2) a text file with edge information, including two GUIDs…
Luc
  • 223
  • 2
  • 13
0
votes
2 answers

Simple path queries on large graphs

I have a question about large graph data. Suppose that we have a large graph with nearly 100 million edges and around 5 million nodes, in this case what is the best graph mining platform that you know of that can give all simple paths of lengths <=k…
mgokhanbakal
  • 1,679
  • 1
  • 20
  • 26