Questions tagged [spark-graphx]

GraphX is a component in Apache Spark for graphs and graph-parallel computation.

At a high level, GraphX extends the Spark RDD by introducing a new Graph abstraction: a directed multigraph with properties attached to each vertex and edge.

To support graph computation, GraphX exposes a set of fundamental operators (e.g., subgraph, joinVertices, and aggregateMessages) as well as an optimized variant of the Pregel API.

In addition, GraphX includes a growing collection of graph algorithms and builders to simplify graph analytics tasks.
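For readers new to the tag, a minimal self-contained sketch of the abstraction described above. The vertex names, edge labels, and `local[*]` master are illustrative, not taken from any question below:

```scala
// Minimal GraphX sketch: a directed property graph with String vertex
// and String edge attributes, built from local collections.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

object GraphXSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("sketch").setMaster("local[*]"))

    val vertices = sc.parallelize(Seq(
      (1L, "alice"), (2L, "bob"), (3L, "carol")))
    val edges = sc.parallelize(Seq(
      Edge(1L, 2L, "follows"), Edge(2L, 3L, "follows")))

    // The third argument is the attribute assigned to any vertex that
    // is referenced by an edge but missing from the vertex RDD.
    val graph: Graph[String, String] = Graph(vertices, edges, "unknown")

    graph.triplets.collect().foreach { t =>
      println(s"${t.srcAttr} ${t.attr} ${t.dstAttr}")
    }
    sc.stop()
  }
}
```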

487 questions
0 votes, 1 answer

Spark GraphX: Filtering by passing a vertex value in a triplet

I am using Spark 2.1.0 on Windows 10. Since I am new to Spark, I am following this tutorial. In the tutorial, the author prints all the triplets of the graph using the following code: graph.triplets.sortBy(_.attr, ascending=false).map(triplet…
SoakingHummer • 562 • 1 • 7 • 25
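The snippet quoted in this question is truncated; a hedged completion of the common pattern it follows. The graph itself is an assumption (String vertex names, Int edge weights, matching the usual follower-graph tutorials):

```scala
// Sort a graph's triplets by edge attribute, highest first, and
// format each one as "src -> dst (weight n)".
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

object TripletsExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("triplets").setMaster("local[*]"))

    // Hypothetical data, not from the question.
    val users = sc.parallelize(Seq((1L, "alice"), (2L, "bob"), (3L, "carol")))
    val follows = sc.parallelize(Seq(Edge(1L, 2L, 7), Edge(2L, 3L, 4)))
    val graph = Graph(users, follows, "unknown")

    graph.triplets
      .sortBy(_.attr, ascending = false)
      .map(t => s"${t.srcAttr} -> ${t.dstAttr} (weight ${t.attr})")
      .collect()
      .foreach(println)
    sc.stop()
  }
}
```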
0 votes, 0 answers

java.lang.StackOverflowError with Spark 2.0.2

I am using Spark 2.0.2 and GraphX 2.0.2 in a Scala project with IntelliJ IDEA 2016.3.5. I get this error: java.lang.StackOverflowError at org.apache.spark.io.LZ4BlockInputStream.read(LZ4BlockInputStream.java:125) at…
DaliMidou • 111 • 1 • 3 • 14
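A StackOverflowError in iterative GraphX jobs most often comes from very long RDD lineage chains rather than true recursion; the usual mitigation is checkpointing. A hedged sketch, with a hypothetical checkpoint directory (use an HDFS path on a cluster):

```scala
// Checkpointing truncates RDD lineage, which otherwise grows on every
// iteration and can overflow the stack during (de)serialization.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.Graph

object CheckpointSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("ck").setMaster("local[*]"))
    sc.setCheckpointDir("/tmp/spark-checkpoints") // hypothetical path

    val graph = Graph.fromEdgeTuples(
      sc.parallelize(Seq((1L, 2L), (2L, 3L))), defaultValue = 1)

    // Checkpoint the result of an iterative algorithm so later stages
    // do not replay its whole lineage.
    val cc = graph.connectedComponents()
    cc.vertices.checkpoint()
    cc.vertices.count() // forces materialization and the checkpoint
    sc.stop()
  }
}
```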
0 votes, 2 answers

Merge RDD of (key,id) with RDD of (k1,k2)

I have an original RDD with data that looks like: (A,A) (A,B) (B,C) (C,D). These are edges in a graph (represented as vertex names). I use some code to generate a second RDD with unique ids: (A,0) (B,41) (C,82) (D,123). I want to somehow…
Dylan Lawrence • 1,503 • 10 • 32
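A hedged sketch of one way to do the merge this question describes: join the name-keyed edge list against the (name, id) RDD twice, once per endpoint. The SparkContext setup is illustrative:

```scala
// Translate name-based edges into id-based edges via two joins.
import org.apache.spark.{SparkConf, SparkContext}

object EdgeIdJoin {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("join").setMaster("local[*]"))

    val edgesByName = sc.parallelize(
      Seq(("A", "A"), ("A", "B"), ("B", "C"), ("C", "D")))
    val ids = sc.parallelize(
      Seq(("A", 0L), ("B", 41L), ("C", 82L), ("D", 123L)))

    // Join on the source name, re-key by destination, join again.
    val edgesById = edgesByName
      .join(ids)                                         // (src, (dst, srcId))
      .map { case (_, (dst, srcId)) => (dst, srcId) }
      .join(ids)                                         // (dst, (srcId, dstId))
      .map { case (_, (srcId, dstId)) => (srcId, dstId) }

    edgesById.collect().foreach(println)
    sc.stop()
  }
}
```

The resulting id pairs can then feed `Graph.fromEdgeTuples(edgesById, 1)`.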
0 votes, 0 answers

GraphX: disk space running low

I am currently using Apache Spark with GraphX, and I have noticed lately that when I run my application with a lot of data, it uses a large part of my disk. For example, before I start the program the disk is around 8 GB, and during the…
user3224454 • 194 • 2 • 16
0 votes, 1 answer

Spark/GraphX program does not utilize CPU and memory

I have a function that takes the neighbors of a node (for the neighbors I use a broadcast variable) and the id of the node itself, and calculates the closeness centrality for that node. I map each node of the graph to the result of that…
user3224454 • 194 • 2 • 16
0 votes, 1 answer

Spark JobServer: graphx VertexRDD java.lang.ClassNotFoundException

I am developing a Spark job on Spark JobServer (v0.6.2, Spark 1.6.1) using GraphX, and I am running into the following exception when trying to launch my job: { "status": "JOB LOADING FAILED", "result": { "errorClass":…
zaki benz • 672 • 7 • 21
0 votes, 1 answer

Spark GraphX: how can I use other data types for vertices?

As you can see, "3L, 5L, 7L, 2L" are Long values. How can I use another Scala data type, such as String?
0 votes, 1 answer

How do I get the size of the largest connected component of a graph in Spark?

I'm building a graph from an RDD of tuples of source and destination nodes, like this: Graph.fromEdgeTuples(rawEdges = edgeList, 1). First off, I did not quite understand what the second parameter is. From the documentation, defaultValue the…
Bob • 849 • 5 • 14 • 26
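On the second point of this question, defaultValue is the attribute assigned to vertices that appear only in the edge tuples. For the first point, a hedged sketch of one way to get the largest component's size; the edge data is illustrative:

```scala
// connectedComponents labels every vertex with the lowest VertexId in
// its component; counting vertices per label gives component sizes.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.Graph

object LargestComponent {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("cc").setMaster("local[*]"))
    // Two components: {1, 2, 3} and {4, 5}.
    val edgeList = sc.parallelize(Seq((1L, 2L), (2L, 3L), (4L, 5L)))

    val graph = Graph.fromEdgeTuples(edgeList, 1)

    val largest = graph.connectedComponents()
      .vertices                                  // (vertexId, componentId)
      .map { case (_, componentId) => (componentId, 1L) }
      .reduceByKey(_ + _)                        // component sizes
      .values
      .max()

    println(s"largest component size: $largest")
    sc.stop()
  }
}
```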
0 votes, 1 answer

Apache Spark: reading a file in standalone cluster mode

I am currently using a graph that I load from a file when I run my GraphX application locally. I'd like to run the application in standalone cluster mode. Do I have to make changes, such as placing the file on each cluster node? Can I leave my application…
user3224454 • 194 • 2 • 16
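A short, hedged illustration of the two usual options for this question (the paths and namenode address are hypothetical): with a file:// URL the file must exist at the same path on every worker node, while shared storage such as HDFS avoids the copying.

```scala
// Two ways to read input in standalone cluster mode.
import org.apache.spark.{SparkConf, SparkContext}

object ClusterInput {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("input"))

    // Option 1: local path -- must exist at this path on EVERY worker.
    val localEdges = sc.textFile("file:///opt/data/edges.txt")

    // Option 2: shared storage -- visible to all workers, no copying.
    val hdfsEdges = sc.textFile("hdfs://namenode:8020/data/edges.txt")

    println(localEdges.count() + hdfsEdges.count())
    sc.stop()
  }
}
```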
0 votes, 1 answer

Reading graph from file

Looking to run a GraphX example on my Windows machine using spark-shell from a sparklyr install of Hadoop/Spark. I am able to launch the shell from the install directory first: start…
eyeOfTheStorm • 351 • 1 • 5 • 15
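Once the shell is up, edge-list files are usually loaded with GraphLoader; a hedged sketch (the Windows path is hypothetical, and forward slashes avoid escaping issues):

```scala
// Inside spark-shell, sc already exists. The input is a text file of
// whitespace-separated "srcId dstId" pairs, one edge per line.
import org.apache.spark.graphx.GraphLoader

val graph = GraphLoader.edgeListFile(sc, "C:/tmp/edges.txt")
println(graph.numEdges)
```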
0 votes, 1 answer

How to convert a collection of a String array to String/Text in Spark using Scala

Code from Apache Spark GraphX gives me results: Array[(org.apache.spark.graphx.VertexId, Array[org.apache.spark.graphx.VertexId])] = Array((4,Array(17, 18, 20)), (16,Array(20)), (14,Array()), (6,Array(7)), (8,Array(9, 10)), (12,Array(1)),…
Marcin • 3 • 4
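A hedged sketch of the usual conversion: once the result is collected, plain Scala mkString turns each (id, neighbours) pair into text. The tab separator and sample data are arbitrary choices, not from the question:

```scala
// Format collected (VertexId, Array[VertexId]) pairs as text lines.
import org.apache.spark.graphx.VertexId

object FormatNeighbors {
  // One line per vertex: "id<TAB>n1,n2,n3"; an empty array gives "id<TAB>".
  def formatRow(row: (VertexId, Array[VertexId])): String = {
    val (id, neighbors) = row
    s"$id\t${neighbors.mkString(",")}"
  }

  def main(args: Array[String]): Unit = {
    val collected: Array[(VertexId, Array[VertexId])] =
      Array((4L, Array(17L, 18L, 20L)), (14L, Array.empty[VertexId]))
    collected.map(formatRow).foreach(println)
  }
}
```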
0 votes, 1 answer

How do I create a graph in GraphX with this

I am struggling to understand how I am going to create the following in GraphX in Apache Spark. I am given an HDFS file which has loads of data in the form: node: ConnectingNode1, ConnectingNode2.. For example: 123214:…
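A hedged sketch of one way to turn lines shaped like "node: ConnectingNode1, ConnectingNode2" into GraphX edges; the HDFS path and the assumption that node ids are numeric are both mine, not the question's:

```scala
// Parse adjacency-list lines into (src, dst) tuples and build a graph.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.Graph

object AdjacencyToGraph {
  // "123214: 5, 6" -> Seq((123214L, 5L), (123214L, 6L))
  def parseLine(line: String): Seq[(Long, Long)] = {
    val Array(src, rest) = line.split(":", 2)
    rest.split(",").toSeq
      .map(_.trim)
      .filter(_.nonEmpty)
      .map(dst => (src.trim.toLong, dst.toLong))
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("adj").setMaster("local[*]"))
    val edges = sc.textFile("hdfs:///path/to/file").flatMap(parseLine)
    val graph = Graph.fromEdgeTuples(edges, defaultValue = 1)
    println(graph.numEdges)
    sc.stop()
  }
}
```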
0 votes, 1 answer

Storing graphx vertices on HDFS and loading later

I create an RDD: val verticesRDD: RDD[(VertexId, Long)] = vertices I can inspect it and everything looks ok: verticesRDD.take(3).foreach(println) (4000000031043205,1) (4000000031043206,2) (4000000031043207,3) I save this RDD to HDFS…
LearningSlowly • 8,641 • 19 • 55 • 78
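A hedged sketch of a round-trip that avoids re-parsing text: saveAsObjectFile keeps the (VertexId, Long) pair type, and sc.objectFile reads it back. The HDFS path is hypothetical:

```scala
// Save typed vertex pairs and reload them in a later job.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.VertexId
import org.apache.spark.rdd.RDD

object VertexRoundTrip {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("rt").setMaster("local[*]"))
    val verticesRDD: RDD[(VertexId, Long)] =
      sc.parallelize(Seq((4000000031043205L, 1L), (4000000031043206L, 2L)))

    val path = "hdfs:///tmp/vertices" // hypothetical; must not yet exist
    verticesRDD.saveAsObjectFile(path)

    // Later, in another job: the element type must be respecified.
    val reloaded: RDD[(VertexId, Long)] = sc.objectFile[(VertexId, Long)](path)
    println(reloaded.count())
    sc.stop()
  }
}
```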
0 votes, 1 answer

How to compute the average degree of neighbors with GraphX

I want to compute the average degree of neighbors for each node in my graph. Say we have a graph like this: val users: RDD[(VertexId, String)] = sc.parallelize(Array((3L, "rxin"), (7L, "jgonzal"), …
user299791 • 2,021 • 3 • 31 • 57
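A hedged sketch using aggregateMessages: attach each vertex's degree first, then ship degrees across every edge in both directions and average at each vertex. The sample graph is illustrative, not the one from the question:

```scala
// Average neighbour degree per vertex via degrees + aggregateMessages.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Graph, VertexRDD}

object AvgNeighborDegree {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("and").setMaster("local[*]"))
    val graph = Graph.fromEdgeTuples(
      sc.parallelize(Seq((1L, 2L), (2L, 3L))), defaultValue = 0)

    // 1. Attach each vertex's degree as its attribute.
    val degrees: VertexRDD[Int] = graph.degrees
    val withDeg: Graph[Int, Int] =
      graph.outerJoinVertices(degrees)((_, _, d) => d.getOrElse(0))

    // 2. Send each endpoint's degree to the other endpoint, with a
    //    count of 1, and sum both components at the receiver.
    val sums = withDeg.aggregateMessages[(Int, Int)](
      ctx => {
        ctx.sendToDst((ctx.srcAttr, 1))
        ctx.sendToSrc((ctx.dstAttr, 1))
      },
      (a, b) => (a._1 + b._1, a._2 + b._2)
    )

    // 3. Divide the degree sum by the neighbour count.
    val avg = sums.mapValues(v => v._1.toDouble / v._2)
    avg.collect().sortBy(_._1).foreach(println)
    sc.stop()
  }
}
```

Note that parallel edges are counted with multiplicity; deduplicate the edge list first if that matters.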
0 votes, 1 answer

How to run GraphX on IPython Notebook?

I'm trying to run GraphX in an IPython notebook. First, I launched Spark/Hadoop clusters and then launched IPython Notebook using this tutorial (http://blog.insightdatalabs.com/jupyter-on-apache-spark-step-by-step/). But now I have only Python 2…
Alex Ermolaev • 311 • 2 • 4 • 17