Questions tagged [spark-graphx]

GraphX is a component in Apache Spark for graphs and graph-parallel computation.

At a high level, GraphX extends the Spark RDD by introducing a new Graph abstraction: a directed multigraph with properties attached to each vertex and edge.

To support graph computation, GraphX exposes a set of fundamental operators (e.g., subgraph, joinVertices, and aggregateMessages) as well as an optimized variant of the Pregel API.

In addition, GraphX includes a growing collection of graph algorithms and builders to simplify graph analytics tasks.
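For readers new to the tag, a minimal self-contained sketch of the abstraction described above. The vertex names, edge labels, and `local[*]` master are illustrative, not taken from any question below:

```scala
// Minimal GraphX sketch: a directed property graph with String vertex
// and String edge attributes, built from local collections.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

object GraphXSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("sketch").setMaster("local[*]"))

    val vertices = sc.parallelize(Seq(
      (1L, "alice"), (2L, "bob"), (3L, "carol")))
    val edges = sc.parallelize(Seq(
      Edge(1L, 2L, "follows"), Edge(2L, 3L, "follows")))

    // The third argument is the attribute assigned to any vertex that
    // is referenced by an edge but missing from the vertex RDD.
    val graph: Graph[String, String] = Graph(vertices, edges, "unknown")

    graph.triplets.collect().foreach { t =>
      println(s"${t.srcAttr} ${t.attr} ${t.dstAttr}")
    }
    sc.stop()
  }
}
```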

487 questions
0 votes, 1 answer

Spark GraphX: Filtering by passing a vertex value in a triplet

I am using Spark 2.1.0 on Windows 10. Since I am new to Spark, I am following this tutorial. In the tutorial, the author prints all the triplets of the graph using the following code: graph.triplets.sortBy(_.attr, ascending=false).map(triplet…
SoakingHummer • 562 • 1 • 7 • 25
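The snippet quoted in this question is truncated; a hedged completion of the common pattern it follows. The graph itself is an assumption (String vertex names, Int edge weights, matching the usual follower-graph tutorials):

```scala
// Sort a graph's triplets by edge attribute, highest first, and
// format each one as "src -> dst (weight n)".
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

object TripletsExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("triplets").setMaster("local[*]"))

    // Hypothetical data, not from the question.
    val users = sc.parallelize(Seq((1L, "alice"), (2L, "bob"), (3L, "carol")))
    val follows = sc.parallelize(Seq(Edge(1L, 2L, 7), Edge(2L, 3L, 4)))
    val graph = Graph(users, follows, "unknown")

    graph.triplets
      .sortBy(_.attr, ascending = false)
      .map(t => s"${t.srcAttr} -> ${t.dstAttr} (weight ${t.attr})")
      .collect()
      .foreach(println)
    sc.stop()
  }
}
```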
0 votes, 0 answers

java.lang.StackOverflowError with Spark 2.0.2

I am using Spark 2.0.2 and GraphX 2.0.2 in a Scala project with IntelliJ IDEA 2016.3.5. I get this error: java.lang.StackOverflowError at org.apache.spark.io.LZ4BlockInputStream.read(LZ4BlockInputStream.java:125) at…
DaliMidou • 111 • 1 • 3 • 14
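A StackOverflowError in iterative GraphX jobs most often comes from very long RDD lineage chains rather than true recursion; the usual mitigation is checkpointing. A hedged sketch, with a hypothetical checkpoint directory (use an HDFS path on a cluster):

```scala
// Checkpointing truncates RDD lineage, which otherwise grows on every
// iteration and can overflow the stack during (de)serialization.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.Graph

object CheckpointSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("ck").setMaster("local[*]"))
    sc.setCheckpointDir("/tmp/spark-checkpoints") // hypothetical path

    val graph = Graph.fromEdgeTuples(
      sc.parallelize(Seq((1L, 2L), (2L, 3L))), defaultValue = 1)

    // Checkpoint the result of an iterative algorithm so later stages
    // do not replay its whole lineage.
    val cc = graph.connectedComponents()
    cc.vertices.checkpoint()
    cc.vertices.count() // forces materialization and the checkpoint
    sc.stop()
  }
}
```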
0 votes, 2 answers

Merge RDD of (key,id) with RDD of (k1,k2)

I have an original RDD with data that looks like: (A,A) (A,B) (B,C) (C,D). These are edges in a graph (represented as vertex names). I use some code to generate a second RDD with unique ids: (A,0) (B,41) (C,82) (D,123). I want to somehow…
Dylan Lawrence • 1,503 • 10 • 32
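A hedged sketch of one way to do the merge this question describes: join the name-keyed edge list against the (name, id) RDD twice, once per endpoint. The SparkContext setup is illustrative:

```scala
// Translate name-based edges into id-based edges via two joins.
import org.apache.spark.{SparkConf, SparkContext}

object EdgeIdJoin {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("join").setMaster("local[*]"))

    val edgesByName = sc.parallelize(
      Seq(("A", "A"), ("A", "B"), ("B", "C"), ("C", "D")))
    val ids = sc.parallelize(
      Seq(("A", 0L), ("B", 41L), ("C", 82L), ("D", 123L)))

    // Join on the source name, re-key by destination, join again.
    val edgesById = edgesByName
      .join(ids)                                         // (src, (dst, srcId))
      .map { case (_, (dst, srcId)) => (dst, srcId) }
      .join(ids)                                         // (dst, (srcId, dstId))
      .map { case (_, (srcId, dstId)) => (srcId, dstId) }

    edgesById.collect().foreach(println)
    sc.stop()
  }
}
```

The resulting id pairs can then feed `Graph.fromEdgeTuples(edgesById, 1)`.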
0 votes, 0 answers

GraphX: disk space running low

I am currently using Apache Spark with GraphX, and I have noticed lately that when I run my application with a lot of data, it uses a large part of my disk. For example, before I start the program the disk is around 8 GB, and during the…
user3224454 • 194 • 2 • 16
0 votes, 1 answer

Spark/GraphX program does not utilize CPU and memory

I have a function that takes the neighbors of a node (for the neighbors I use a broadcast variable) and the id of the node itself, and calculates the closeness centrality for that node. I map each node of the graph to the result of that…
user3224454 • 194 • 2 • 16
0 votes, 1 answer

Spark JobServer: graphx VertexRDD java.lang.ClassNotFoundException

I am developing a Spark job on Spark JobServer (v0.6.2, Spark 1.6.1) using GraphX, and I am running into the following exception when trying to launch my job: { "status": "JOB LOADING FAILED", "result": { "errorClass":…
zaki benz • 672 • 7 • 21
0 votes, 1 answer

Spark GraphX: how can I use other data types for vertices?

As you can see, "3L, 5L, 7L, 2L" are Long values. How can I use another Scala data type, such as String?
0 votes, 1 answer

How do I get the size of the largest connected component of a graph in Spark?

I'm building a graph from an RDD of tuples of source and destination nodes, like this: Graph.fromEdgeTuples(rawEdges = edgeList, 1). First off, I did not quite understand what the second parameter is. From the documentation, defaultValue the…
Bob • 849 • 5 • 14 • 26
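On the second point of this question, defaultValue is the attribute assigned to vertices that appear only in the edge tuples. For the first point, a hedged sketch of one way to get the largest component's size; the edge data is illustrative:

```scala
// connectedComponents labels every vertex with the lowest VertexId in
// its component; counting vertices per label gives component sizes.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.Graph

object LargestComponent {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("cc").setMaster("local[*]"))
    // Two components: {1, 2, 3} and {4, 5}.
    val edgeList = sc.parallelize(Seq((1L, 2L), (2L, 3L), (4L, 5L)))

    val graph = Graph.fromEdgeTuples(edgeList, 1)

    val largest = graph.connectedComponents()
      .vertices                                  // (vertexId, componentId)
      .map { case (_, componentId) => (componentId, 1L) }
      .reduceByKey(_ + _)                        // component sizes
      .values
      .max()

    println(s"largest component size: $largest")
    sc.stop()
  }
}
```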
0 votes, 1 answer

Apache Spark: reading a file in standalone cluster mode

I am currently using a graph that I load from a file when I run my GraphX application locally. I'd like to run the application in standalone cluster mode. Do I have to make changes, such as placing the file on each cluster node? Can I leave my application…
user3224454 • 194 • 2 • 16
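A short, hedged illustration of the two usual options for this question (the paths and namenode address are hypothetical): with a file:// URL the file must exist at the same path on every worker node, while shared storage such as HDFS avoids the copying.

```scala
// Two ways to read input in standalone cluster mode.
import org.apache.spark.{SparkConf, SparkContext}

object ClusterInput {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("input"))

    // Option 1: local path -- must exist at this path on EVERY worker.
    val localEdges = sc.textFile("file:///opt/data/edges.txt")

    // Option 2: shared storage -- visible to all workers, no copying.
    val hdfsEdges = sc.textFile("hdfs://namenode:8020/data/edges.txt")

    println(localEdges.count() + hdfsEdges.count())
    sc.stop()
  }
}
```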
0 votes, 1 answer

Reading graph from file

Looking to run a GraphX example on my Windows machine using spark-shell from a sparklyr install of Hadoop/Spark. I am able to launch the shell from the install directory first: start…
eyeOfTheStorm • 351 • 1 • 5 • 15
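Once the shell is up, edge-list files are usually loaded with GraphLoader; a hedged sketch (the Windows path is hypothetical, and forward slashes avoid escaping issues):

```scala
// Inside spark-shell, sc already exists. The input is a text file of
// whitespace-separated "srcId dstId" pairs, one edge per line.
import org.apache.spark.graphx.GraphLoader

val graph = GraphLoader.edgeListFile(sc, "C:/tmp/edges.txt")
println(graph.numEdges)
```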
0 votes, 1 answer

How to convert a collection of a String array to String/Text in Spark using Scala

Code from Apache Spark GraphX gives me results: Array[(org.apache.spark.graphx.VertexId, Array[org.apache.spark.graphx.VertexId])] = Array((4,Array(17, 18, 20)), (16,Array(20)), (14,Array()), (6,Array(7)), (8,Array(9, 10)), (12,Array(1)),…
Marcin • 3 • 4
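A hedged sketch of the usual conversion: once the result is collected, plain Scala mkString turns each (id, neighbours) pair into text. The tab separator and sample data are arbitrary choices, not from the question:

```scala
// Format collected (VertexId, Array[VertexId]) pairs as text lines.
import org.apache.spark.graphx.VertexId

object FormatNeighbors {
  // One line per vertex: "id<TAB>n1,n2,n3"; an empty array gives "id<TAB>".
  def formatRow(row: (VertexId, Array[VertexId])): String = {
    val (id, neighbors) = row
    s"$id\t${neighbors.mkString(",")}"
  }

  def main(args: Array[String]): Unit = {
    val collected: Array[(VertexId, Array[VertexId])] =
      Array((4L, Array(17L, 18L, 20L)), (14L, Array.empty[VertexId]))
    collected.map(formatRow).foreach(println)
  }
}
```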
0 votes, 1 answer

How do I create a graph in GraphX with this

I am struggling to understand how I am going to create the following in GraphX in Apache Spark. I am given an HDFS file which has loads of data in the form: node: ConnectingNode1, ConnectingNode2.. For example: 123214:…
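A hedged sketch of one way to turn lines shaped like "node: ConnectingNode1, ConnectingNode2" into GraphX edges; the HDFS path and the assumption that node ids are numeric are both mine, not the question's:

```scala
// Parse adjacency-list lines into (src, dst) tuples and build a graph.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.Graph

object AdjacencyToGraph {
  // "123214: 5, 6" -> Seq((123214L, 5L), (123214L, 6L))
  def parseLine(line: String): Seq[(Long, Long)] = {
    val Array(src, rest) = line.split(":", 2)
    rest.split(",").toSeq
      .map(_.trim)
      .filter(_.nonEmpty)
      .map(dst => (src.trim.toLong, dst.toLong))
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("adj").setMaster("local[*]"))
    val edges = sc.textFile("hdfs:///path/to/file").flatMap(parseLine)
    val graph = Graph.fromEdgeTuples(edges, defaultValue = 1)
    println(graph.numEdges)
    sc.stop()
  }
}
```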
0 votes, 1 answer

Storing graphx vertices on HDFS and loading later

I create an RDD: val verticesRDD: RDD[(VertexId, Long)] = vertices I can inspect it and everything looks ok: verticesRDD.take(3).foreach(println) (4000000031043205,1) (4000000031043206,2) (4000000031043207,3) I save this RDD to HDFS…
LearningSlowly • 8,641 • 19 • 55 • 78
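A hedged sketch of a round-trip that avoids re-parsing text: saveAsObjectFile keeps the (VertexId, Long) pair type, and sc.objectFile reads it back. The HDFS path is hypothetical:

```scala
// Save typed vertex pairs and reload them in a later job.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.VertexId
import org.apache.spark.rdd.RDD

object VertexRoundTrip {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("rt").setMaster("local[*]"))
    val verticesRDD: RDD[(VertexId, Long)] =
      sc.parallelize(Seq((4000000031043205L, 1L), (4000000031043206L, 2L)))

    val path = "hdfs:///tmp/vertices" // hypothetical; must not yet exist
    verticesRDD.saveAsObjectFile(path)

    // Later, in another job: the element type must be respecified.
    val reloaded: RDD[(VertexId, Long)] = sc.objectFile[(VertexId, Long)](path)
    println(reloaded.count())
    sc.stop()
  }
}
```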
0 votes, 1 answer

How to compute the average degree of neighbors with GraphX

I want to compute the average degree of neighbors for each node in my graph. Say we have a graph like this: val users: RDD[(VertexId, String)] = sc.parallelize(Array((3L, "rxin"), (7L, "jgonzal"), …
user299791 • 2,021 • 3 • 31 • 57
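A hedged sketch using aggregateMessages: attach each vertex's degree first, then ship degrees across every edge in both directions and average at each vertex. The sample graph is illustrative, not the one from the question:

```scala
// Average neighbour degree per vertex via degrees + aggregateMessages.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Graph, VertexRDD}

object AvgNeighborDegree {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("and").setMaster("local[*]"))
    val graph = Graph.fromEdgeTuples(
      sc.parallelize(Seq((1L, 2L), (2L, 3L))), defaultValue = 0)

    // 1. Attach each vertex's degree as its attribute.
    val degrees: VertexRDD[Int] = graph.degrees
    val withDeg: Graph[Int, Int] =
      graph.outerJoinVertices(degrees)((_, _, d) => d.getOrElse(0))

    // 2. Send each endpoint's degree to the other endpoint, with a
    //    count of 1, and sum both components at the receiver.
    val sums = withDeg.aggregateMessages[(Int, Int)](
      ctx => {
        ctx.sendToDst((ctx.srcAttr, 1))
        ctx.sendToSrc((ctx.dstAttr, 1))
      },
      (a, b) => (a._1 + b._1, a._2 + b._2)
    )

    // 3. Divide the degree sum by the neighbour count.
    val avg = sums.mapValues(v => v._1.toDouble / v._2)
    avg.collect().sortBy(_._1).foreach(println)
    sc.stop()
  }
}
```

Note that parallel edges are counted with multiplicity; deduplicate the edge list first if that matters.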
0 votes, 1 answer

How to run GraphX on IPython Notebook?

I'm trying to run GraphX in an IPython notebook. First, I launched Spark/Hadoop clusters and then launched IPython Notebook using this tutorial (http://blog.insightdatalabs.com/jupyter-on-apache-spark-step-by-step/). But now I have only Python 2…
Alex Ermolaev • 311 • 2 • 4 • 17