Questions tagged [spark-graphx]

GraphX is a component in Apache Spark for graphs and graph-parallel computation.

At a high level, GraphX extends the Spark RDD by introducing a new Graph abstraction: a directed multigraph with properties attached to each vertex and edge.

To support graph computation, GraphX exposes a set of fundamental operators (e.g., subgraph, joinVertices, and aggregateMessages) as well as an optimized variant of the Pregel API.

In addition, GraphX includes a growing collection of graph algorithms and builders to simplify graph analytics tasks.
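For orientation, here is a minimal sketch of the property-graph model and the aggregateMessages operator mentioned above. It assumes an existing SparkContext named sc, and the vertex and edge data are made up for illustration.

    import org.apache.spark.graphx.{Edge, Graph, VertexId}
    import org.apache.spark.rdd.RDD

    // Vertices carry a user name, edges carry a relationship label.
    val users: RDD[(VertexId, String)] =
      sc.parallelize(Seq((1L, "alice"), (2L, "bob"), (3L, "carol")))
    val relationships: RDD[Edge[String]] =
      sc.parallelize(Seq(Edge(1L, 2L, "follows"), Edge(3L, 2L, "follows")))
    val graph: Graph[String, String] = Graph(users, relationships)

    // Count followers per user with aggregateMessages.
    val followerCounts = graph.aggregateMessages[Int](
      sendMsg = ctx => ctx.sendToDst(1),
      mergeMsg = _ + _
    )
    followerCounts.collect().foreach(println)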

487 questions
0
votes
1 answer

Spark GraphX: class not found error on EMR cluster

I am trying to process hierarchical data using GraphX Pregel, and the code works fine locally. But when I run it on my Amazon EMR cluster it gives me an error: java.lang.NoClassDefFoundError: Could not initialize class What would…
Monika Patel
  • 35
  • 1
  • 6
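A common cause of NoClassDefFoundError on EMR (not confirmed by the question, which truncates the class name) is that the jar handed to spark-submit does not contain everything that is on the local classpath. A hedged build.sbt sketch with illustrative versions: keep Spark itself "provided" (EMR supplies it) and package the application's own classes and remaining dependencies into the jar that is submitted, for example with sbt-assembly.

    // build.sbt sketch; versions are illustrative, not taken from the question.
    scalaVersion := "2.11.12"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"   % "2.4.0" % "provided",
      "org.apache.spark" %% "spark-graphx" % "2.4.0" % "provided"
    )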
0
votes
1 answer

Spark java.lang.NullPointerException when using tuples

I am using the GraphX API for Spark to build a graph and process it with the Pregel API. The error does not happen if I return the argument tuple from the vprog function, but if I return a new tuple built from the same tuple, I get a null pointer error. Here is the…
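For reference, a sketch of a Pregel call whose vertex program returns a brand-new tuple each superstep. The (Long, Double) state, the Double messages, and the update rule are assumptions for illustration, not the asker's code.

    import org.apache.spark.graphx.{EdgeTriplet, Graph, VertexId}

    // Vertex state is a (label, value) tuple; vprog keeps the label and
    // returns a new tuple with an updated value.
    def runPregel(graph: Graph[(Long, Double), Double]): Graph[(Long, Double), Double] =
      graph.pregel(initialMsg = 0.0, maxIterations = 10)(
        vprog = (id: VertexId, attr: (Long, Double), msg: Double) => (attr._1, attr._2 + msg),
        sendMsg = (t: EdgeTriplet[(Long, Double), Double]) => Iterator((t.dstId, t.srcAttr._2)),
        mergeMsg = (a: Double, b: Double) => math.max(a, b)
      )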
0
votes
1 answer

Scala: GraphX: error: class Array takes type parameters

I am trying to build an Edge RDD for GraphX. I am reading a CSV file, converting it to a DataFrame, and then trying to convert that to an Edge RDD: val staticDataFrame = spark.read.option("header", true).option("inferSchema", true). …
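That compiler message usually means Array was written somewhere without its element type (e.g. Array instead of Array[Edge[Double]]). Below is a sketch of going from the DataFrame to an RDD[Edge[Double]]; the column names src, dst, weight and the file path are assumptions.

    import org.apache.spark.graphx.Edge
    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.functions.col

    val staticDataFrame = spark.read
      .option("header", true)
      .option("inferSchema", true)
      .csv("edges.csv")                       // hypothetical path

    // Cast explicitly so the row accessors below cannot mismatch the inferred types.
    val edgeRDD: RDD[Edge[Double]] = staticDataFrame
      .select(col("src").cast("long"), col("dst").cast("long"), col("weight").cast("double"))
      .rdd
      .map(row => Edge(row.getLong(0), row.getLong(1), row.getDouble(2)))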
0
votes
1 answer

Looping over an RDD to create Graphs in Spark with Scala

I am trying to loop through an RDD and create a Graph from the data on each record. The code is like this: bigjoin has the structure RDD[(String, List[(Long, Long)])] bigjoin.foreach( a => { val imsi = a._1 val pairs = a._2 val…
M.Vela
  • 1
  • 3
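Graphs, and the RDDs they are built from, can only be created on the driver, so constructing one inside bigjoin.foreach (which runs on the executors) will not work. A sketch under the assumption that the per-key edge lists are small enough to collect:

    import org.apache.spark.graphx.Graph

    // bigjoin: RDD[(String, List[(Long, Long)])], as described in the question.
    bigjoin.collect().foreach { case (imsi, pairs) =>
      val edges = sc.parallelize(pairs)                        // RDD[(VertexId, VertexId)]
      val graph = Graph.fromEdgeTuples(edges, defaultValue = 1)
      println(s"$imsi: ${graph.vertices.count()} vertices, ${graph.edges.count()} edges")
    }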
0
votes
0 answers

Join more than two VertexRDD[Double] in GraphX Scala

I need to join more than two VertexRDD[Double]. I am trying it with the following code, but I can't get it to work: val vertices = Array((1L, 11.0), (2L, 12.3), (3L, 13.8)) val vRDD = sc.parallelize(vertices) val edges =…
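A sketch of chaining VertexRDD.innerJoin for three inputs; v1, v2, v3 stand for VertexRDD[Double] values, e.g. built from the question's arrays with VertexRDD(sc.parallelize(vertices)), and summing the attributes is just an example combiner.

    import org.apache.spark.graphx.{VertexId, VertexRDD}

    // Chain innerJoin: each call keeps only ids present in both sides.
    val joined: VertexRDD[Double] =
      v1.innerJoin(v2)((id: VertexId, a: Double, b: Double) => a + b)
        .innerJoin(v3)((id: VertexId, ab: Double, c: Double) => ab + c)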
0
votes
1 answer

Spark GraphFrames: How to use the component ID in connectedComponents

I'm trying to find all the connected components (in this example, 4 is connected to 100, 2 is connected to 200, etc.). I used val g2 = GraphFrame(v2, e2) val result2 = g2.connectedComponents.run() and that returns nodes with a component ID. My problem…
user4046073
  • 821
  • 4
  • 18
  • 39
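The component column is simply a label shared by every vertex in the same component, so grouping on it lists each component's members. A sketch against the result2 DataFrame from the question (GraphFrames documents the output as the vertex columns plus "component"):

    import org.apache.spark.sql.functions.collect_list

    val components = result2
      .groupBy("component")
      .agg(collect_list("id").alias("members"))

    components.show(truncate = false)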
0
votes
0 answers

Iterate a DataFrame to find connected rows: do I need to convert each row to a DataFrame?

I have a function below that takes two DataFrames and returns a DataFrame. def doJoin(df1: DataFrame, df2: DataFrame): DataFrame = { val cols = df1.columns val r = df1.join(df2, cols.map(c => df1(c) === df2(c)).reduce(_ || _) ) …
user4046073
  • 821
  • 4
  • 18
  • 39
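One alternative to iterating pairwise joins is to treat rows as vertices and each doJoin match as an edge, and let connected components do the grouping. A sketch with GraphFrames, where the "id" column on the vertices and the matchedPairs edge DataFrame (with "src"/"dst" columns) are assumptions about how the data could be reshaped:

    import org.graphframes.GraphFrame

    // vertices: one row per record, with a unique "id" column (assumed to exist).
    // matchedPairs: DataFrame of matching row pairs, columns "src" and "dst".
    val graph   = GraphFrame(vertices, matchedPairs)
    // Recent GraphFrames releases need a checkpoint directory for this:
    // spark.sparkContext.setCheckpointDir("/tmp/graphframes-cc")   // hypothetical path
    val grouped = graph.connectedComponents.run()                   // adds a "component" column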
0
votes
1 answer

Run lambda per connected component in Spark GraphX

I am trying to execute a lambda per connected component in Spark GraphX. I get the connected components using the connectedComponents() method, but then I couldn't find any way other than collecting all distinct vertex ids of the graph with labels…
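A sketch of keeping the work distributed: join the component labels back onto the vertex data, group by component id, and apply the function per group. The Graph[String, Int] element types and the helper name are made up for illustration.

    import scala.reflect.ClassTag
    import org.apache.spark.graphx.{Graph, VertexId}
    import org.apache.spark.rdd.RDD

    def perComponent[T: ClassTag](graph: Graph[String, Int])(
        f: Iterable[(VertexId, String)] => T): RDD[T] = {
      val labels = graph.connectedComponents().vertices     // (vertexId, componentId)
      labels
        .join(graph.vertices)                               // (vertexId, (componentId, attr))
        .map { case (id, (comp, attr)) => (comp, (id, attr)) }
        .groupByKey()
        .map { case (_, members) => f(members) }
    }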
0
votes
0 answers

Vertices with completely different properties in GraphX Spark Scala

I'm struggling to implement a property graph in GraphX where my vertices have completely different properties. I am not able to apply the inheritance approach given in the Spark documentation. class VertexProperty() case class UserProperty(val name:…
Nargis
  • 739
  • 7
  • 30
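The docs' pattern only works once the case classes actually extend the common base type and the graph is typed with that base. Below is a sketch close to the documentation's example; using a sealed trait instead of a plain class is my own small variation.

    import org.apache.spark.graphx.{Edge, Graph, VertexId}

    sealed trait VertexProperty
    case class UserProperty(name: String) extends VertexProperty
    case class ProductProperty(name: String, price: Double) extends VertexProperty

    // The vertex RDD is typed with the base trait, so mixed properties can coexist.
    val vertices = sc.parallelize(Seq[(VertexId, VertexProperty)](
      (1L, UserProperty("alice")),
      (2L, ProductProperty("book", 9.99))
    ))
    val edges = sc.parallelize(Seq(Edge(1L, 2L, "bought")))
    val graph: Graph[VertexProperty, String] = Graph(vertices, edges)

    // Recover the concrete type with pattern matching.
    graph.vertices.collect().foreach {
      case (id, UserProperty(name))           => println(s"user $id: $name")
      case (id, ProductProperty(name, price)) => println(s"product $id: $name at $price")
    }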
0
votes
1 answer

Spark GraphX inDegrees sorting: sortBy vs sortWith

I am trying to sort the vertex list based on in-degrees in a Spark graph (using Scala). // Sort ascending - both of the two calls below yield the same results gGraph.inDegrees.collect.sortBy(_._2).take(10) gGraph.inDegrees.collect.sortWith(_._2 <…
DanJoe
  • 3
  • 1
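For the record, once collected both calls sort a plain Array and should give identical ordering. Below is a sketch of the ascending and descending forms, plus a distributed variant that avoids collecting everything first (gGraph is the graph from the question).

    // Ascending; the two lines are equivalent.
    gGraph.inDegrees.collect.sortBy(_._2).take(10)
    gGraph.inDegrees.collect.sortWith(_._2 < _._2).take(10)

    // Descending (highest in-degree first).
    gGraph.inDegrees.collect.sortBy(-_._2).take(10)
    gGraph.inDegrees.collect.sortWith(_._2 > _._2).take(10)

    // Sort on the cluster instead of the driver.
    gGraph.inDegrees.sortBy(_._2, ascending = false).take(10)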
0
votes
1 answer

How to print a val used in partitionBy

I have a problem in Apache Spark GraphX. I tried to partition a graph with this method in main: graph.partitionBy(HDRF, 128). HDRF is a method that does the partitioning; I would like to print a val that is inside it. I tried to print it but it does…
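HDRF here is the asker's own strategy, so the sketch below only shows the general shape of a custom PartitionStrategy. The key point is that getPartition runs on the executors, so println output ends up in the executor logs (visible in the Spark web UI), not on the driver's console.

    import org.apache.spark.graphx.{PartitionID, PartitionStrategy, VertexId}

    // Hypothetical strategy: hash source and destination ids together.
    object MyStrategy extends PartitionStrategy {
      override def getPartition(src: VertexId, dst: VertexId, numParts: PartitionID): PartitionID = {
        val part = (math.abs(src + dst) % numParts).toInt
        // println(s"edge ($src,$dst) -> partition $part")  // lands in executor stdout
        part
      }
    }

    // val partitioned = graph.partitionBy(MyStrategy, 128)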
0
votes
1 answer

Why does sbt update fail with "Conflicting cross-version suffixes" with Spark GraphX?

Here is my sbt build for Spark with Scala in IntelliJ: version := "0.1" scalaVersion := "2.11.11" // https://mvnrepository.com/artifact/org.apache.spark/spark-graphx_2.10 libraryDependencies += "org.apache.spark" % "spark-graphx_2.10" % "2.1.0" //…
The_Lost_Avatar
  • 992
  • 5
  • 15
  • 35
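The conflict comes from mixing a Scala 2.10 artifact (spark-graphx_2.10) with scalaVersion 2.11.11. A sketch of the fix, using %% so sbt appends the suffix that matches scalaVersion (add any other Spark modules the same way):

    scalaVersion := "2.11.11"

    // %% resolves to spark-graphx_2.11 here, matching scalaVersion.
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"   % "2.1.0",
      "org.apache.spark" %% "spark-graphx" % "2.1.0"
    )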
0
votes
0 answers

Hourly aggregation at each timestamp in Scala Spark

I have a simple graph with 5 nodes A, B, C, D, E (connected as shown in the question's figure). I have time series data for nodes B, C, D, E in the following form: DateTime,Value,NodeName 2016-01-01 00:00:00,1.2,B 2016-01-01 00:15:00,1.3,B ------ ------ 2016-12-31…
Utkarsh Saraf
  • 475
  • 8
  • 31
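If the goal is per-node hourly figures from the DateTime,Value,NodeName series, one option is Spark SQL's window function. The sketch below assumes a DataFrame named ts with those columns and that the hourly value is a sum (swap in avg or max as needed).

    import org.apache.spark.sql.functions.{col, sum, window}

    val hourly = ts
      .groupBy(col("NodeName"), window(col("DateTime"), "1 hour"))
      .agg(sum("Value").alias("HourlyValue"))
      .orderBy(col("NodeName"), col("window"))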
0
votes
1 answer

Anonymous methods passing in arguments

I have created a method inside Pregel which has the following signature: Graph org.apache.spark.graphx.Pregel.apply(Graph arg0, A arg1, int arg2, EdgeDirection arg3, Function3 arg4, Function1,…
Utkarsh Saraf
  • 475
  • 8
  • 31
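Since the functions handed to Pregel.apply are ordinary Scala closures, any extra arguments can simply be captured from the enclosing scope. A sketch with a made-up damping parameter and update rule:

    import org.apache.spark.graphx.{EdgeDirection, EdgeTriplet, Graph, Pregel, VertexId}

    // damping is not part of Pregel's signature; the anonymous functions capture it.
    def run(graph: Graph[Double, Double], damping: Double): Graph[Double, Double] =
      Pregel(graph, initialMsg = 0.0, maxIterations = 10, activeDirection = EdgeDirection.Out)(
        (id: VertexId, attr: Double, msg: Double) => attr + damping * msg,
        (t: EdgeTriplet[Double, Double]) => Iterator((t.dstId, t.srcAttr * t.attr)),
        (a: Double, b: Double) => a + b
      )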
0
votes
1 answer

Error importing the GraphX library in a Scala project

When I use the following import statement in my Scala program, I get an error (see the attached error screen). Did I make any mistake adding the libraryDependencies? For graphx.lib, is there any specific libraryDependency to add? import…
Akhil T
  • 1
  • 1
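The import only resolves if a spark-graphx artifact matching the project's Scala version is on the classpath. A sketch of the dependency (version illustrative) and the usual import lines, including the algorithms under graphx.lib:

    // build.sbt, with the version chosen to match the other Spark modules:
    //   libraryDependencies += "org.apache.spark" %% "spark-graphx" % "2.2.0"

    import org.apache.spark.graphx.{Edge, Graph, VertexId}
    import org.apache.spark.graphx.lib.PageRank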