Questions tagged [spark-graphx]

GraphX is a component in Apache Spark for graphs and graph-parallel computation.

At a high level, GraphX extends the Spark RDD by introducing a new Graph abstraction: a directed multigraph with properties attached to each vertex and edge.

To support graph computation, GraphX exposes a set of fundamental operators (e.g., subgraph, joinVertices, and aggregateMessages) as well as an optimized variant of the Pregel API.

In addition, GraphX includes a growing collection of graph algorithms and builders to simplify graph analytics tasks.
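As a minimal illustration of the abstraction described above (the vertex and edge data here are made up), a property graph can be built from vertex and edge RDDs and queried with aggregateMessages:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Minimal sketch (names and sample data are made up): build a property graph
// and count each vertex's in-neighbours with aggregateMessages.
def inDegreeExample(sc: SparkContext): Unit = {
  val vertices: RDD[(VertexId, String)] =
    sc.parallelize(Seq((1L, "alice"), (2L, "bob"), (3L, "carol")))
  val edges: RDD[Edge[String]] =
    sc.parallelize(Seq(Edge(1L, 2L, "follows"), Edge(3L, 2L, "follows")))

  val graph: Graph[String, String] = Graph(vertices, edges)

  // Send the message 1 along every edge to its destination and sum per vertex.
  val inDegrees: VertexRDD[Int] =
    graph.aggregateMessages[Int](ctx => ctx.sendToDst(1), _ + _)

  inDegrees.collect.foreach(println)   // e.g. (2,2): vertex 2 has two in-edges
}
```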

487 questions
4 votes • 3 answers

Vertex Property Inheritance - Graphx Scala Spark

--- Edit --- My main issue is that I do not understand this paragraph given in the GraphX documentation: In some cases it may be desirable to have vertices with different property types in the same graph. This can be accomplished through…
Akshay Gupta • 321 • 4 • 14
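The paragraph the question quotes refers to the inheritance pattern from the GraphX programming guide; a sketch of that idea (class and field names below are illustrative, not the asker's code):

```scala
import org.apache.spark.graphx._

// Illustrative sketch: give the differing vertex kinds a common supertype
// and parameterise the graph on that supertype.
sealed trait VertexProperty
case class UserProperty(name: String) extends VertexProperty
case class ProductProperty(name: String, price: Double) extends VertexProperty

// The graph can then hold both kinds of vertex under the common supertype:
// val graph: Graph[VertexProperty, String] = ...
```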
4 votes • 0 answers

modifying spark GraphX pageRank to do random walk with restart

I am trying to implement random walk with restart by modifying the Spark GraphX implementation of PageRank algorithm. def randomWalkWithRestart(graph: Graph[VertexProperty, EdgeProperty], patientID: String , numIter: Int = 10, alpha: Double =…
3 votes • 0 answers

"Application attempt...doesn't exist in ApplicationMasterService cache” cause? (Pregel: maxIterations impact on cluster for non-convergent algorithm)

I've tried to run my own Pregel method for a relatively small graph (250k vertices, 1.5M edges). The algorithm I use may well be non-convergent, meaning that in most cases the maxIterations setting acts as a hard stop, finishing…
3 votes • 0 answers

Spark 2.3: How to release RDD from memory in iterative algorithm

Taking the example code from https://livebook.manning.com/book/spark-graphx-in-action/chapter-6/1 import org.apache.spark.graphx._ def dijkstra[VD](g:Graph[VD,Double], origin:VertexId) = { var g2 = g.mapVertices( (vid,vd) => (false, if…
Regalia9363 • 342 • 2 • 14
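A pattern often used for this (a sketch only, with an assumed per-iteration updateOnce step rather than the book's dijkstra body): cache the new graph, force it to materialise, then unpersist the previous iteration's RDDs.

```scala
import org.apache.spark.graphx._

// Sketch of the usual GraphX loop hygiene: cache the new graph, materialise it,
// then release the previous iteration so its cached partitions can be dropped.
def iterate[VD, ED](initial: Graph[VD, ED], numIter: Int)
                   (updateOnce: Graph[VD, ED] => Graph[VD, ED]): Graph[VD, ED] = {
  var g = initial.cache()
  for (_ <- 1 to numIter) {
    val next = updateOnce(g).cache()
    next.vertices.count()          // force evaluation before dropping the old graph
    next.edges.count()
    g.unpersistVertices(blocking = false)
    g.edges.unpersist(blocking = false)
    g = next
  }
  g
}
```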
3 votes • 1 answer

GraphX or GraphFrame - community detection in undirected weighted graph

I'm trying to identify strongly connected communities within a large group (undirected weighted graph). Alternatively, identifying vertices causing connection of sub-groups (communities) that would otherwise be unrelated. The problem is part of…
Palo • 31 • 2
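For reference, GraphX itself ships connectedComponents and a label-propagation implementation that are common starting points for community detection; a hedged sketch, assuming the graph has already been built:

```scala
import org.apache.spark.graphx._
import org.apache.spark.graphx.lib.LabelPropagation
import scala.reflect.ClassTag

// Sketch only: GraphX built-ins that are often a first pass at community detection.
def communities[VD: ClassTag](graph: Graph[VD, Double]): Unit = {
  // Each vertex gets the lowest vertex id of its connected component.
  val cc = graph.connectedComponents().vertices

  // Label propagation: after maxSteps iterations each vertex carries a community label.
  val lpa = LabelPropagation.run(graph, maxSteps = 5).vertices

  cc.take(10).foreach(println)
  lpa.take(10).foreach(println)
}
```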
3 votes • 1 answer

How to implement cycle detection with pyspark graphframe pregel API

I am trying to implement the algorithm from Rocha & Thatte (http://cdsid.org.br/sbpo2015/wp-content/uploads/2015/08/142825.pdf) with PySpark and the pregel wrapper from graphframes. Here I am getting stuck with the correct syntax for the message…
Alex Ortner • 1,097 • 8 • 24
3 votes • 1 answer

How to build a graph from a dataframe? (GraphX)

I'm new to Scala and Spark and I need to build a graph from a dataframe. This is the structure of my dataframe, where S and O are nodes and column P represents edges. +---------------------------+---------------------+----------------------------+ |S …
NTH • 101 • 2 • 8
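One common approach (a sketch assuming string columns named S, P and O as in the question) is to hash the node strings to Long ids and build vertex and edge RDDs from the DataFrame:

```scala
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame

// Sketch only, assuming a DataFrame with string columns S (source node),
// P (edge label) and O (destination node).
def graphFromTriples(df: DataFrame): Graph[String, String] = {
  // GraphX needs Long vertex ids, so derive one from each node string.
  // (hashCode collisions are possible; a proper id table is safer at scale.)
  def id(s: String): VertexId = s.hashCode.toLong

  val vertices: RDD[(VertexId, String)] = df
    .select("S").union(df.select("O"))
    .distinct()
    .rdd
    .map(r => (id(r.getString(0)), r.getString(0)))

  val edges: RDD[Edge[String]] = df.rdd.map { r =>
    Edge(id(r.getAs[String]("S")), id(r.getAs[String]("O")), r.getAs[String]("P"))
  }

  Graph(vertices, edges)
}
```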
3 votes • 0 answers

GraphFrames and Label propagation

As I understand from Wikipedia, the label propagation algorithm assigns labels to previously unlabeled nodes in a graph and, at the start of the algorithm, a (generally small) subset of the nodes have labels defined. In the documentation of…
joel314 • 1,060 • 1 • 8 • 22
3 votes • 1 answer

Using GraphX in PySpark

Is there a Python API for GraphX? I have come across the Scala API but I want to know if it's possible to use GraphX functionality in PySpark.
3 votes • 4 answers

Remove Vertices with no outgoing edges in GraphX

I have a big graph (a few million vertices and edges). I want to remove all the vertices (& edges) which have no outgoing edges. I have some code that works but it is slow and I need to do it several times. I am sure I can use some existing GraphX…
Mann • 307 • 2 • 14
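One idiomatic way (a sketch; a single pass removes only the current sinks, so it may need repeating if removals cascade) is to join in outDegrees and filter with subgraph:

```scala
import org.apache.spark.graphx._
import scala.reflect.ClassTag

// Sketch: drop vertices with no outgoing edges by joining in each vertex's
// out-degree and filtering with subgraph.
def dropSinks[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]): Graph[VD, ED] = {
  val withOutDeg: Graph[(VD, Int), ED] =
    graph.outerJoinVertices(graph.outDegrees) { (vid, attr, deg) => (attr, deg.getOrElse(0)) }

  withOutDeg
    .subgraph(vpred = (vid, attr) => attr._2 > 0)   // keep vertices with out-edges
    .mapVertices { case (_, (attr, _)) => attr }    // restore the original attribute type
}
```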
3 votes • 1 answer

Distinct on an array in scala returns an empty string

I am trying to learn GraphX from this code (click here in GitHub). On the spark-shell, when I try this: def parseFlight(str: String): Flight = { val line = str.split(",") Flight(line(0), line(1), line(2), line(3), line(4).toInt, line(5).toLong,…
vikash • 33 • 2
3 votes • 1 answer

PageRank using GraphX

I have a .txt file, say list.txt, which consists of a list of source and destination URLs in the format google.de/2011/10/Extract-host link.de/2011/10/extact-host facebook.de/2014/11/photos …
ashwini • 156 • 12
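For reference, once each URL is mapped to a numeric id, the built-in PageRank can be applied directly; a minimal sketch assuming one "sourceURL destinationURL" pair per line of list.txt:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx._

// Sketch, assuming the file holds one "source destination" pair per line.
def pageRankFromFile(sc: SparkContext, path: String): Unit = {
  val pairs = sc.textFile(path).map(_.split("\\s+")).filter(_.length >= 2)

  // GraphX needs Long ids, so hash each URL string (collisions are possible;
  // a proper id table would be safer for production use).
  val edges = pairs.map(a => Edge(a(0).hashCode.toLong, a(1).hashCode.toLong, 1))
  val graph = Graph.fromEdges(edges, defaultValue = 0)

  // Run PageRank until the change per vertex falls below the tolerance.
  val ranks = graph.pageRank(tol = 0.0001).vertices
  ranks.sortBy(_._2, ascending = false).take(10).foreach(println)
}
```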
3 votes • 1 answer

Hierarchical data manipulation in Apache Spark

I have a Dataset in Spark (v2.1.1) with 3 columns (as shown below) containing hierarchical data. My objective is to assign incremental numbering to each row based on the parent-child hierarchy. Graphically it can be said that the…
3 votes • 2 answers

Weekly Aggregation using Window Function in Spark

I have data from 1st Jan 2017 to 7th Jan 2017, which is one week, and I want a weekly aggregate. I used the window function in the following manner: val df_v_3 = df_v_2.groupBy(window(col("DateTime"), "7 day")) .agg(sum("Value") as…
Utkarsh Saraf • 475 • 8 • 31
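A hedged note on this: by default window() aligns 7-day buckets to the Unix epoch (1970-01-01, a Thursday), so a startTime offset is needed for weeks starting Sunday 1 Jan 2017; a sketch using the question's column names DateTime and Value:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// Sketch: 7-day windows shifted by "3 days" so buckets start on Sundays
// (epoch-aligned windows start on Thursdays), matching a week beginning 2017-01-01.
def weeklyTotals(df: DataFrame): DataFrame =
  df.groupBy(window(col("DateTime"), "7 days", "7 days", "3 days"))
    .agg(sum("Value").as("WeeklyValue"))
    .orderBy(col("window.start"))
```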
3 votes • 0 answers

Scala - Spark: build a Graph (graphX) from vertices and edges dataframe

I have two dataframes with this schema: edges |-- src: string (nullable = true) |-- dst: string (nullable = true) |-- relationship: struct (nullable = false) | |-- business_id: string (nullable = true) | |--…
alukard990 • 811 • 2 • 9 • 14