Questions tagged [spark-graphx]

GraphX is a component in Apache Spark for graphs and graph-parallel computation.

At a high level, GraphX extends the Spark RDD by introducing a new Graph abstraction: a directed multigraph with properties attached to each vertex and edge.

To support graph computation, GraphX exposes a set of fundamental operators (e.g., subgraph, joinVertices, and aggregateMessages) as well as an optimized variant of the Pregel API.

In addition, GraphX includes a growing collection of graph algorithms and builders to simplify graph analytics tasks.

487 questions
2 votes, 0 answers

Distributed DBSCAN on Spark

I'm trying to implement the DBSCAN algorithm on Spark, following the paper A Parallel DBSCAN Algorithm Based on Spark. It proposes an algorithm with four main steps: data partitioning, computing a local DBSCAN, merging the data partitions, global…
fingerprints
2 votes, 1 answer

Breadth-First Search algorithm using Apache Spark GraphX

I'm trying to implement the BFS (Breadth-First Search) algorithm using Apache Spark GraphX. This is my current implementation: object BFSAlgorithm { def run(graph: Graph[VertexId, Int], sourceVertex: VertexId): Graph[Int, Int] = { val bfsGraph:…
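The excerpt stops mid-implementation; for reference, a minimal BFS over GraphX's Pregel operator might look like the sketch below. The `BFSSketch` name and the `Int.MaxValue`-as-unreached convention are assumptions, not the asker's code.

```scala
import org.apache.spark.graphx.{Graph, VertexId}

object BFSSketch {
  // Hypothetical helper: computes BFS hop counts from sourceVertex.
  def run(graph: Graph[VertexId, Int], sourceVertex: VertexId): Graph[Int, Int] = {
    // Initialize: source at distance 0, everything else "unreached".
    val initial = graph.mapVertices((id, _) =>
      if (id == sourceVertex) 0 else Int.MaxValue)

    initial.pregel(Int.MaxValue)(
      // Vertex program: keep the smaller of current distance and message.
      (_, dist, msg) => math.min(dist, msg),
      // Send: offer dist + 1 to the destination if it improves it.
      triplet =>
        if (triplet.srcAttr != Int.MaxValue && triplet.srcAttr + 1 < triplet.dstAttr)
          Iterator((triplet.dstId, triplet.srcAttr + 1))
        else Iterator.empty,
      // Merge concurrent messages: take the minimum distance.
      math.min
    )
  }
}
```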
2 votes, 0 answers

GraphFrame Spark: Get subgraph from a specific node

I'm building a simple graph with GraphFrames on Scala 2.11 / Spark 2.2. I can create my graph without problems, but I have no idea how to create a subgraph from user input. I want to extract a graph from the big one, e.g. get the subgraph from node#123…
Gohmz
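The question is about GraphFrames, but in plain GraphX terms a subgraph around one node can be sketched with `subgraph` plus a degree-based cleanup. The `oneHopSubgraph` helper below is hypothetical, not an API method.

```scala
import org.apache.spark.graphx.{Graph, VertexId}

object SubgraphSketch {
  // Sketch: keep only edges touching `node`, then drop isolated vertices.
  def oneHopSubgraph[VD, ED](graph: Graph[VD, ED], node: VertexId): Graph[VD, ED] = {
    // Keep edges where the node is source or destination.
    val withEdges = graph.subgraph(epred = t => t.srcId == node || t.dstId == node)
    // Attach each vertex's remaining degree, then filter out degree-0 vertices.
    withEdges
      .outerJoinVertices(withEdges.degrees)((_, vd, deg) => (vd, deg.getOrElse(0)))
      .subgraph(vpred = (_, attr) => attr._2 > 0)
      .mapVertices((_, attr) => attr._1)
  }
}
```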
2 votes, 1 answer

Number of connected components in a directed graph in GraphX

I know GraphX's connectedComponents() method will label each connected component of the graph with the ID of its lowest-numbered vertex. Is there a method call to count the number of connected components in GraphX?
ELI
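There is no dedicated counter in GraphX, but a common approach is to count the distinct component labels that `connectedComponents()` produces; a short sketch:

```scala
import org.apache.spark.graphx.Graph

object ComponentCount {
  // Sketch: connectedComponents() labels every vertex with its component's
  // lowest vertex id; counting distinct labels gives the component count.
  def numComponents[VD, ED](graph: Graph[VD, ED]): Long =
    graph.connectedComponents().vertices.map(_._2).distinct().count()
}
```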
2 votes, 1 answer

How to generate a GUID id column in Spark that is of integer type

I know I can do UUID.randomUUID.toString to attach an id to each row in my Dataset but I need this id to be a Long since I want to use GraphX. How do I do that in Spark? I know Spark has monotonically_increasing_id() but that is only for the…
pathikrit
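`zipWithIndex` on the underlying RDD is one way to get dense `Long` ids suitable for GraphX vertex ids. A sketch, assuming a `Dataset[String]` of keys; the `withLongIds` helper is illustrative, not a Spark API:

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object LongIdSketch {
  // Sketch: zipWithIndex assigns a stable, dense Long id, unlike
  // monotonically_increasing_id(), which is only monotonic, not consecutive.
  def withLongIds(spark: SparkSession, keys: Dataset[String]): Dataset[(String, Long)] = {
    import spark.implicits._
    keys.rdd.zipWithIndex().toDF("key", "id").as[(String, Long)]
  }
}
```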
2 votes, 0 answers

Spark GraphX in Java

I am preparing a graph with the Java API and am stuck on the conversion below to Edge, but the encoder is giving the Edge class with a raw parameter. Dataset edges = spark.read() .option("header", "true") .option("inferSchema",…
user8708025
2 votes, 0 answers

Tools to visualize a Spark GraphX graph with 800 million vertices and 2 billion edges

We have a large graph processed with Spark GraphX, with about 800 million vertices and 2 billion edges. Are there any tools which allow visualization of the data? The data is currently stored in S3 and is about 600 GB. I checked into…
pjesudhas
2 votes, 1 answer

How to compute the sum of the degrees of the two vertices of each edge in GraphX

I have a graph like this: val vertexArray = Array( (1L, ("Alice", 28)), (2L, ("Bob", 27)), (3L, ("Charlie", 65)), (4L, ("David", 42)), (5L, ("Ed", 55))) val edges = sc.parallelize(Array( …
Bin Teng
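One way to answer this is to join each vertex's degree into the graph and read the sum off every edge triplet. The `degreeSumPerEdge` helper below is an assumption for illustration:

```scala
import org.apache.spark.graphx.{Graph, VertexId}
import org.apache.spark.rdd.RDD

object DegreeSumSketch {
  // Sketch: replace each vertex attribute with its degree, then emit
  // (edge, srcDegree + dstDegree) for every triplet.
  def degreeSumPerEdge[VD, ED](graph: Graph[VD, ED]): RDD[((VertexId, VertexId), Int)] = {
    val withDegrees =
      graph.outerJoinVertices(graph.degrees)((_, _, deg) => deg.getOrElse(0))
    withDegrees.triplets.map(t => ((t.srcId, t.dstId), t.srcAttr + t.dstAttr))
  }
}
```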
2 votes, 0 answers

Lookup function in Spark using Scala

This is a newbie question. In the code below val var_A=graphA.edges .filter{case(currEdge)=>currEdge.srcId==currEdge.dstId} .map{case(currEdge)=>(currEdge.srcId,currEdge.attr)} var_A has a type…
ayush gupta
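For retrieving the values behind a key in a pair RDD like `var_A`, `RDD.lookup` is the usual tool; a minimal sketch, assuming `Int` edge attributes:

```scala
import org.apache.spark.graphx.VertexId
import org.apache.spark.rdd.RDD

object LookupSketch {
  // Sketch: lookup returns all values stored under `id` in a pair RDD.
  // collectAsMap is an alternative when the result fits in driver memory.
  def attrsFor(varA: RDD[(VertexId, Int)], id: VertexId): Seq[Int] =
    varA.lookup(id)
}
```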
2 votes, 1 answer

Partitioning Strategy For Complete Graph In Spark GraphX

I have created a graph using Spark GraphX in which every vertex is directly connected to every other vertex of the graph, i.e. a complete graph. Can anyone suggest a good partitioning strategy for this type of situation, or any ideas to implement…
mayur
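For a dense or complete graph, GraphX's built-in `EdgePartition2D` strategy is a common suggestion, since it bounds the replication of each vertex to roughly `2 * sqrt(numParts)` copies; a sketch:

```scala
import org.apache.spark.graphx.{Graph, PartitionStrategy}

object PartitionSketch {
  // Sketch: repartition edges with 2-D sparse-matrix partitioning, which
  // limits how many partitions each vertex must be replicated to.
  def partitioned[VD, ED](graph: Graph[VD, ED]): Graph[VD, ED] =
    graph.partitionBy(PartitionStrategy.EdgePartition2D)
}
```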
2 votes, 0 answers

Map node IDs to edges in GraphX

I have the following code which gives me nodes for GraphX scala> val idNode = cleanwords.flatMap(x=>x).distinct.zipWithIndex.map{case (k, v) => (k, v.toLong)} nodesId: org.apache.spark.rdd.RDD[(String, Long)] = MapPartitionsRDD[185] at map at…
analyticalpicasso
2 votes, 1 answer

Map each element of a list in Spark

I'm working with an RDD whose pairs are structured this way: [Int, List[Int]]. My goal is to map the items of each pair's list with the key. So for example I'd need to do this: RDD1:[Int, List[Int]] <1><[2, 3]> <2><[3, 5, 8]> RDD2:[Int,…
Matt
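The transformation described, pairing a key with each element of its list, is exactly what `flatMapValues` does; a sketch:

```scala
import org.apache.spark.rdd.RDD

object ExplodeSketch {
  // Sketch: flatMapValues pairs the key with each element of its list,
  // turning (1, List(2, 3)) into (1, 2) and (1, 3).
  def explode(rdd: RDD[(Int, List[Int])]): RDD[(Int, Int)] =
    rdd.flatMapValues(identity)
}
```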
2 votes, 1 answer

How to create a directed graph with Spark GraphX or GraphFrames

I'm trying to run the connected components algorithm on my dataset, but on a directed graph. I don't want the connected components to traverse both directions of the edges. This is my sample code: import org.apache.log4j.{Level,…
Philip K. Adetiloye
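`connectedComponents` ignores edge direction; when direction matters, GraphX's `stronglyConnectedComponents` is the closest built-in, labeling each vertex with the lowest vertex id in its strongly connected component. A sketch:

```scala
import org.apache.spark.graphx.{Graph, VertexId}

object DirectedComponentsSketch {
  // Sketch: strongly connected components respect edge direction;
  // numIter caps the iterative computation.
  def directedComponents[VD, ED](graph: Graph[VD, ED], numIter: Int = 10): Graph[VertexId, ED] =
    graph.stronglyConnectedComponents(numIter)
}
```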
2 votes, 0 answers

Tree reduction aggregation in Spark GraphX?

I have the tree structure below, where 1, 2, 3, 4, 5, 6 are IDs and the values are in brackets. ----------- 1(20) | -------…
Anirban
2 votes, 0 answers

Spark GraphX shortest path implementation in Java

I am trying to implement the shortest-path functionality of GraphX using Java. This is possible to do in Scala, but for Java I have not been able to find detailed documentation or concrete examples on how to do this. It has also been mentioned in…
Shahji1472