Questions tagged [spark-graphx]

GraphX is a component in Apache Spark for graphs and graph-parallel computation.

At a high level, GraphX extends the Spark RDD by introducing a new Graph abstraction: a directed multigraph with properties attached to each vertex and edge.

To support graph computation, GraphX exposes a set of fundamental operators (e.g., subgraph, joinVertices, and aggregateMessages) as well as an optimized variant of the Pregel API.

In addition, GraphX includes a growing collection of graph algorithms and builders to simplify graph analytics tasks.
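
As a rough illustration of what operators such as subgraph and aggregateMessages do, here is a minimal sketch in plain Python. It is not the GraphX API (GraphX itself is a Scala library running on Spark). The `PropertyGraph` class, the `Edge` type, and the simplified `aggregate_messages` signature are hypothetical stand-ins that only mimic the semantics on a tiny in-memory graph.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Edge:
    """A directed edge carrying an arbitrary property (hypothetical type)."""
    src: int
    dst: int
    attr: Any


class PropertyGraph:
    """Toy directed multigraph: vertex id -> property, plus a list of edges."""

    def __init__(self, vertices, edges):
        self.vertices = dict(vertices)
        # A list rather than a set, so parallel edges are allowed.
        self.edges = list(edges)

    def subgraph(self, epred=lambda e: True, vpred=lambda vid, attr: True):
        """Keep vertices passing vpred, and edges passing epred whose
        endpoints both survive."""
        kept_v = {vid: a for vid, a in self.vertices.items() if vpred(vid, a)}
        kept_e = [
            e for e in self.edges
            if e.src in kept_v and e.dst in kept_v and epred(e)
        ]
        return PropertyGraph(kept_v, kept_e)

    def aggregate_messages(self, send_msg, merge_msg):
        """For each edge, send_msg returns (target_vertex_id, message) pairs;
        messages arriving at the same vertex are combined with merge_msg."""
        inbox = {}
        for e in self.edges:
            pairs = send_msg(e, self.vertices[e.src], self.vertices[e.dst])
            for target, msg in pairs:
                if target in inbox:
                    inbox[target] = merge_msg(inbox[target], msg)
                else:
                    inbox[target] = msg
        return inbox


# Example data (made up for illustration).
graph = PropertyGraph(
    vertices={1: "alice", 2: "bob", 3: "carol"},
    edges=[
        Edge(1, 2, "follows"),
        Edge(1, 2, "likes"),    # parallel edge: this is a multigraph
        Edge(2, 3, "follows"),
        Edge(3, 1, "follows"),
    ],
)

# In-degree per vertex: every edge sends 1 to its destination, merged by sum.
in_degrees = graph.aggregate_messages(
    send_msg=lambda e, src_attr, dst_attr: [(e.dst, 1)],
    merge_msg=lambda a, b: a + b,
)

# Restrict the graph to "follows" edges only.
follows_only = graph.subgraph(epred=lambda e: e.attr == "follows")
```

In this sketch, vertices that receive no message simply do not appear in the result of `aggregate_messages`, which is also how the real operator behaves. Everything else about distribution and partitioning is deliberately left out.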

487 questions
1 vote, 0 answers

GraphX ShortestPaths from org.apache.spark.graphx.lib.ShortestPaths

In Spark (Scala), I have a weighted graph of the following form: - Edges: Edge(65281061,65281095,(8915415,229.81473441303393)) - Vertices: (65352257,(0.0,0.0,Map(254396716))) The edges contain the distance (in this case, 229.81473441303393). I…
1 vote, 0 answers

spark aggregateMessages tree data sum value of all node

I have tree data like this: (A) --> (B) --> (D) \ \--> (C) Each node has a value. I want to aggregate total_value. Assume V(i) is the value of node i and T(i) is the total_value of node i. V(A) = 1 V(B) = 1 V(C) = 1 V(D) = 1 and my desired result…
1 vote, 3 answers

Combine associated items in Spark

In Spark, I have a large list (millions) of elements that contain items associated with each other. Examples: 1: ("A", "C", "D") # Each of the items in this array is associated with any other element in the array, so A and C are associated, A, and D…
leontp587
1 vote, 0 answers

What are the use cases for using Graphframes' connectedComponents various algorithms?

As a background: I am a python coder using Graphframes and pyspark through Databricks. I've been using Graphframes to deduplicate records in the context of record-linkage. Below is some pseudo-code depicting the coding scenario I've come…
1 vote, 1 answer

How does Scala represent immutable maps internally from storage standpoint?

I have an application in scala on Spark-graphx. The VD contains a Map[Long, Map[Long, Double]] which needs to grow with each iteration. Both are created from List.toMap, so AFAIK both inner and outer should be immutable. What I have run into on very…
Jennifer
1 vote, 0 answers

How to find a diamond in a graph with Spark GraphX

I'm using GraphFrame in Spark GraphX. I tried to find a diamond in my graph. My graph is as follows: nodeA->nodeB->nodeD->nodeF nodeA->nodeE->nodeD->nodeG so we can see there is a diamond (quadrilateral) in the graph as…
Jack
1 vote, 0 answers

How to verify the graph.PartitionStrategy has worked or not?

I have used the GraphX API in Java and created a graph from the EdgeRDD and the VertexRDD. Initially the RDDs were created from the dataset. If I run the code below, I see no error. However, I cannot verify that the code is running and it is…
1 vote, 1 answer

How can I create a graph from Text File containing the vertex and edges?

I have created RDDs from two input files, i.e. the Edges and Node files. When I use the Graph.fromEdge() method to create a graph, I get errors. Could someone please help me? The inputEdgesTextFile and inputNodesTextFile are taking the input text…
1 vote, 0 answers

How to create a graph in Apache spark java after loading dataset?

I am new to Apache Spark GraphX and I am trying to create a graph using Java. I have a road network EDGE dataset which consists of Edge_id (INT), Source_ID(INT), Destination_ID (INT), and Edge_Length(Double). I created a class name called…
1 vote, 1 answer

How to get a list of the connected components of a graph using GraphX's Java APIs

I'm fairly new to spark and GraphX, and I'm trying to understand how to perform the following operation using GraphX's Java APIs. I'm looking to produce a method with the following signature: private >…
Danimosity
1 vote, 0 answers

Pregel API - why iterations on small graph are consuming so much memory?

I'm relatively new to Spark and Scala; however, I've decided to post here an example of code that is quite simple and, in my perception, shouldn't cause a serious problem. In practice, however, it causes Out of Memory errors quite often in AWS EMR Spark…
1 vote, 1 answer

How can I submit a Spark Graphx job example on Google Cloud Platform?

I created a cluster on Google Cloud Platform with five Linux-based virtual machines (VMs): one master and four workers. I ran ./start-master.sh on the master VM and ./start-worker.sh [external-master-IP:7077] on the worker VMs. Now I want to simply…
1 vote, 1 answer

How to Get Connected Component with Graphframes in Pyspark and Raw Data in Spark Dataframe?

I have a Spark data frame which looks like the one below:
+--+-----+---------+
|id|phone|  address|
+--+-----+---------+
| 0|  123| james st|
| 1|  177|avenue st|
| 2|  123|spring st|
| 3|  999|avenue st|
| 4|  678|  5th ave|
+--+-----+---------+
I am…
MAMS
1 vote, 1 answer

Grouping people by hobbies

I have been trying to solve this problem but can't really connect it with any solution. I have following data set: [ {"name": "sam", "hobbies": ["Books", "Music", "Gym"]}, {"name": "Steve", "hobbies": ["Books", "Swimming"]}, {"name": "Alex",…
webdev
1 vote, 1 answer

Storing Multiple Columns data in Edge and Vertices in Spark

I am new to Spark Graphx and have dataframe for edges as: Dataframe : edges_main +------------------+------------------+------------+--------+-----------+ | src| …
Arshanvit