Questions tagged [graphframes]

DataFrame-based graph library for Apache Spark

GraphFrames is a DataFrame-based alternative to core GraphX with cross-language support.

186 questions
5
votes
0 answers

Apache Spark computing shortest path

I am trying to compute the shortest path in a large network from a given source to a given target based on weights using Apache Spark. Since all my other code is written in Python, I don't wanna change. It should be somehow possible, shouldn't it?…
JustSomeone
  • 171
  • 2
  • 12
5
votes
1 answer

Variable length motif GraphFrames

I am trying to find all paths from node A to node B with pathLength < 10 using GraphFrames. I can do it using the following code, but I was wondering if there is a better way to do this. val graph = GraphFrame(vertices, edges) val motif1 =…
user100001
  • 245
  • 3
  • 8
5
votes
1 answer

PySpark, GraphFrames, exception Caused by: java.lang.ClassNotFoundException: com.typesafe.scalalogging.slf4j.LazyLogging

I am trying to run the following code which leverages graphframes, and I am getting an error now which, to the best of my knowledge and after some hours of Googling, I cannot resolve. It seems like a class cannot be loaded, but I don't really know…
Christos Hadjinikolis
  • 2,099
  • 3
  • 20
  • 46
5
votes
1 answer

How to process the different graph files to be processed independently in between the cluster nodes in Apache Spark?

Let's say I have a large number of graph files and each graph has around 500K edges. I have been processing these graph files on Apache Spark and I was wondering how to parallelize the entire graph processing job efficiently. Since for now, every…
hsuk
  • 6,770
  • 13
  • 50
  • 80
4
votes
2 answers

How to do this transformation in SQL/Spark/GraphFrames

I've a table containing the following two columns: Device-Id Account-Id d1 a1 d2 a1 d1 a2 d2 a3 d3 a4 d3 a5 d4 a6 d1 a4 Device-Id is the unique Id of the device…
Aman Gill
  • 87
  • 7
4
votes
2 answers

GraphFrames with pySpark

I want to use GraphFrames with PySpark (currently using Spark v2.3.3, on Google Dataproc). After installing GraphFrames with pip install graphframes I try to run the following code: from graphframes import * localVertices = [(1,"A"), (2,"B"), (3,…
Alex
  • 1,447
  • 7
  • 23
  • 48
4
votes
2 answers

Spark Graphframes large dataset and memory Issues

I want to run PageRank on a relatively large graph (3.5 billion nodes, 90 billion edges), and I have been experimenting with different cluster sizes to get it to run. But first the code: from pyspark.sql import SparkSession import graphframes spark =…
Thagor
  • 820
  • 2
  • 10
  • 33
4
votes
0 answers

spark graph frames aggregate messages multiple iterations

The Spark GraphFrames documentation has a nice example of how to apply the aggregate messages function. To me, it seems to only calculate the friends/connections of the single and first vertices and not iterate deeper into the graph as GraphX's Pregel…
Georg Heiler
  • 16,916
  • 36
  • 162
  • 292
4
votes
2 answers

Sum the Distance in Apache-Spark dataframes

The following code gives a DataFrame having three values in each column as shown below. import org.graphframes._ import org.apache.spark.sql.DataFrame val v = sqlContext.createDataFrame(List( ("1", "Al"), ("2", "B"), ("3",…
Yasir Arfat
  • 645
  • 1
  • 8
  • 21
4
votes
0 answers

Selection of Edges in GraphFrames

I am applying BFS using GraphFrames in Scala. How can I sum the edge weights of the selected shortest path? I have the following code: import org.graphframes._ import org.apache.spark.sql.DataFrame val v =…
Yasir Arfat
  • 645
  • 1
  • 8
  • 21
3
votes
2 answers

How to create edge list from spark data frame in Pyspark?

I am using graphframes in pyspark for some graph type of analytics and wondering what would be the best way to create the edge list data frame from a vertices data frame. For example, below is my vertices data frame. I have a list of ids and they…
MAMS
  • 419
  • 1
  • 6
  • 17
3
votes
1 answer

GraphX or GraphFrame - community detection in undirected weighted graph

I'm trying to identify strongly connected communities within a large group (undirected weighted graph). Alternatively, identifying vertices causing connection of sub-groups (communities) that would be otherwise unrelated. The problem is part of…
Palo
  • 31
  • 2
3
votes
3 answers

ImportError: No module named 'graphframes' databricks

I am trying to import graphframes into my Databricks notebook with from graphframes import * but it failed with the following error message: ImportError: No module named 'graphframes'. How can I add/import it into a Databricks notebook? Any help…
kumar
  • 43
  • 4
3
votes
1 answer

How to implement cycle detection with pyspark graphframe pregel API

I am trying to implement the algorithm from Rocha & Thatte (http://cdsid.org.br/sbpo2015/wp-content/uploads/2015/08/142825.pdf) with PySpark and the Pregel wrapper from graphframes. Here I am getting stuck with the correct syntax for the message…
Alex Ortner
  • 1,097
  • 8
  • 24
3
votes
3 answers

Python Graphframes: trouble installing dependencies

I'm trying to run a simple Graphframes example. I have both Python 3.6.8 and Python 2.7.15, as well as Apache Maven 3.6.0, Java 1.8.0, Apache Spark 2.4.4 and Scala code runner version 2.11.12. I got this error: An error occurred while calling…
Jessica Chambers
  • 1,246
  • 5
  • 28
  • 56