Questions tagged [graphframes]

DataFrame-based graph library for Apache Spark

GraphFrames is a DataFrame-based alternative to core GraphX with cross-language support.

186 questions
0
votes
0 answers

GraphFrames Pregel doesn't converge

I have a relatively shallow, directed, acyclic graph represented in GraphFrames (a large number of nodes, mainly in disjoint subgraphs). I want to propagate the id of the root nodes (nodes without incoming edges) to all nodes downstream. To achieve…
SDani
  • 79
  • 5
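
The propagation described in this question can be sketched without Spark to make the intended fixed point explicit. This is a minimal plain-Python illustration, not the GraphFrames Pregel API and not the asker's code: the function name `propagate_root_ids` and the stopping rule (halt once a superstep changes no label) are assumptions made for the example.

```python
from collections import defaultdict


def propagate_root_ids(vertices, edges):
    """Label every vertex with the set of root ids that can reach it.

    vertices: iterable of hashable vertex ids
    edges:    iterable of (src, dst) pairs describing a directed graph
    """
    edges = list(edges)

    # A root is a vertex with no incoming edge.
    has_incoming = {dst for _, dst in edges}

    # Roots start with their own id; every other vertex starts empty.
    labels = {v: ({v} if v not in has_incoming else set()) for v in vertices}

    changed = True
    while changed:
        changed = False
        # One superstep: every vertex sends its current label set along its
        # outgoing edges; messages arriving at the same vertex are merged.
        messages = defaultdict(set)
        for src, dst in edges:
            messages[dst] |= labels[src]
        for v, incoming in messages.items():
            if not incoming <= labels[v]:
                labels[v] |= incoming
                changed = True
    return labels


# Hypothetical toy graph: two roots ("a" and "d") that share a descendant "c",
# plus an isolated vertex "e", which counts as its own root.
labels = propagate_root_ids(
    ["a", "b", "c", "d", "e"],
    [("a", "b"), ("b", "c"), ("d", "c")],
)
```

Because label sets only ever grow and are bounded by the number of roots, the loop stops after at most as many supersteps as the longest path plus one; a message-passing job that never converges is often one whose messages keep changing state that is already final.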
0
votes
0 answers

Graphframes - Distance between vertices in the same connected component

Problem: I have a GraphFrames graph, from which I've obtained the connected components. Now, I would like to find the distance from a source node to a target node, both pertaining to the same…
itscarlayall
  • 128
  • 1
  • 14
0
votes
0 answers

Missing id from vertex DataFrame in PySpark when creating GraphFrame

I have written this code using Python; when I run it, the following errors show up. spark = SparkSession\ .builder\ .appName("GraphX")\ .getOrCreate() e = spark.read.parquet("hdfs://localhost:9000/gf/edge") v =…
Wria Mohammed
  • 1,433
  • 18
  • 23
0
votes
0 answers

Can't show Graphframe in pyspark

I've installed PySpark on my computer and run it from the Anaconda prompt. When I launch pyspark in the prompt, I get an error when using the Show() function. Here are my steps after launching pyspark: import…
fluflu
  • 5
  • 3
0
votes
0 answers

java.lang.ClassNotFoundException: org.graphframes.GraphFramePythonAPI Error in Databricks

I am getting this error on the Community Edition of Databricks when trying to make a graph with the GraphFrame() function. java.lang.ClassNotFoundException: org.graphframes.GraphFramePythonAPI I have tried a few…
0
votes
0 answers

Graphframes static connected component

We are using the graphframes library in Spark to find connected components. This solution works fine for us; the only problem is that we can't do bookkeeping on these connected component ids because they can change in the future. Is there any way to…
Innovation
  • 1,514
  • 17
  • 32
0
votes
0 answers

Spark stuck on stage when using community detection lpa AND g.edges.count() from graphframes library

I have a graph with 18 million vertices, but when I want to count the edges and I use g.edges.count, I get stuck in a stage. Also, when I use the LPA (community detection) algorithm, I again get stuck in a stage, especially when my edges are large. Any…
0
votes
1 answer

How to use GraphFrames on EMR serverless

Summary of steps executed: Uploaded the python script to S3. Created a virtualenv that installs graphframes and uploaded it to S3. Added a VPC to my EMR application. Added graphframes package to spark conf. The error message was: 22/09/11…
0
votes
1 answer

Error when running graphframes in google colab

I am using Google Colab and I cannot seem to use graphframes. This is what I do: !pip install pyspark which gives: Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/ Collecting pyspark Downloading…
0
votes
0 answers

How to implement custom graph clustering algorithm on Spark using GraphFrame?

I have a very large, weighted graph on Azure Cosmos DB. The numbers of vertices and edges are in the billions, and the size of the DB is several TB. I am trying to cluster the graph on Spark using a custom clustering algorithm. I understood this can be done using…
0
votes
1 answer

Define edge rules in pyspark graphframes

I am using graphframes to represent a graph in pyspark from a similar dataframe: data = [ ("1990", "1995"), ("1980", "1996"), ("1993", "1994"), ("1990", "2002"), ("1996", "2002"), ("1999", "2008"), ("2003",…
shogitai
  • 1,823
  • 1
  • 23
  • 50
0
votes
1 answer

Unique ID (UID) generation using pyspark across different data sources

We are working on a use case to generate a unique ID (UID) for the Customers spanning across different systems/data sources. The unique ID will be generated using PII information such as email & phone no. Problem Statement: For example a Customer…
nilesh1212
  • 1,561
  • 2
  • 26
  • 60
0
votes
1 answer

Unable to run analytics using GraphFrames and PySpark on Jupyter Notebook

I've been trying to install GraphFrames on my environment. I am using Jupyter Notebook and I've successfully installed Spark. In order to install GraphFrames, I did !pip install graphframes directly from my notebook, which ran successfully. Then, I…
cdaveau
  • 129
  • 1
  • 7
0
votes
1 answer

group the related values in one group

Trying to group the column values based on related records: partColumns = (["partnumber","colVal1","colVal2", "colVal3","colVal4","colVal5"]) partrelations = ([("part0","part1","", "","",""), ("part1","","part2", "","part4",""), …
NNM
  • 358
  • 1
  • 10
0
votes
1 answer

Graphframes and BFS

I'm having some trouble understanding BFS in GraphFrames. I'm trying to get the "father of all": the one that has no parent in the graph. See, I have this DataFrame: val df = sqlContext.createDataFrame(List( ("153030152492012801800", ""), …