Questions tagged [graphframes]

DataFrame-based graph library for Apache Spark

GraphFrames is a DataFrame-based alternative to core GraphX with cross-language support.

186 questions
2
votes
0 answers

How to edit Columns in GraphFrame Aggregate Messages?

I am pretty new to GraphFrames and Scala. I am writing some sort of label propagation algorithm (very different from the library one). Essentially each vertex has an array "memVector" and the edge has a float value "floatWeights". I want to update…
Sai Durgam
  • 21
  • 2
2
votes
0 answers

GraphFrames SLF4J not available

I am running Scala 2.10.4 with Spark 1.5.0-cdh5.5.2 and am getting the following error when running a GraphFrames job: scala > val g = GraphFrame(v, e) error: bad symbolic reference. A signature in Logging.class refers to type LazyLogging in package…
dvreed77
  • 2,217
  • 2
  • 27
  • 42
2
votes
1 answer

Shortcuts for creating complicated Column structures in Spark

I am porting some Graph.pregel algorithms to GraphFrame.aggregateMessages. I'm finding the GraphFrame APIs a little cumbersome. In the Graph APIs, I can send a case class as my message type. But in the GraphFrame APIs, aggregateMessages.sendToSrc…
David Griffin
  • 13,677
  • 5
  • 47
  • 65
2
votes
4 answers

Importing PySpark packages

I have downloaded the graphframes package (from here) and saved it on my local disk. Now, I would like to use it. So, I use the following command: IPYTHON_OPTS="notebook --no-browser" pyspark --num-executors=4 --name gorelikboris_notebook_1 …
Boris Gorelik
  • 29,945
  • 39
  • 128
  • 170
1
vote
1 answer

GraphFrames for pyspark in Azure Synapse

I'm trying to run the basic graphframes Python sample on Azure Synapse. It works fine when I upload the correct .jar file from here and write the code in Scala. But the same .jar file doesn't get picked up when running the Python version of the…
joniba
  • 3,339
  • 4
  • 35
  • 49
1
vote
0 answers

GraphFrames and connected components

I have a graph that consists of vertices and edges, and I am using the graphframes library to find the connected components of that graph. import GraphFrames as gf connected_components = gf.GraphFrame(vertices,…
1
vote
0 answers

I had the error Py4JJavaError: An error occurred while calling o65.showString in PySpark

I am trying to implement this code using Python 3.9, spark-3.3.1-bin-hadoop3 (with the included PySpark), and Java 1.8.0_171. The paths are all right and I am running other code in Jupyter, but I didn't find any answer related to the error Py4JJavaError: An error…
Ahmad Omar
  • 11
  • 3
1
vote
0 answers

PySpark GraphFrame and networkx for graphs with hierarchy

I need to create a graph like this, which has two relationships: continent-country and country-city. I have 3 columns (city, country, continent), but I am not sure how to get them into this graph. Below is an example of another graph with only two columns,…
Kaykay38
  • 21
  • 1
  • 5
1
vote
0 answers

Graphframes connectedComponents is not working if I run my spark jobs via databricks connect

GraphFrames connectedComponents throws exceptions when I try to run my Spark job from Databricks Connect. Here is the configuration I am using for the Spark session: spark = ( SparkSession .builder .config( "spark.jars.packages", …
shahidammer
  • 1,026
  • 2
  • 10
  • 24
1
vote
0 answers

What are the use cases for the various algorithms of GraphFrames' connectedComponents?

As background: I am a Python coder using GraphFrames and PySpark through Databricks. I've been using GraphFrames to deduplicate records in the context of record linkage. Below is some pseudo-code depicting the coding scenario I've come…
1
vote
1 answer

How to get list of graph nodes after using connectedComponents of pyspark

I am learning PySpark in Python. If I use the line of code below to get components from my graph, one column is added to my GraphDataFrame with the component (a random number). But I am curious whether it is possible to get a list of nodes that are…
ffl
  • 91
  • 1
  • 4
1
vote
1 answer

Using a module's method in PySpark map

I have heard that it is possible to call a method of another Python module to perform calculations that are not implemented in Spark, although of course it is inefficient to do so. I need a method to compute eigenvector centrality of a graph (as…
amin zak
  • 13
  • 3
1
vote
0 answers

How to find a diamond in a graph with Spark GraphX

I'm using GraphFrame in Spark GraphX. I tried to find a diamond in my graph. My graph is as follows: nodeA->nodeB->nodeD->nodeF nodeA->nodeE->nodeD->nodeG so we can see there is a diamond (quadrilateral) in the graph as…
Jack
  • 5,540
  • 13
  • 65
  • 113
1
vote
1 answer

How to load GraphFrame/Pyspark DataFrame into Pytorch Geometric (InMemory)Dataset?

Has anybody ever written a custom pytorch.data.InMemoryDataset for a Spark GraphFrame (or rather PySpark DataFrames)? I looked for people who have done it already but didn't find anything on GitHub/Stack Overflow et cetera, and I have little knowledge of…
Ezekiel
  • 91
  • 11
1
vote
1 answer

Py4JJavaError: An error occurred while calling o65.createGraph

I wanted to install graphframes for spark following the instructions on the spark website, but the command: pyspark --packages graphframes:graphframes:0.8.1-spark3.0-s_2.12 did not work for me. I tried many ways to install, but decided to stay at…
Chonk
  • 13
  • 3