Questions tagged [graphframes]

DataFrame-based graph library for Apache Spark

GraphFrames is a DataFrame-based alternative to core GraphX with cross-language support.

186 questions
0
votes
1 answer

How to run PySpark with installed packages?

Normally, when I run pyspark with graphframes, I have to use this command: pyspark --packages graphframes:graphframes:0.8.1-spark3.0-s_2.12 The first time I run this, it installs the graphframes package, but not the next time. In the .bashrc…
huy
  • 1,648
  • 3
  • 14
  • 40
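
For the question above, one commonly used approach is to set the PYSPARK_SUBMIT_ARGS environment variable so the same --packages flag is applied on every launch. The sketch below is a minimal example, not a confirmed answer to this question: it reuses the package coordinate from the excerpt and assumes the variable is set before PySpark starts its JVM.

```python
import os

# Package coordinate copied from the question excerpt.
PACKAGE = "graphframes:graphframes:0.8.1-spark3.0-s_2.12"

# PySpark reads PYSPARK_SUBMIT_ARGS when it launches the JVM, so this must
# run before any SparkContext or SparkSession is created in the process.
# The trailing "pyspark-shell" token is required by PySpark's launcher.
os.environ["PYSPARK_SUBMIT_ARGS"] = f"--packages {PACKAGE} pyspark-shell"

print(os.environ["PYSPARK_SUBMIT_ARGS"])
```

Exporting the same value from .bashrc should have the same effect for shell sessions. The package is only downloaded once because Spark caches resolved packages locally, but the flag is still needed on each run so that the jar is put on the classpath.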
0
votes
1 answer

ModuleNotFoundError: No module named 'graphframes'

I want to run graphframes with pyspark. I found this answer and followed its instructions, but it doesn't work. This is my code hello_spark.py: import pyspark conf = pyspark.SparkConf().set("spark.driver.host", "127.0.0.1") sc =…
huy
  • 1,648
  • 3
  • 14
  • 40
0
votes
1 answer

Reduce and Lambda on pyspark dataframe

Below is an example from https://graphframes.github.io/graphframes/docs/_site/user-guide.html The only thing that confuses me is the purpose of "lit(0)" in the condition function. Is this "lit(0)" meant to feed into "cnt"? If yes, why is it after…
gllow
  • 63
  • 2
  • 8
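
As a plain-Python analogue of the reduce question above: the third argument to functools.reduce is the initial accumulator value, so it is what the first call receives as "cnt". The sketch below assumes the question refers to a reduce call of that shape. The relationship list is made up for illustration, and a plain 0 stands in for lit(0).

```python
from functools import reduce

# Hypothetical edge relationships, standing in for the motif's edge columns.
relationships = ["friend", "follow", "friend"]

# Accumulator function: cnt is the running count, rel is the next element.
count_friend = lambda cnt, rel: cnt + 1 if rel == "friend" else cnt

# The trailing 0 is the initial value of cnt (the role lit(0) plays when
# the accumulator is a Spark Column instead of a Python int).
total = reduce(count_friend, relationships, 0)

print(total)
```

The initial value comes last simply because that is the signature of functools.reduce: function, iterable, initializer.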
0
votes
1 answer

graphframes for pySpark v3.0.1

I'm trying to use the graphframes library with PySpark v3.0.1. (I'm using VS Code on Debian, but trying to import the package from the pyspark shell didn't work either.) According to the documentation, using $ pyspark --packages…
VectorXY
  • 349
  • 1
  • 3
  • 12
0
votes
1 answer

Update vertices values in GraphFrame

I wonder whether there is any way to update vertex (or edge) values after constructing a graph with GraphFrame. I have a graph and its vertices have these ['id', 'name', 'age'] columns. I've written code that creates vertices with new ages and it works…
mirzanahal
  • 167
  • 2
  • 12
0
votes
1 answer

PySpark: remove rows which derive from others

I have the following dataframe, which contains all the paths within a tree after going through all nodes. For each jump between nodes, a row is created where "dist" is the number of nodes so far, "node" is the current node and "path" is the path…
gijon
  • 1
0
votes
1 answer

How to build a parent-child relationship in PySpark or Python?

I have numbers as key-value pairs: (1,2),(3,4),(5,6),(7,8),(9,10),(2,11),(4,12),(6,13),(8,14),(14,19). My input is (1,2),(3,4),(5,6),(7,8),(9,10),(2,11),(4,12),(6,13),(8,14). Here I need to create the relations 1 --> 2 and 2 --> 11. My final output…
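
The excerpt above is cut off before the desired output, so the following is only one possible reading: treat each pair as (parent, child) and follow the links to form chains such as 1 --> 2 --> 11. The function name build_chains is hypothetical, and the sketch assumes each parent has at most one child.

```python
def build_chains(pairs):
    """Follow parent -> child links from each root and return the chains."""
    child_of = dict(pairs)                # parent -> child (one child assumed)
    children = set(child_of.values())
    # A root is a parent that never appears as someone else's child.
    roots = [parent for parent, _ in pairs if parent not in children]

    chains = []
    for root in roots:
        chain = [root]
        seen = {root}                     # guards against cycles in the input
        while chain[-1] in child_of and child_of[chain[-1]] not in seen:
            nxt = child_of[chain[-1]]
            chain.append(nxt)
            seen.add(nxt)
        chains.append(chain)
    return chains


# Input pairs copied from the question excerpt.
pairs = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10),
         (2, 11), (4, 12), (6, 13), (8, 14)]
chains = build_chains(pairs)

for chain in chains:
    print(" --> ".join(str(node) for node in chain))
```

For data too large for a single machine, the same idea would need a distributed formulation; this sketch only covers the plain-Python case.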
0
votes
1 answer

Getting shortestPaths in GraphFrames with Java

I am new to Spark and GraphFrames. When I wanted to learn about the shortestPaths method in GraphFrame, the GraphFrames documentation gave me sample code in Scala, but not in Java. In their documentation, they provided the following (Scala code): import…
MNEMO
  • 268
  • 2
  • 11
0
votes
1 answer

Store Graph to disk, Created from Spark GraphFrames

I have around 1 TB of data. I have stored this data in vertex and edge files to be loaded into a Spark GraphFrame to create a graph and run motif (pattern-finding) queries on this graph. For every batch, these 1 TB vertex and edge files need to be…
AbhiK
  • 247
  • 3
  • 19
0
votes
1 answer

Not getting correct label Graphframe LPA

I am using GraphFrames LPA to find communities, but somehow it's not giving me the expected result: graph_data = spark.createDataFrame([ ("a", "d", "friend"), ("b", "d", "friend"), ("c", "d", "friend") ], ["src", "dst", "relationship"]) here my…
0
votes
1 answer

How to visualize a graph with %network in Zeppelin?

I want to visualize my graph with %network in Zeppelin. I've defined nodes and edges by reading from a JSON file. val nodes = spark.read.option("multiline","true").json("/opt/nodes.json") val edges =…
0
votes
1 answer

Creating a string argument with a function parameter

I am trying to create a function that will allow a user to perform a breadth-first search in GraphFrames using the .bfs method. An example function looks like: Graphframe.bfs("name = 'Esther'", "relationship = 'friend'") I would like a…
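
One way to approach the question above is to build the filter strings with ordinary Python string formatting and pass the results to .bfs. This is a minimal sketch: eq_filter is a hypothetical helper, and the backslash escaping of single quotes is an assumption about how the expression string is parsed.

```python
def eq_filter(column, value):
    """Return a SQL-style equality expression such as "name = 'Esther'"."""
    # Escape single quotes so a value like O'Brien does not end the literal
    # early (assumed escaping rule, check it against your Spark version).
    escaped = str(value).replace("'", "\\'")
    return f"{column} = '{escaped}'"


# Values taken from the example call in the question.
name_expr = eq_filter("name", "Esther")
relationship_expr = eq_filter("relationship", "friend")

print(name_expr)
print(relationship_expr)

# With a GraphFrame object available, the strings would be passed on as in
# the question's example, e.g. graph.bfs(name_expr, relationship_expr).
```

Wrapping this in a function that takes the user's name and relationship as parameters then only requires calling eq_filter on each argument before the .bfs call.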
0
votes
0 answers

Using GraphFrames in a Java Maven project

I'd like to use GraphFrames with Java in my Maven project. The official documentation says that APIs are provided for Scala, Python and Java: https://graphframes.github.io/graphframes/docs/_site/index.html but in practice I find only APIs for…
0
votes
1 answer

Pyspark not opening jupyter

I am trying to run graphframes in pyspark (in Ubuntu) and followed the steps below: I edited my .profile file as below: SPARK_PATH=/home/spark/spark-2.4.4-bin-hadoop2.7 # set PATH so it includes user's private bin…
Ricky
  • 2,662
  • 5
  • 25
  • 57
0
votes
1 answer

Print labels from graphframe in networkx

PYSPARK: how to visualize a GraphFrame? This is a link to a question. I just need to add the labels (both the node names in vertices and the relationship, i.e., friend or follow, in edges). How can I do so?
Harshal
  • 55
  • 7