Highest Voted 'graphframes' Questions

1

vote

1 answer

Cannot set checkpoint dir when running Connected Component example

This is the Connected Components example from GraphFrames:

    from graphframes.examples import Graphs
    g = Graphs(sqlContext).friends()  # Get example graph
    result = g.connectedComponents()
    result.select("id", "component").orderBy("component").show()

In…

python pyspark graphframes

asked Apr 18 '21 at 03:56

huy

1,648
3
14
40

1

vote

1 answer

Pyspark + Graphframes: "recursive" message aggregation

I've created the following graph: spark = SparkSession.builder.appName('aggregate').getOrCreate() vertices = spark.createDataFrame([('1', 'foo', 99), ('2', 'bar', 10), ('3', 'baz',…

python apache-spark pyspark apache-spark-sql graphframes

asked Dec 31 '20 at 23:54

Julio

2,261
4
30
56

1

vote

1 answer

Pyspark and Graphframes: Aggregate messages power mean

Given a graph where A has a value of 20, B has a value of 5, and C has a value of 10, I would like to use pyspark/graphframes to compute the power mean. In this case n is the number of items (3 in our case, for three vertices…

python apache-spark pyspark graphframes

asked Dec 31 '20 at 00:43

Julio

2,261
4
30
56
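As a point of reference for the power mean question above, here is a minimal plain-Python sketch of the standard generalized (power) mean, M_p = ((1/n) * sum(x_i^p))^(1/p), applied to the values 20, 5 and 10 from the excerpt. The function name `power_mean` and the exponents used are assumptions for illustration; the excerpt is truncated and does not state which exponent the asker needs, and this sketch does not use GraphFrames message aggregation.

```python
def power_mean(values, p):
    """Generalized (power) mean of positive numbers for a non-zero exponent p.

    Hypothetical helper for illustration; not part of pyspark or graphframes.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if p == 0:
        # p = 0 corresponds to the geometric-mean limit, not handled here.
        raise ValueError("p must be non-zero")
    n = len(values)
    return (sum(x ** p for x in values) / n) ** (1.0 / p)


# Vertex values taken from the question excerpt.
vertex_values = {"A": 20, "B": 5, "C": 10}
xs = list(vertex_values.values())

arithmetic = power_mean(xs, p=1)  # p = 1 gives the arithmetic mean
quadratic = power_mean(xs, p=2)   # p = 2 gives the quadratic mean (RMS)
```

In a distributed setting the same quantity could presumably be assembled from two aggregates, the sum of x^p and the count n, which is why the problem maps naturally onto message aggregation.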

1

vote

0 answers

Iterative GraphFrames AggregateMessages hitting memory limits

I'm using GraphFrames' aggregateMessages capability to build a custom clustering algorithm. I tested this algorithm on a small sample dataset (~100 items) and verified that it works. But when I run it on my real dataset of 50k items, I am getting…

java apache-spark graphframes

asked Dec 30 '20 at 19:36

webber

1,834
5
24
56

1

vote

1 answer

How to Get Connected Component with Graphframes in Pyspark and Raw Data in Spark Dataframe?

I have a Spark DataFrame that looks like this:

+--+-----+---------+
|id|phone|  address|
+--+-----+---------+
| 0|  123| james st|
| 1|  177|avenue st|
| 2|  123|spring st|
| 3|  999|avenue st|
| 4|  678|  5th ave|
+--+-----+---------+

I am…

python apache-spark pyspark spark-graphx graphframes

asked Dec 28 '20 at 16:31

MAMS

419
1
6
17
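To make the connected-components question above concrete, here is a small plain-Python sketch using union-find on the five rows from the excerpt. It assumes, since the excerpt is truncated, that two rows belong to the same component when they share a phone or an address. The function name `connected_components` and the linking rule are hypothetical, and this is not the GraphFrames API.

```python
# Rows copied from the question excerpt: (id, phone, address).
rows = [
    (0, 123, "james st"),
    (1, 177, "avenue st"),
    (2, 123, "spring st"),
    (3, 999, "avenue st"),
    (4, 678, "5th ave"),
]


def connected_components(rows):
    """Map each row id to the smallest id in its component.

    Assumption: rows are linked when they share a phone or an address.
    """
    parent = {row_id: row_id for row_id, _, _ in rows}

    def find(x):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller id as the representative of the merged set.
            parent[max(ra, rb)] = min(ra, rb)

    first_seen = {}  # (column, value) -> first row id that carried it
    for row_id, phone, address in rows:
        for key in (("phone", phone), ("address", address)):
            if key in first_seen:
                union(row_id, first_seen[key])
            else:
                first_seen[key] = row_id

    return {row_id: find(row_id) for row_id in parent}


components = connected_components(rows)
```

Under this assumed rule, rows 0 and 2 are linked by phone 123, rows 1 and 3 by "avenue st", and row 4 stands alone. With GraphFrames, the analogous step would be to derive an edge list from the shared values and then call `connectedComponents()`, as in the example quoted at the top of this listing.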

1

vote

1 answer

RDD Warning: Not enough space to cache rdd in memory

I am trying to run the PageRank algorithm on a GraphFrame using PySpark. However, when I execute it, the program keeps running endlessly and I get the following warnings: The code is as follows: vertices = sc.createDataFrame(lst_sent,['id',…

apache-spark pyspark pagerank graphframes

asked Jul 29 '20 at 10:15

Jayesh Dubey

33
4

1

vote

1 answer

Convert GraphFrame output to a pandas DataFrame

I checked multiple sources but couldn't pinpoint this particular problem although it probably has a very easy fix. Let's say I have some graph, g. I am able to print the vertices using g.vertices.show() But I'm having a lot of trouble figuring out…

python pandas apache-spark graphframes

asked Jul 09 '20 at 23:07

Jonathan

1,876
2
20
56

1

vote

1 answer

Spark GraphFrames High Shuffle read/write

I have created a graph from vertex and edge files. The graph is 600 GB in size. I am querying it using the motif-finding feature of Spark GraphFrames. I have set up an AWS EMR cluster for querying the graph. Cluster details: 1 master and 8 slaves. Master Node: …

apache-spark amazon-emr spark-graphx graphframes

asked Jun 21 '20 at 12:47

AbhiK

247
3
19

1

vote

1 answer

Spark graphx issue

I am trying to follow the example in https://docs.databricks.com/spark/latest/graph-analysis/graphframes/user-guide-python.html However, when I change some criteria, the result is not what I expect. Please see the steps below: from functools…

apache-spark graphframes

asked Jun 17 '20 at 08:56

Pratik Rudra

37
1
7

1

vote

0 answers

Why is there no GraphFrames release for Spark 2.4.x and Scala 2.12?

I'm looking at the GraphFrames releases available here: https://spark-packages.org/package/graphframes/graphframes. The only GraphFrames release available for Scala 2.12 as of April 22, 2020 is for Spark 3.0, but Spark 3.0 isn't production-ready yet. Is…

apache-spark graphframes

asked Apr 23 '20 at 00:44

Michel Trottier-McDonald

31
1
2

1

vote

1 answer

GraphFrames Shortest Paths gives distance and not the actual path

I'm new to GraphFrames and am trying to implement edge betweenness. I tried the built-in shortest paths function. It returns the distance from the source to the destination vertex but not the actual path between them. The output is: | id | …

shortest-path spark-graphx graphframes

asked Apr 22 '20 at 21:28

Shubham Yadav

561
7
16

1

vote

1 answer

Getting Size Exceeded Exception while storing Dataframe into MongoDB

I am trying to store an Apache Spark DataFrame in MongoDB using Scala, but I get the following exception while storing it: Caused by: org.bson.BsonMaximumSizeExceededException: Payload document size is larger than maximum of 16777216. Code…

mongodb scala apache-spark graphframes

asked Mar 05 '20 at 12:08

ameen

41
2
4

1

vote

0 answers

Depth First Search Algorithm in DataFrame (GraphFrame) in Spark

I have two DataFrames, one containing vertices: val v = sqlContext.createDataFrame(scala.List( ("a", "Alice", 34), ("b", "Bob", 36), ("c", "Charlie", 30), ("d", "David", 29), ("e", "Esther", 1), ("f",…

scala graph apache-spark-sql depth-first-search graphframes

asked Feb 09 '20 at 16:53

Rohan

31
5

1

vote

0 answers

What is the most efficient 'sparky' way to build a graph from raw data?

scala apache-spark apache-spark-sql spark-graphx graphframes

asked Jul 05 '19 at 13:13

iRoygbiv

865
2
7
21

1

vote

1 answer

How to add graphframes to Apache Zeppelin

I am trying to use the graphframes library on Apache Zeppelin with the Spark (pyspark) interpreter. However, I keep getting the error ModuleNotFoundError: No module named 'graphframes' whenever I try to import the graphframes module using from…

apache-spark pyspark apache-zeppelin graphframes

asked Jun 01 '19 at 10:47

Marxley

120
1
6
