Questions tagged [graphframes]

DataFrame-based graph library for Apache Spark

GraphFrames is a DataFrame-based alternative to core GraphX with cross-language support.

186 questions
1
vote
1 answer

ImportError: cannot import name 'Pregel' from 'graphframes.lib'

I am using pyspark and graphframes from jupyter. I am able to successfully import pyspark and graphframes, but when I try: from graphframes.lib import Pregel I get the following error: ImportError: cannot import name 'Pregel' from…
ChrisDanger
  • 1,071
  • 11
  • 10
1
vote
1 answer

Confused about stop condition on Spark/Graphx/Pregel example program to find 'path distance'

I am working my way through Graphx In Action and this book (source code for which is here: https://github.com/insidedctm/spark-graphx-in-action) discusses two ways of calculating the distance (number of edge hops) between the root of a tree and…
Chris Bedford
  • 2,560
  • 3
  • 28
  • 60
1
vote
1 answer

Expand array Column of PySpark DataFrame

I am having trouble transferring a DataFrame into a GraphFrame using the data below. Let's consider a column of Authors in a dataframe containing an array of Strings like the one below: +-----------+------------------------------------+ |ArticlePMID| …
Michele La Ferla
  • 6,775
  • 11
  • 53
  • 79
1
vote
2 answers

Finding path between 2 vertices which are not directly connected

I have a connected graph like this: user1|A,C,B user2|A,E,B,A user3|C,B,A,B,E user4|A,C,B,E,B where the user is the property name, followed by the path for that particular user. For example, for user1 the path is A->C->B user2: A->E->B->A user3:…
DebD
  • 374
  • 1
  • 4
  • 18
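
As an illustration of the question above, here is a minimal sketch in plain Python (not GraphFrames) of one way to look for a path between two vertices that share no direct edge. It assumes each per-user path contributes directed edges between consecutive vertices; the excerpt is truncated, so that reading, along with the helper names `build_adjacency` and `shortest_path`, is an assumption rather than the asker's method.

```python
from collections import deque

# Per-user paths copied from the question excerpt.
user_paths = {
    "user1": ["A", "C", "B"],
    "user2": ["A", "E", "B", "A"],
    "user3": ["C", "B", "A", "B", "E"],
    "user4": ["A", "C", "B", "E", "B"],
}


def build_adjacency(paths):
    """Turn consecutive steps of every path into directed edges (assumed reading)."""
    adjacency = {}
    for steps in paths.values():
        for src, dst in zip(steps, steps[1:]):
            adjacency.setdefault(src, set()).add(dst)
            adjacency.setdefault(dst, set())
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns a list of vertices, or None if unreachable."""
    if start not in adjacency or goal not in adjacency:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        # Sort neighbours so the result is deterministic.
        for nxt in sorted(adjacency[node]):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


adjacency = build_adjacency(user_paths)
# C has no direct edge to E in this data, so any path must pass through another vertex.
path_c_to_e = shortest_path(adjacency, "C", "E")
print(path_c_to_e)
```

On a real cluster the same idea would be expressed over vertex and edge DataFrames; GraphFrames offers a `bfs` method for that, whose exact arguments should be checked against the documentation for the installed version.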
1
vote
0 answers

GraphFrames connected components - Component Zero

When I run connected components algorithm on GraphFrames, there is a huge component with the component id of zero - 0. What is that component?
Ron F
  • 370
  • 2
  • 14
1
vote
1 answer

EMR Notebook Scala kernel import graphframes library

Running spark-shell --packages "graphframes:graphframes:0.7.0-spark2.4-s_2.11" in the bash shell works and I can successfully import graphframes 0.7, but when I try to use it in a scala jupyter notebook like this: import…
Joe S
  • 410
  • 6
  • 16
1
vote
1 answer

Does DseGraphFrame in Java support exporting graphs?

Per DSE docs, vertices and edges can be exported calling g.V().hasLabel("Person").write.json("/tmp/person_v_json") in dse spark. Can the same be achieved using DseGraphFrame for the Java SDK? I want to make sure because I can't find a write() method.
Glide
  • 20,235
  • 26
  • 86
  • 135
1
vote
1 answer

Unable to import graphframes in pyspark shell on gcloud dataproc spark cluster

Created a spark cluster through gcloud console with following options gcloud dataproc clusters create cluster-name --region us-east1 --num-masters 1 --num-workers 2 --master-machine-type n1-standard-2 --worker-machine-type n1-standard-1 --metadata…
1
vote
2 answers

How to keep all elements when aggregating on AggregateMessages on a GraphFrame?

Suppose I have the following graph: scala> v.show() +---+---------------+ | id|downstreamEdges| +---+---------------+ |CCC| null| |BBB| null| |QQQ| null| |DDD| null| |FFF| null| |EEE| …
Shafique Jamal
  • 1,550
  • 3
  • 21
  • 45
1
vote
2 answers

How to convert Array[String] to Array[Any] in Spark/Scala

I am trying to generate sourceIds for the parallelPersonalizedPageRank algorithm inside Graphframes and call the algorithm as follows: val PPRIdCS = studentCS.select("id").collect.map(row => row.getString(0)) val ranksCS = studentGraph …
Guanghua Shu
  • 95
  • 4
  • 14
1
vote
1 answer

Error message when I run graphframes in spark pyspark

I have installed the GraphFrames package in Spark, following the instructions from this link: https://www.datareply.co.uk/blog/2016/9/20/running-graph-analytics-with-spark-graphframes-a-simple-example When I try to execute the following code, I…
1
vote
1 answer

Error while running PageRank and BFS functions on Graphframes in PySpark

I'm new to Spark, and am learning it on the Cloudera Distr for Hadoop (CDH). I'm trying to execute the PageRank and BFS functions through Jupyter Notebook, which was initiated using the following command: pyspark --packages…
1
vote
1 answer

Graphframe error in Scala/Spark

I wrote these lines of code in Scala 2.11 on Databricks: import org.graphframes._ val user_ridotto = sqlContext.sql("SELECT * FROM userRidotto") var users_1 = user_ridotto.select("user_id", "name", "city", "num_fr", "fans", "review_count",…
rubik90
  • 67
  • 10
1
vote
1 answer

Eclipse IDE for Scala : symbol is missing from classpath

When I build my Scala-Spark project in Eclipse Oxygen (Ubuntu 16.04), it reports this issue in the "Problems" console: Symbol 'term .typesafe.scalalogging' is missing from the classpath. This symbol is required by 'trait…
alukard990
  • 811
  • 2
  • 9
  • 14
1
vote
1 answer

Efficiently calculating connected components in pyspark

I'm trying to find the connected components for friends in a city. My data is a list of edges with an attribute of city. City | SRC | DEST Houston Kyle -> Benny Houston Benny -> Charles Houston Charles -> Denny Omaha Carol -> Brian etc. I know the…