Questions tagged [graphframes]

DataFrame-based graph library for Apache Spark

GraphFrames is a DataFrame-based alternative to core GraphX with cross-language support.

186 questions
1 vote, 0 answers

GraphFrames Connected Components Performance

When I attempt to generate the connected components using GraphFrames, it takes substantially longer than I expected. I am running on Spark 2.1, GraphFrames 0.5, and AWS EMR with 3 r4.xlarge instances. When generating the connected components…
gth685f
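To make the operation discussed above concrete, here is a minimal plain-Python sketch (not the GraphFrames API) of what a connected-components computation returns: each vertex gets a label shared by every vertex it can reach when edge direction is ignored. It uses union-find and, as a convention chosen for this sketch only, labels each component with its smallest vertex id. GraphFrames' `connectedComponents()` computes the same partition with an iterative, distributed algorithm over DataFrames, so its runtime characteristics differ from this in-memory version.

```python
def connected_components(vertices, edges):
    """Return {vertex: component_label}, treating edges as undirected.

    Plain-Python illustration; every edge endpoint must appear in vertices.
    The label is the smallest vertex id in each component (a convention
    chosen for this sketch).
    """
    parent = {v: v for v in vertices}

    def find(v):
        # Path halving keeps the union-find trees shallow.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller root so it ends up as the component label.
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    for src, dst in edges:
        union(src, dst)
    return {v: find(v) for v in vertices}


vertices = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("c", "b"), ("d", "e")]
print(connected_components(vertices, edges))
# {'a': 'a', 'b': 'a', 'c': 'a', 'd': 'd', 'e': 'd'}
```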
1 vote, 2 answers

Apache Toree: PySpark not loading packages

I have Apache Toree installed following the instructions at https://medium.com/@faizanahemad/machine-learning-with-jupyter-using-scala-spark-and-python-the-setup-62d05b0c7f56. However, I cannot manage to import packages in the PySpark kernel by using…
Roxana
1 vote, 3 answers

Spark AWS EMR checkpoint location

I'm running a Spark job on EMR but need to create a checkpoint. I tried using S3 but got this error message: 17/02/24 14:34:35 ERROR ApplicationMaster: User class threw exception: java.lang.IllegalArgumentException: Wrong FS:…
Philip K. Adetiloye
1 vote, 1 answer

Are GraphFrames compatible with typed Datasets?

We currently use typed Datasets in our work, and we are exploring GraphFrames. However, GraphFrames seems to be based on DataFrame, which is Dataset[Row]. Would GraphFrames be compatible with a typed Dataset, e.g. Dataset[Person]?
samol
1 vote, 1 answer

Using sc.parallelize inside map() or any other solution?

I have the following issue: I need to find all combinations of values in column B for each id in column A and return the results as a DataFrame. In the example below of the input DataFrame A B 0 5 10 1 1 20…
feechka
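The excerpt is cut off, so this sketch assumes "combinations" means unordered groups of `r` values (pairs by default) drawn from column B within each value of column A. It shows only the per-group logic in plain Python with `itertools.combinations`; the function name `combinations_per_id` and the sample rows are hypothetical. In Spark this logic would normally run per group (for example after grouping on A) rather than through `sc.parallelize` inside `map()`, because the SparkContext is only usable on the driver.

```python
from collections import defaultdict
from itertools import combinations


def combinations_per_id(rows, r=2):
    """Hypothetical helper: for (a, b) rows, return (a, combo) for every
    r-sized combination of the b values that share the same a."""
    groups = defaultdict(list)
    for a, b in rows:
        groups[a].append(b)
    return [
        (a, combo)
        for a, values in groups.items()
        for combo in combinations(values, r)
    ]


rows = [(1, 10), (1, 20), (1, 30), (2, 40)]  # hypothetical (A, B) rows
print(combinations_per_id(rows))
# [(1, (10, 20)), (1, (10, 30)), (1, (20, 30))]
```

Groups with fewer than `r` values (id 2 above) simply produce no rows.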
1 vote, 0 answers

PySpark GraphFrames: finding connected components of a large graph

I was trying to use connectedComponents() from GraphFrames in PySpark to compute the connected components of a reasonably big graph with roughly 1800K vertices and 500K edges. edgeDF.printSchema() root |-- src: string (nullable = true) |-- dst:…
1 vote, 0 answers

Unable to Run GraphFrames with PySpark on Windows

I am trying to run PySpark on Windows with GraphFrames. The GraphFrames Quick-Start Guide mentions the following: If you have GraphFrames available as a JAR graphframes.jar, you can make GraphFrames available by passing the JAR to the pyspark shell…
1 vote, 0 answers

Import Spark GraphFrame package into SparkR

Is there any simple way to include and access GraphFrame in SparkR? I have included the package via the command line as follows: sparkr --packages graphframes:graphframes:0.2.0-spark2.0-s_2.10, but I cannot find documentation on how to use the package in…
kuriouscoder
1 vote, 0 answers

Loading vertices and edges as DataFrames for a GraphFrame

I have two JSON files, and there is a "friend" relation between them. I want to create vertices and edges from these two JSON files and then create a GraphFrame. I am using Java and Spark, but I can't understand how to do that.…
Rhea
1 vote, 0 answers

how to merge "flow-through" edges in spark graphframes

Would the following graph algorithm be possible to implement with Spark GraphFrames? Given a graph, I'd like to remove nodes that have exactly one incoming edge and one outgoing edge, and merge the two edges into one edge. For instance, assume the…
mortada
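The rule stated in the question can be pinned down with a small plain-Python sketch before translating it into DataFrame operations (GraphFrames exposes `inDegrees` and `outDegrees` for the degree counts). This is not GraphFrames code: the edge-list representation, the function name `merge_flow_through`, and the choice to leave self-loops alone (so that a pure cycle stops collapsing instead of looping forever) are assumptions of the sketch.

```python
from collections import Counter


def merge_flow_through(edges):
    """Repeatedly remove any node with exactly one incoming and one
    outgoing edge, replacing u -> node -> w with a single edge u -> w."""
    edges = list(edges)
    while True:
        indeg = Counter(dst for _, dst in edges)
        outdeg = Counter(src for src, _ in edges)
        found = None
        for node in outdeg:
            if indeg[node] == 1 and outdeg[node] == 1:
                (u,) = [s for s, d in edges if d == node]
                (w,) = [d for s, d in edges if s == node]
                if u != node:  # a node whose only edge is a self-loop is kept
                    found = (node, u, w)
                    break
        if found is None:
            return edges
        node, u, w = found
        # Drop the two edges touching the node and add the merged edge.
        edges = [e for e in edges if node not in e] + [(u, w)]


print(merge_flow_through([("a", "b"), ("b", "c"), ("c", "d"), ("c", "e")]))
# [('c', 'd'), ('c', 'e'), ('a', 'c')]
```

Each pass removes one node, so the loop always terminates; a chain such as a -> b -> c -> d collapses to the single edge a -> d.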
1 vote, 1 answer

How to write a transformation function to transform an RDD with reference to a GraphFrame object?

I have a GraphFrame object, g, and an RDD object, candidate: g = GraphFrame(v,e) candidates_rdd.collect() # [Row(source=u'a', target=u'b'), # Row(source=u'a', target=u'c'), # Row(source=u'e', target=u'a')] I want to compute a path from "source"…
Yiliang
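Finding a path for each (source, target) candidate is, at its core, a breadth-first search per pair. The plain-Python sketch below shows that logic on an in-memory adjacency list built from hypothetical edges; it is not the GraphFrames API, and it treats edges as directed (an assumption). GraphFrames provides `g.bfs(fromExpr, toExpr)` for breadth-first search on the distributed graph, but Spark does not let a DataFrame, and therefore a GraphFrame, be used inside an RDD transformation, so such calls have to be issued from the driver.

```python
from collections import deque


def shortest_path(adj, source, target):
    """Return the vertex list of a shortest directed path, or None."""
    if source == target:
        return [source]
    prev = {source: None}  # predecessor of each discovered vertex
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj.get(v, ()):
            if w not in prev:
                prev[w] = v
                if w == target:
                    # Walk predecessors back to the source, then reverse.
                    path = [w]
                    while prev[path[-1]] is not None:
                        path.append(prev[path[-1]])
                    return path[::-1]
                queue.append(w)
    return None


# Hypothetical edges and candidate pairs, shaped like the question's Rows.
edges = [("a", "b"), ("b", "c"), ("a", "d"), ("d", "c"), ("e", "a")]
adj = {}
for src, dst in edges:
    adj.setdefault(src, []).append(dst)

candidates = [("a", "b"), ("a", "c"), ("e", "a"), ("c", "a")]
for source, target in candidates:
    print(source, target, shortest_path(adj, source, target))
```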
0 votes, 3 answers

How to detect a cycle in Spark GraphFrames?

Here is a Spark GraphFrames df representing a directed graph; there may be some cycles in this graph. How can I detect the cycles in a GraphFrame? For example, here is a graph | src | dst | | --- | --- | | 1 | 2 | | 2 | 3 | | 3 | 4 | | 3 …
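A standard way to detect a cycle in a directed graph is a depth-first search that tracks the vertices on the current path: reaching one of them again means a back edge, hence a cycle. The sketch below is plain, single-machine Python (iterative, to avoid recursion limits) over `(src, dst)` pairs shaped like the question's table; the function name `find_cycle` and the sample edges are hypothetical, and it only suits graphs small enough to collect to the driver.

```python
def find_cycle(edges):
    """Return one directed cycle as a vertex list (first == last), or None."""
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, []).append(dst)
        adj.setdefault(dst, [])

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / on current path / finished
    color = {v: WHITE for v in adj}
    done = object()  # sentinel: a vertex's successors are exhausted

    for start in adj:
        if color[start] != WHITE:
            continue
        color[start] = GRAY
        path = [start]                       # vertices on the current DFS path
        stack = [(start, iter(adj[start]))]  # parallel successor iterators
        while stack:
            v, successors = stack[-1]
            nxt = next(successors, done)
            if nxt is done:
                stack.pop()
                path.pop()
                color[v] = BLACK
            elif color[nxt] == GRAY:
                # Back edge to a vertex on the current path: cycle found.
                return path[path.index(nxt):] + [nxt]
            elif color[nxt] == WHITE:
                color[nxt] = GRAY
                path.append(nxt)
                stack.append((nxt, iter(adj[nxt])))
            # BLACK successors were fully explored earlier and are skipped.
    return None


print(find_cycle([(1, 2), (2, 3), (3, 4), (4, 2)]))  # [2, 3, 4, 2]
```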
0 votes, 0 answers

Error java.lang.NoSuchMethodError: 'scala.collection.mutable.ArrayOps scala.Predef$.refArrayOps(java.lang.Object[])'

I have been trying to fix this error for the last few hours. I want to create a GraphFrame in Jupyter. In conda I start the Jupyter notebook as: pyspark --packages graphframes:graphframes:0.8.2-spark2.4-s_2.11 spark.version '3.4.1' Scala code runner version…
Rochaa
0 votes, 0 answers

Compute connected components using Spark and GraphFrames on a very large number of vertices

I am working with a very large graph of approximately 100 million vertices, and I am using graphframes.connectedcomponents with Spark to resolve the graph. The output of the solution is a forest-like graph. I tried running by bumping up the driver…
sashmi
0 votes, 1 answer

Use of Graphframes library in palantir-foundry

I want to use the GraphFrames package with PySpark in my Foundry code repository. As mentioned here: https://www.palantir.com/docs/foundry/transforms-python/environment-troubleshooting/#packages-which-require-both-a-conda-package-and-a-jar I included…
Grigory Sharkov