Questions tagged [graphframes]

DataFrame based graph library for Apache Spark

GraphFrames is a DataFrame-based alternative to core GraphX, with cross-language support (Scala, Java, and Python).


186 questions
0 votes, 2 answers

can't find module 'graphframes' -- Jupyter

I'm trying to install the graphframes package following some instructions I have already read. My first attempt was to do this on the command line: pyspark --packages graphframes:graphframes:0.5.0-spark2.1-s_2.11 This works perfectly and the download…
asked by Also
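One commonly suggested way to make the package visible inside a Jupyter notebook (rather than only on the pyspark command line) is to set PYSPARK_SUBMIT_ARGS before the SparkContext is created. A minimal sketch, reusing the package coordinate from the question (it must match your Spark and Scala versions):

```python
import os

# Must be set before any SparkContext exists; note the trailing
# "pyspark-shell" token, which spark-submit expects here.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--packages graphframes:graphframes:0.5.0-spark2.1-s_2.11 pyspark-shell"
)

import pyspark
sc = pyspark.SparkContext()

# If the jar was resolved and put on the classpath, this import succeeds.
from graphframes import GraphFrame
```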
0 votes, 1 answer

How to find the sum/avg of SparseVector elements of a DataFrame in Spark/Scala?

I have the pageranks result from ParallelPersonalizedPageRank in GraphFrames, which is a DataFrame where each element is a SparseVector, as follows: +---------------------------------------+ | pageranks …
asked by Guanghua Shu
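The question targets Scala, but the same idea can be sketched in PySpark: wrap the vector column in a UDF that sums or averages its values. The toy `ranks` DataFrame below is a stand-in for the ParallelPersonalizedPageRank output, and `spark` is assumed to be an active SparkSession:

```python
from pyspark.ml.linalg import SparseVector
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType

# Toy stand-in: one SparseVector of personalized ranks per vertex.
ranks = spark.createDataFrame(
    [("a", SparseVector(4, [0, 2], [0.3, 0.1])),
     ("b", SparseVector(4, [1], [0.5]))],
    ["id", "pageranks"],
)

# Only the stored (non-zero) values contribute to the sum of a SparseVector.
vector_sum = udf(lambda v: float(v.values.sum()), DoubleType())
vector_avg = udf(lambda v: float(v.values.sum()) / v.size, DoubleType())

ranks.select("id",
             vector_sum("pageranks").alias("sum"),
             vector_avg("pageranks").alias("avg")).show()
```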
0 votes, 1 answer

An error occurred while calling o227.run

I'm new to Spark. I tried to create a GraphFrame and run some queries on it; this is my code: import pyspark from pyspark.sql import SQLContext from graphframe import * sc = pyspark.SparkContext() sqlContext = SQLContext(sc) vertices =…
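The excerpt is truncated, so the actual cause of the o227.run error is not visible. For comparison, here is a minimal PySpark GraphFrame that runs end to end when pyspark is launched with the graphframes package on the classpath; note the importable module is `graphframes` (plural), not `graphframe` as in the snippet:

```python
import pyspark
from pyspark.sql import SQLContext
from graphframes import GraphFrame  # module name is "graphframes"

sc = pyspark.SparkContext()
sqlContext = SQLContext(sc)

# Vertices need an "id" column; edges need "src" and "dst".
vertices = sqlContext.createDataFrame(
    [("a", "Alice"), ("b", "Bob"), ("c", "Charlie")], ["id", "name"])
edges = sqlContext.createDataFrame(
    [("a", "b", "friend"), ("b", "c", "follow")], ["src", "dst", "relationship"])

g = GraphFrame(vertices, edges)
g.inDegrees.show()                                 # simple query on the graph
print(g.edges.filter("relationship = 'follow'").count())
```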
0 votes, 1 answer

Build.sbt breaks when adding GraphFrames built with Scala 2.11

I'm trying to add GraphFrames to my Scala Spark application, and this was going fine when I added the one based on 2.10. However, as soon as I tried to build it with GraphFrames built with Scala 2.11, it breaks. The problem would be that there are…
asked by Rink Stiekema
0 votes, 1 answer

Package GraphFrames Spark2.0

I have Spark 2.0 and Scala 2.11.8, and I am trying to include the GraphFrames package. I typed the following in the Scala shell, but I still got the error message: scala> import…
asked by user2507238
0 votes, 1 answer

Changing columns that are string in Spark GraphFrame

I'm using GraphFrame in Spark 2.0 with Scala. I need to remove double quotes from columns that are of string type (out of many columns). I'm trying to do so using a UDF as follows: import org.apache.spark.sql.functions.udf val removeDoubleQuotes = udf(…
asked by MehrdadAP
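The question is about Scala, but the same cleanup can be sketched in PySpark, where the built-in regexp_replace avoids writing a UDF at all. The toy DataFrame and its column names are assumptions standing in for the GraphFrame's vertex data; `spark` is an assumed SparkSession:

```python
from pyspark.sql.functions import regexp_replace, col

# Toy stand-in for the vertex DataFrame.
df = spark.createDataFrame([("a", '"Alice"'), ("b", '"Bob"')], ["id", "name"])

# Strip double quotes from every string-typed column.
for c, t in df.dtypes:
    if t == "string":
        df = df.withColumn(c, regexp_replace(col(c), '"', ""))

df.show()
```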
0 votes, 1 answer

How to import into a GraphFrame a text file with the following structure

I have a file with the following structure, where the first column is the nodeID. After ":" comes a node that has a connection with the nodeID. Each nodeID can have more than one connection. 0: 5305811, 1: 4798401, 2: 7922543, 3: 7195074, 4: 6399935, 5: 5697217, 6:…
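One way to sketch this in PySpark, assuming each line looks like `0: 5305811, 5305812` (a nodeID, then a comma-separated list of neighbours), a hypothetical file path, and an active SparkSession named `spark`:

```python
from graphframes import GraphFrame

# Hypothetical input path; each line: "<nodeID>: <dst1>, <dst2>, ..."
lines = spark.sparkContext.textFile("adjacency.txt")

def parse(line):
    src, rest = line.split(":", 1)
    return [(src.strip(), dst.strip()) for dst in rest.split(",") if dst.strip()]

edges = lines.flatMap(parse).toDF(["src", "dst"])
vertices = (edges.selectExpr("src AS id")
                 .union(edges.selectExpr("dst AS id"))
                 .distinct())

g = GraphFrame(vertices, edges)
```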
0 votes, 1 answer

GraphFrame: missing or invalid dependency detected while loading class file

I am trying to create a graph using Spark GraphFrames. Here is the code: import org.graphframes._ // Node DataFrames val v = sqlContext.createDataFrame(List( ("a", "Alice", 34), ("b", "Bob", 36), ("c", "Charlie", 30), ("d", "David", 29), …
asked by sjishan
0 votes, 0 answers

Weird JavaNullPointerException while using GraphFrames for connected components

I am currently using GraphFrames to retrieve connected components from a graph. My code is very simple as follows: v = sqlContext.createDataFrame(node,["id","name"]) print v.take(15) e = sqlContext.createDataFrame(edge,["src","dst"]) print…
asked by shu
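The excerpt does not show where the null pointer originates, but one detail that commonly trips up connectedComponents is that GraphFrames requires a Spark checkpoint directory before it is called; null ids are another frequent cause. A hedged sketch, reusing `sc`, `sqlContext`, `node`, and `edge` from the question:

```python
from graphframes import GraphFrame

# connectedComponents needs a checkpoint directory; any writable path works.
sc.setCheckpointDir("/tmp/graphframes-checkpoints")

v = sqlContext.createDataFrame(node, ["id", "name"])   # `node` / `edge` as in the question
e = sqlContext.createDataFrame(edge, ["src", "dst"])

# Null id/src/dst values can also trigger NullPointerExceptions, so drop them.
g = GraphFrame(v.dropna(subset=["id"]), e.dropna(subset=["src", "dst"]))

components = g.connectedComponents()
components.select("id", "component").show()
```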
0 votes, 2 answers

Why does "spark-shell --jars" with GraphFrames jar give "error: missing or invalid dependency detected while loading class file 'Logging.class'"?

I ran the command spark-shell --jars /home/krishnamahi/graphframes-0.4.0-spark2.1-s_2.11.jar and it threw the error: missing or invalid dependency detected while loading class file 'Logging.class'. Could not access term typesafe in…
0 votes, 0 answers

Spark Map with RDD inside

I know Spark will not allow you to use functions that generate RDDs inside of map or any of its variants. Is there a workaround for this? For instance, can I perform standard looping iteration over all RDDs in a partition? (For instance, is there…
asked by Dylan Lawrence
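Nested RDD operations are indeed not allowed inside transformations. A common workaround, sketched below under the assumption that one of the datasets is small enough to collect, is to bring it to the driver, broadcast it, and loop over the plain Python value instead; `spark` and the two RDDs are hypothetical:

```python
sc = spark.sparkContext                    # assumes an active SparkSession `spark`

big_rdd = sc.parallelize(range(1000))      # hypothetical large dataset
other_rdd = sc.parallelize([3, 7, 42])     # hypothetical small dataset

# Not allowed: referencing other_rdd inside big_rdd.map(...) fails, because
# RDDs cannot be used inside transformations running on executors.
# Workaround: collect the small RDD to the driver and broadcast it.
small = sc.broadcast(set(other_rdd.collect()))

result = big_rdd.map(lambda x: (x, x in small.value))
print(result.take(5))
```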
0 votes, 0 answers

Is there a way to iterate over Spark RDD partitions without using mapping?

I'm currently using GraphFrames to generate a graph, and then I need to find the paths between all vertices. (That is, all pairs of vertices are tested to find the minimum path between them.) Both bfs and find in GraphFrames generate DataFrames…
asked by Dylan Lawrence
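Both bfs and find return DataFrames rather than something that can be looped over inside another transformation. For shortest-path distances to a fixed set of target vertices, GraphFrames' shortestPaths (which takes a list of landmark vertex ids) avoids calling bfs once per pair; a sketch, assuming `g` is the GraphFrame from the question:

```python
# Using all vertex ids as landmarks yields all-pairs distances in one job,
# though this can be expensive on a large graph, so a smaller landmark set
# is often chosen instead.
landmarks = [row["id"] for row in g.vertices.select("id").collect()]

distances = g.shortestPaths(landmarks=landmarks)
distances.select("id", "distances").show(truncate=False)
```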
0 votes, 1 answer

can't find module 'graphframes'

When I enter on the command line: pyspark --packages graphframes:graphframes:0.2.0-spark2.0-s_2.11 it works well. But when I want to use IPython to launch PySpark and use the graphframes package, it doesn't work. When I enter on the command line…
0 votes, 0 answers

Neo4j Spark using graph frames

I recently started using the Neo4j-Spark-Connector and went through some of the examples provided in this link https://neo4j.com/developer/apache-spark/. Everything seemed to work until I tried working with graphframes. When I run the following: val…
asked by Allen Lu
0 votes, 1 answer

java.lang.OutOfMemoryError related to Spark Graphframe bfs

The OutOfMemoryError appears after I call bfs 20+ times in this way: list_locals = [] #g is the graphframe with > 3 million nodes and > 15 million edges. def fn(row): arg1 = "id = '%s'" %row.arg1 arg2 = "id = '%s'" %row.arg2 …
asked by Yiliang
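The excerpt is cut off, but with bfs called 20+ times, whole result DataFrames accumulating on the driver (list_locals) and cached intermediate data piling up are the usual suspects. A hedged sketch of the loop with the kind of cleanup that is commonly tried; `pairs`, `g`, and `spark` are assumptions:

```python
results = []

# `pairs` is assumed to be a small driver-side list of (id1, id2) tuples,
# and `g` the large GraphFrame from the question.
for arg1, arg2 in pairs:
    paths = g.bfs(fromExpr="id = '%s'" % arg1,
                  toExpr="id = '%s'" % arg2,
                  maxPathLength=10)

    # Keep only small summaries on the driver instead of whole DataFrames.
    results.append(paths.count())

    # Drop cached intermediate data between iterations.
    spark.catalog.clearCache()
```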