4

In PySpark, I want to calculate the correlation between two DataFrame vector columns using the following code (importing pyspark and calling createDataFrame both work fine):

from pyspark.ml.linalg import Vectors
from pyspark.ml.stat import Correlation
import pyspark

spark = pyspark.sql.SparkSession.builder.master("local[*]").getOrCreate()

data = [(Vectors.sparse(4, [(0, 1.0), (3, -2.0)]),),
        (Vectors.dense([4.0, 5.0, 0.0, 3.0]),)]
df = spark.createDataFrame(data, ["features"])

r1 = Correlation.corr(df, "features").head()
print("Pearson correlation matrix:\n" + str(r1[0]))

But I get an AttributeError ('NoneType' object has no attribute 'setCallSite'):

AttributeError                            Traceback (most recent call last)
<ipython-input-136-d553c1ade793> in <module>()
      6 df = spark.createDataFrame(data, ["features"])
      7 
----> 8 r1 = Correlation.corr(df, "features").head()
      9 print("Pearson correlation matrix:\n" + str(r1[0]))

/usr/local/lib/python3.6/dist-packages/pyspark/sql/dataframe.py in head(self, n)
   1130         """
   1131         if n is None:
-> 1132             rs = self.head(1)
   1133             return rs[0] if rs else None
   1134         return self.take(n)

/usr/local/lib/python3.6/dist-packages/pyspark/sql/dataframe.py in head(self, n)
   1132             rs = self.head(1)
   1133             return rs[0] if rs else None
-> 1134         return self.take(n)
   1135 
   1136     @ignore_unicode_prefix

/usr/local/lib/python3.6/dist-packages/pyspark/sql/dataframe.py in take(self, num)
    502         [Row(age=2, name=u'Alice'), Row(age=5, name=u'Bob')]
    503         """
--> 504         return self.limit(num).collect()
    505 
    506     @since(1.3)

/usr/local/lib/python3.6/dist-packages/pyspark/sql/dataframe.py in collect(self)
    463         [Row(age=2, name=u'Alice'), Row(age=5, name=u'Bob')]
    464         """
--> 465         with SCCallSiteSync(self._sc) as css:
    466             port = self._jdf.collectToPython()
    467         return list(_load_from_socket(port, BatchedSerializer(PickleSerializer())))

/usr/local/lib/python3.6/dist-packages/pyspark/traceback_utils.py in __enter__(self)
     70     def __enter__(self):
     71         if SCCallSiteSync._spark_stack_depth == 0:
---> 72             self._context._jsc.setCallSite(self._call_site)
     73         SCCallSiteSync._spark_stack_depth += 1
     74 

AttributeError: 'NoneType' object has no attribute 'setCallSite'

Any solution?

Saeid SOHEILY KHAH
  • Please include more (all) of the traceback; that may make it clearer what the underlying error is. – 9769953 May 30 '18 at 13:35
  • I might be wrong but don't you need to import pyspark to use spark.createDataFrame – Ontamu May 30 '18 at 13:39
  • Please put the error in a code box and format it properly –  May 30 '18 at 13:49
  • Take those dots out of your report of the stack trace and present the *whole* stack trace. You've omitted the line that the exception is talking about. – BoarGules May 30 '18 at 13:51
  • Does this answer your question? [Why do I get AttributeError: 'NoneType' object has no attribute 'something'?](https://stackoverflow.com/questions/8949252/why-do-i-get-attributeerror-nonetype-object-has-no-attribute-something) – Ulrich Eckhardt Jan 05 '22 at 18:07

3 Answers

3

There's a resolved issue around this:

https://issues.apache.org/jira/browse/SPARK-27335?jql=text%20~%20%22setcallsite%22

[Note: as it's resolved, if you're using a more recent version of Spark than October 2019, please report to Apache Jira if you're still encountering this issue]

The poster suggests forcing a sync of your DataFrame's backend with your Spark context:

df.sql_ctx.sparkSession._jsparkSession = spark._jsparkSession
df._sc = spark._sc

This worked for us; hopefully it can work in other cases as well.
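Wrapped as a small helper, the workaround might look like this (a sketch; `sql_ctx`, `_jsparkSession`, and `_sc` are undocumented PySpark internals and may change between versions):

```python
def resync_dataframe(df, spark):
    """Re-attach a result DataFrame's backend objects to the live
    SparkSession, per the workaround in the linked JIRA (sketch only)."""
    df.sql_ctx.sparkSession._jsparkSession = spark._jsparkSession
    df._sc = spark._sc
    return df

# e.g. resync_dataframe(Correlation.corr(df, "features"), spark).head()
```

The idea is simply to point the stale references on the returned DataFrame back at the session you actually created.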

MichaelChirico
  • Found a similar issue with `ml.recommendation.ALS` - this partially resolve the issue for me too. – daoudc Nov 18 '19 at 09:17
  • I am still getting this on Spark 3.0.1, and it's independent of Correlation.corr. I just commented on that existing ticket with the traceback and a little more info. I'll post back if I come up with more info or another workaround. – szeitlin Apr 09 '21 at 17:56
2

I got the same error not only with a Correlation.corr(...) dataframe, but with ldaModel.describeTopics() as well.

Most probably it is a Spark bug: they forget to initialise the DataFrame's _sc._jsc member when creating the resulting dataframe.

Each dataframe normally has this member initialised with a proper JavaObject.

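A rough way to check whether a given DataFrame was hit by this (a sketch; `_sc` and `_jsc` are undocumented PySpark internals):

```python
def has_live_jvm_context(df):
    """Return True if df's SparkContext and its Java-side handle look
    initialised; a DataFrame hit by this bug has `_sc._jsc` set to None."""
    sc = getattr(df, "_sc", None)
    return sc is not None and getattr(sc, "_jsc", None) is not None
```

If this returns False for a result DataFrame, re-attaching it to the live session (as in the accepted answer) is the workaround to try.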
v8-E
  • This indeed seems to be an issue. I see the same problem with FPGrowth.. model.freqItemsets.collect() would give the same error. – HakunaMaData Apr 24 '19 at 04:10
2

There are several reasons for getting that AttributeError:

  1. You called sc.stop() before initializing one of the xContext classes (where x could be SQL or Hive). For example:

    sc = SparkContext.getOrCreate(conf = conf)
    sc.stop()              # the context is stopped here...
    spark = SQLContext(sc) # ...so this wraps a dead context

  2. Your Spark session is not synchronized across the cluster.

So, just restart your Jupyter notebook kernel or reboot the application (not the Spark context) and it will work.
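The mechanism behind case 1 can be reproduced without Spark at all (a plain-Python sketch: after stop(), the context's Java handle is None, so the setCallSite call in the traceback blows up):

```python
class FakeSparkContext:
    """Stand-in for pyspark's SparkContext: stop() drops the JVM handle."""
    def __init__(self):
        self._jsc = object()   # normally a live JavaObject

    def stop(self):
        self._jsc = None       # effectively what SparkContext.stop does

sc = FakeSparkContext()
sc.stop()
try:
    sc._jsc.setCallSite("head at <stdin>:1")  # mirrors SCCallSiteSync.__enter__
except AttributeError as e:
    print(e)  # 'NoneType' object has no attribute 'setCallSite'
```

Any code path that reaches a stopped (or never-started) context produces exactly the error in the question, which is why restarting the kernel, and thereby getting a fresh context, fixes it.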