I am trying to calculate the standard deviation of the temperature column in a Cloudant DataFrame. I can use either the RDD API or spark.sql. The snippet below raises an error.

cloudantdata.createOrReplaceTempView("washing")
from math import sqrt
n = spark.sql("SELECT COUNT(temperature) AS tempCount FROM washing").first().tempCount
meanX = meanTemperature(cloudantdata,spark)
#= spark.sql("SELECT temperature as temp from washing").first().temp
tempx = cloudantdata.filter(lambda x: x[["temperature"]])
ret= tempx.rdd.map(lambda x : pow(x-meanX,2)).sum()
print(ret)

Error:

TypeError                                 Traceback (most recent call last)
<ipython-input-61-a97f833d6cc6> in <module>()
      4 meanX = meanTemperature(cloudantdata,spark)
      5 #= spark.sql("SELECT temperature as temp from washing").first().temp
 ----> 6 tempx = cloudantdata.filter(lambda x: x[["temperature"]])
      7 ret= tempx.rdd.map(lambda x : pow(x-meanX,2)).sum()
      8 print(ret)

/usr/local/src/spark21master/spark/python/pyspark/sql/dataframe.py in filter(self, condition)
   1033             jdf = self._jdf.filter(condition._jc)
   1034         else:
 -> 1035             raise TypeError("condition should be string or Column")
   1036         return DataFrame(jdf, self.sql_ctx)
   1037 

TypeError: condition should be string or Column
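
For what it's worth, the traceback says that `DataFrame.filter` accepts only a SQL string or a `Column`, not a Python lambda. Picking a column is normally done with `select`, and rows coming out of `.rdd` are `Row` objects, so `x - meanX` would likely fail next. Below is a minimal sketch of one way to get the value through spark.sql instead. It assumes the `washing` view from the question is registered and that `temperature` is numeric. The names `std_from_sums` and `temperature_std` are hypothetical helpers, not part of Spark. It computes the population standard deviation from the count, sum, and sum of squares, which is a different route from the two-pass `pow(x - meanX, 2)` attempt above.

```python
from math import sqrt


def std_from_sums(n, total, total_sq):
    """Population standard deviation from count, sum, and sum of squares."""
    if not n:
        raise ValueError("no non-null values to aggregate")
    mean = total / n
    # Clamp at zero: rounding can push the variance slightly negative.
    variance = max(total_sq / n - mean * mean, 0.0)
    return sqrt(variance)


def temperature_std(spark, view="washing", column="temperature"):
    """Hypothetical helper: aggregate in one SQL pass, finish in Python.

    Assumes `view` is a registered temp view with a numeric `column`.
    COUNT and SUM skip NULL values, so missing readings are ignored.
    """
    query = (
        "SELECT COUNT({c}) AS n, SUM({c}) AS s, SUM({c} * {c}) AS ss "
        "FROM {v}"
    ).format(c=column, v=view)
    row = spark.sql(query).first()
    return std_from_sums(row.n, row.s, row.ss)
```

With the session from the question this would be called as `temperature_std(spark)`. If staying on the RDD side is preferred, something along the lines of `cloudantdata.select("temperature").rdd.map(lambda row: row.temperature)` should yield plain values to work with. Spark SQL also ships `stddev_pop` and `stddev_samp` aggregates, which may be simpler if computing the value by hand is not a requirement.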
