I am trying to calculate the standard deviation of the temperature column in a DataFrame loaded from Cloudant. I can use either the RDD API or spark.sql. Below is my code snippet, which raises an error.
cloudantdata.createOrReplaceTempView("washing")
from math import sqrt
n= spark.sql("SELECT Count(temperature) as tempCount from washing").first().tempCount
meanX = meanTemperature(cloudantdata,spark)
#= spark.sql("SELECT temperature as temp from washing").first().temp
tempx = cloudantdata.filter(lambda x: x[["temperature"]])
ret= tempx.rdd.map(lambda x : pow(x-meanX,2)).sum()
print(ret)
Error:
TypeError Traceback (most recent call last)
<ipython-input-61-a97f833d6cc6> in <module>()
4 meanX = meanTemperature(cloudantdata,spark)
5 #= spark.sql("SELECT temperature as temp from washing").first().temp
----> 6 tempx = cloudantdata.filter(lambda x: x[["temperature"]])
7 ret= tempx.rdd.map(lambda x : pow(x-meanX,2)).sum()
8 print(ret)
/usr/local/src/spark21master/spark/python/pyspark/sql/dataframe.py in filter(self, condition)
1033 jdf = self._jdf.filter(condition._jc)
1034 else:
-> 1035 raise TypeError("condition should be string or Column")
1036 return DataFrame(jdf, self.sql_ctx)
1037
TypeError: condition should be string or Column
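A sketch of what the computation seems to need, under stated assumptions rather than as a definitive fix. As the traceback says, DataFrame.filter accepts only a SQL string or a Column, so the Python lambda is rejected. The lambda in the map step would also receive a whole Row rather than a number, so the field has to be extracted first. The helper name population_std below is hypothetical (it is not part of the original code). It assumes the population formula (divide by n) and that null temperatures should be skipped. It computes the mean itself instead of calling meanTemperature, whose definition is not shown here.

```python
from math import sqrt


def population_std(df, column):
    """Population standard deviation of one DataFrame column via the RDD API.

    Hypothetical helper: assumes `df` is a Spark DataFrame and `column`
    holds numeric values, possibly with nulls.
    """
    # Select the single column, then unwrap each Row into a plain number.
    values = df.select(column).rdd.map(lambda row: row[0])
    # Drop nulls on the RDD, where a Python lambda is a valid predicate.
    values = values.filter(lambda v: v is not None)

    n = values.count()
    if n == 0:
        raise ValueError("no non-null values in column %r" % column)

    mean = values.sum() / float(n)
    # Sum of squared deviations from the mean.
    squared_dev = values.map(lambda v: (v - mean) ** 2).sum()
    return sqrt(squared_dev / n)
```

With the DataFrame from the question this would be called as population_std(cloudantdata, "temperature"). If staying in SQL is preferred, Spark SQL also has built-in stddev_pop and stddev_samp aggregate functions that can be used in a SELECT against the washing view.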