I'm using PySpark 3.2.1. I'm trying to find the missing-value count in each column of my PySpark DataFrame, so I used the following code:
dataColumns=['columns in my data frame']
df.select([count(when(isnan(c), c)).alias(c) for c in dataColumns]).show(truncate=False)
But I got this error message:
---------------------------------------------------------------------------
AnalysisException Traceback (most recent call last)
<ipython-input-56-6c7766e33c77> in <module>()
1 dataColumns=['myDate']
----> 2 df.select([count(when(isnan(c), c)).alias(c) for c in dataColumns]).show(truncate=False)
/usr/local/spark/python/pyspark/sql/dataframe.py in select(self, *cols)
1667 [Row(name='Alice', age=12), Row(name='Bob', age=15)]
1668 """
-> 1669 jdf = self._jdf.select(self._jcols(*cols))
1670 return DataFrame(jdf, self.sql_ctx)
1671
/usr/local/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
1303 answer = self.gateway_client.send_command(command)
1304 return_value = get_return_value(
-> 1305 answer, self.gateway_client, self.target_id, self.name)
1306
1307 for temp_arg in temp_args:
/usr/local/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
115 # Hide where the exception came from that shows a non-Pythonic
116 # JVM exception message.
--> 117 raise converted from None
118 else:
119 raise
AnalysisException: cannot resolve 'isnan(`myDate`)' due to data type mismatch: argument 1 requires (double or float) type, however, '`myDate`' is of timestamp type.;
'Aggregate [count(CASE WHEN isnan(myDate#1994) THEN myDate END) AS myDate#5831]
Can you please help me resolve this issue?